面向本征图像分解的高质量渲染数据集与非局部卷积网络
High quality rendered dataset and non-local graph convolutional network for intrinsic image decomposition
2022年27卷第2期 页码:404-420
纸质出版日期: 2022-02-16 ,
录用日期: 2021-12-01
DOI: 10.11834/jig.210705
王玉洁, 樊庆楠, 李坤, 陈冬冬, 杨敬钰, 卢健智, Dani Lischinski, 陈宝权. 面向本征图像分解的高质量渲染数据集与非局部卷积网络[J]. 中国图象图形学报, 2022,27(2):404-420.
Yujie Wang, Qingnan Fan, Kun Li, Dongdong Chen, Jingyu Yang, Jianzhi Lu, Lischinski Dani, Baoquan Chen. High quality rendered dataset and non-local graph convolutional network for intrinsic image decomposition[J]. Journal of Image and Graphics, 2022,27(2):404-420.
目的
本征图像分解是计算机视觉和图形学领域的一个基本问题,旨在将图像中场景的纹理和光照成分分离开来。基于深度学习的本征图像分解方法受限于现有的数据集,存在分解结果过度平滑、在真实数据上泛化能力较差等问题。
方法
首先设计基于图卷积的模块,显式地考虑图像中的非局部信息。同时,为了使训练的网络可以处理更复杂的光照情况,渲染了高质量的合成数据集。此外,引入了一个基于神经网络的反照率图像优化模块,提升获得的反照率图像的局部平滑性。
结果
将不同方法在所提的数据集上训练,相比在此前合成数据集CGIntrinsics上训练的结果,在IIW(intrinsic images in the wild)测试数据集上的平均WHDR(weighted human disagreement rate)降低了7.29%,在SAW(shading annotations in the wild)测试集的AP(average precision)指标上提升了2.74%。同时,所提出的基于图卷积的神经网络在IIW、SAW数据集上均取得了较好的结果,视觉效果显著优于此前的方法。此外,利用本文算法得到的本征分解结果,在重光照、纹理编辑和光照编辑等图像编辑任务上取得了更优的结果。
结论
所提出的数据集质量更高,有利于基于神经网络的本征分解模型的训练。同时,提出的本征分解模型由于显式地结合了非局部先验,得到了更优的本征分解结果,并通过一系列应用任务进一步验证了结果的有效性。
Objective
Intrinsic image decomposition is a fundamental problem in computer vision and graphics. It aims to separate the lighting effects from the material-related properties of the object surfaces in the depicted scene. Intrinsic decomposition from a single input image is highly ill-posed, since the number of unknowns is twice the number of observed values. Most classical approaches model the task with handcrafted priors to obtain plausible decompositions, but they perform poorly in complicated scenarios because such prior knowledge is too limited to capture the complex light-material interactions in real-world scenes. Deep neural network based methods learn this knowledge from data and thus avoid handcrafted priors. However, owing to their dependence on training data, the performance of current deep learning based methods is still limited by various shortcomings of existing intrinsic image datasets. Moreover, the learned networks tend to generalize poorly when there is a large gap between the training and target domains. Another issue is that the limited receptive field of convolutional networks constrains their ability to exploit non-local information when predicting the intrinsic components.
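For context, the following is the standard Lambertian image formation model commonly assumed in intrinsic decomposition; it is stated here as background, not quoted from the paper, and makes the under-constrained nature of the problem explicit.

```latex
% Standard intrinsic image model: the observed image I is the per-pixel
% product of albedo (reflectance) A and shading S over the image domain \Omega.
I(p) = A(p)\,S(p), \qquad \forall p \in \Omega .
% The factorization is ambiguous: for any positive scalar field k(p),
% A'(p) = k(p)\,A(p), \quad S'(p) = S(p)/k(p)
% reproduces the same image, so priors or learned regularities are required
% to select a unique, physically plausible decomposition.
```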
Method
2
A graph convolution based module is designed to fully utilize the non-local cues within the input feature space. The module takes a feature map as input and outputs a feature map with same resolution as the input feature map. For producing the output feature vector for each position
the module uses information that includes the feature of itself
the information extracted from the local neighborhood and the information aggregated from the non-local neighbors that are likely to be very distant. The full intrinsic decomposition framework is constructed by integrating the devised non-local feature learning module into a U-Net network. In addition
to improve the piece-wise smoothness of the produced albedo results
we incorporate a neural network based image refinement module into the full pipeline
which is able to adaptively remove unnecessary artifacts while preserving structural information within the scenes depicted in input images. Simultaneously
there are noticeable limitations in existing intrinsic image datasets including limited sample amount
unrealistic scene and achromatic lighting in shading and sparse annotations
which will cause generalization issues for deep learning models and limit the decomposition performance as well. A new photorealistic rendered dataset for intrinsic image decomposition is proposed
which is rendered by leveraging large-scale 3D indoor scene models
along with high-quality textures and lighting to simulate the real-world environment. The chromatic shading components are first implemented.
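The authors' implementation of the non-local module is not reproduced here; as a rough PyTorch sketch of the idea described above, the block below combines the feature of each position itself, a local 3x3 neighborhood, and features gathered from k nearest neighbors in feature space (which may be spatially distant). The hyper-parameters (k, channel sizes) are assumptions, and the dense pairwise similarity is written for clarity rather than efficiency.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NonLocalGraphConv(nn.Module):
    """Fuses self, local (3x3) and non-local (k-NN in feature space) information."""

    def __init__(self, channels: int, k: int = 8):
        super().__init__()
        self.k = k
        self.self_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.local_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.nonlocal_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w
        feat = x.flatten(2).transpose(1, 2)                   # (B, N, C)
        normed = F.normalize(feat, dim=-1)
        sim = normed @ normed.transpose(1, 2)                 # cosine similarity, (B, N, N)
        sim = sim - 2.0 * torch.eye(n, device=x.device)       # exclude self from neighbor search
        idx = sim.topk(self.k, dim=-1).indices                # (B, N, k) most similar positions
        neighbors = torch.gather(
            feat.unsqueeze(1).expand(-1, n, -1, -1), 2,
            idx.unsqueeze(-1).expand(-1, -1, -1, c))          # (B, N, k, C)
        agg = neighbors.mean(dim=2)                           # aggregate non-local neighbor features
        agg = agg.transpose(1, 2).reshape(b, c, h, w)
        out = torch.cat([self.self_proj(x),                   # feature of the position itself
                         self.local_conv(x),                  # local neighborhood information
                         self.nonlocal_proj(agg)], dim=1)     # non-local information
        return self.fuse(out)                                 # same resolution as the input


# Illustrative usage, e.g. inside a decoder stage of a U-Net style network.
if __name__ == "__main__":
    block = NonLocalGraphConv(channels=32, k=8)
    y = block(torch.randn(1, 32, 48, 64))
    print(y.shape)  # torch.Size([1, 32, 48, 64])
```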
Result
To validate the effectiveness of the proposed dataset, several state-of-the-art methods are trained on both the proposed dataset and the previously proposed CGIntrinsics dataset, and tested on the intrinsic image evaluation benchmarks, i.e., the intrinsic images in the wild (IIW) and shading annotations in the wild (SAW) test sets. Compared with the variants trained on CGIntrinsics, the variants trained on the proposed dataset reduce the average weighted human disagreement rate (WHDR) on the IIW test set by 7.29% and improve the average precision (AP) on the SAW test set by 2.74%. Meanwhile, the proposed graph convolution based network achieves competitive quantitative results on both the IIW and SAW test sets and clearly better qualitative results. To further examine the decomposition quality of different methods, a number of application tasks, including re-lighting and texture/lighting editing, are conducted with the generated intrinsic components. The proposed method shows more convincing application results than two state-of-the-art methods, further highlighting its superiority and application potential.
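For reference, WHDR on IIW is computed from sparse pairwise human judgements of relative reflectance. The sketch below follows the commonly used definition (Bell et al. 2014); the layout of `judgements` and the threshold delta = 0.10 are assumptions made here for illustration only.

```python
from typing import List, Tuple

def whdr(albedo_intensity,
         judgements: List[Tuple[Tuple[int, int], Tuple[int, int], str, float]],
         delta: float = 0.10) -> float:
    """Weighted human disagreement rate.

    albedo_intensity: 2D array-like of predicted reflectance intensity.
    judgements: list of ((y1, x1), (y2, x2), label, weight), where label is
    '1' (point 1 darker), '2' (point 2 darker) or 'E' (roughly equal).
    """
    disagreement, total = 0.0, 0.0
    for (y1, x1), (y2, x2), label, weight in judgements:
        r1 = float(albedo_intensity[y1][x1])
        r2 = float(albedo_intensity[y2][x2])
        # Predicted relative judgement from the albedo ratio.
        if r2 / max(r1, 1e-10) > 1.0 + delta:
            pred = '1'          # point 1 is darker
        elif r1 / max(r2, 1e-10) > 1.0 + delta:
            pred = '2'          # point 2 is darker
        else:
            pred = 'E'          # roughly equal reflectance
        total += weight
        if pred != label:
            disagreement += weight
    return disagreement / max(total, 1e-10)
```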
Conclusion
Motivated by the non-local priors used in classical intrinsic image decomposition methods, a graph convolutional network for intrinsic decomposition is proposed, in which non-local cues are exploited explicitly. To mitigate the issues of existing intrinsic image datasets, a new high-quality photorealistic dataset is rendered, which provides dense labels for albedo and shading. The scenes depicted in this dataset contain complicated textures and illumination that closely approximate typical real indoor scenes, which helps reduce the domain gap. The shading labels are the first to account for chromatic lighting, which allows neural networks to better separate material properties from lighting effects, especially the effects introduced by inter-reflections between diffuse surfaces. The decomposition results of the proposed method and two current state-of-the-art methods are applied to a range of application scenarios, visually demonstrating the superior decomposition quality and application potential of the proposed method.
图像处理;图像理解;本征图像分解;图卷积网络(GCN);合成数据集
image processing; image understanding; intrinsic image decomposition; graph convolutional neural network (GCN); synthetic dataset
Barron J T and Malik J. 2015. Shape, illumination, and reflectance from shading. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(8): 1670-1687[DOI: 10.1109/TPAMI.2014.2377712]
Barrow H G and Tenenbaum J M. 1978. Recovering intrinsic scene characteristics from images//Hanson A and Riseman E, eds. Computer Vision Systems. New York: Academic Press
Baslamisli A S, Groenestege T T, Das P, Le H A, Karaoglu S and Gevers T. 2018. Joint learning of intrinsic images and semantic segmentation//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 289-305[DOI: 10.1007/978-3-030-01231-1_18]
Bell S, Bala K and Snavely N. 2014. Intrinsic images in the wild. ACM Transactions on Graphics, 33(4): #159[DOI: 10.1145/2601097.2601206]
Bi S, Han X G and Yu Y Z. 2015. An L1 image transform for edge-preserving smoothing and scene-level intrinsic decomposition. ACM Transactions on Graphics, 34(4): #78[DOI: 10.1145/2766946]
Boyadzhiev I, Paris S and Bala K. 2013. User-assisted image compositing for photographic lighting. ACM Transactions on Graphics, 32(4): 1-12[DOI: 10.1145/2461912.2461973]
Bruna J, Zaremba W, Szlam A and LeCun Y. 2013. Spectral networks and locally connected networks on graphs[EB/OL]. [2021-08-16]. https://arxiv.org/pdf/1312.6203.pdf
Butler D J, Wulff J, Stanley G B and Black M J. 2012. A naturalistic open source movie for optical flow evaluation//Proceedings of 2012 European Conference on Computer Vision (ECCV). Florence, Italy: Springer: 611-625[DOI: 10.1007/978-3-642-33783-3_44]
Chen Q F and Koltun V. 2013. A simple model for intrinsic image decomposition with depth cues//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, Australia: IEEE: 241-248[DOI: 10.1109/ICCV.2013.37]
Fan Q N, Yang J L, Hua G, Chen B Q and Wipf D. 2018. Revisiting deep intrinsic image decompositions//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 8944-8952[DOI: 10.1109/CVPR.2018.00932]
Garces E, Munoz A, Lopez-Moreno J and Gutierrez D. 2012. Intrinsic images by clustering. Computer Graphics Forum, 31(4): 1415-1424[DOI: 10.1111/j.1467-8659.2012.03137.x]
Gehler P V, Rother C, Kiefel M, Zhang L M and Schölkopf B. 2011. Recovering intrinsic images with a global sparsity prior on reflectance//Proceedings of the 24th International Conference on Neural Information Processing Systems. Granada, Spain: Curran Associates Inc: 765-773
Grosse R, Johnson M K, Adelson E H and Freeman W T. 2009. Ground truth dataset and baseline evaluations for intrinsic image algorithms//Proceedings of the 12th IEEE International Conference on Computer Vision. Kyoto, Japan: IEEE: 2335-2342[DOI: 10.1109/ICCV.2009.5459428]
Hamilton W L, Ying R and Leskovec J. 2017. Inductive representation learning on large graphs//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc: 1025-1035
Henaff M, Bruna J and LeCun Y. 2015. Deep convolutional networks on graph-structured data[EB/OL]. [2021-08-16]. https://arxiv.org/pdf/1506.05163.pdf
Kovacs B, Bell S, Snavely N and Bala K. 2017. Shading annotations in the wild//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 850-859[DOI: 10.1109/CVPR.2017.97]
Land E H and McCann J J. 1971. Lightness and retinex theory. Journal of the Optical Society of America, 61(1): 1-11[DOI: 10.1364/JOSA.61.000001]
Li R Y, Wang S, Zhu F Y and Huang J Z. 2018a. Adaptive graph convolutional neural networks//Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans, USA: AAAI Press: 3546-3553
Li W B, Saeedi S, McCormac J, Clark R, Tzoumanikas D, Ye Q, Huang Y Z, Tang R and Leutenegger S. 2018b. InteriorNet: mega-scale multi-sensor photo-realistic indoor scenes dataset//Proceedings of 2018 British Machine Vision Conference. Newcastle, UK: BMVA Press
Li Z Q and Snavely N. 2018a. CGIntrinsics: better intrinsic image decomposition through physically-based rendering//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 381-399[DOI: 10.1007/978-3-030-01219-9_23]
Li Z Q and Snavely N. 2018b. Learning intrinsic image decomposition from watching the world//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 9039-9048[DOI: 10.1109/CVPR.2018.00942]
Ma W C, Chu H, Zhou B L, Urtasun R and Torralba A. 2018. Single image intrinsic decomposition without a single intrinsic image//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 211-229[DOI: 10.1007/978-3-030-01264-9_13]
Meka A, Zollhöfer M, Richardt C and Theobalt C. 2016. Live intrinsic video. ACM Transactions on Graphics, 35(4): #109[DOI: 10.1145/2897824.2925907]
Monti F, Boscaini D, Masci J, Rodolà E, Svoboda J and Bronstein M M. 2017. Geometric deep learning on graphs and manifolds using mixture model CNNs//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5425-5434[DOI: 10.1109/CVPR.2017.576]
Narihira T, Maire M and Yu S X. 2015a. Learning lightness from human judgement on relative reflectance//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE: 2965-2973[DOI: 10.1109/CVPR.2015.7298915]
Narihira T, Maire M and Yu S X. 2015b. Direct intrinsics: learning albedo-shading decomposition by convolutional regression//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE: 2992[DOI: 10.1109/ICCV.2015.342]
Nestmeyer T and Gehler P V. 2017. Reflectance adaptive filtering improves intrinsic image estimation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 1771-1780[DOI: 10.1109/CVPR.2017.192]
Sha H and Liu Y. 2021. Review on deep learning based prediction of image intrinsic properties. Journal of Graphics, 42(3): 385-397
沙浩, 刘越. 2021. 基于深度学习的图像本征属性预测方法综述. 图学学报, 42(3): 385-397
Shen L, Tan P and Lin S. 2008. Intrinsic image decomposition with non-local texture cues//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, USA: IEEE: 1-7[DOI: 10.1109/CVPR.2008.4587660]
Shen L and Yeo C. 2011. Intrinsic images decomposition using a local and global sparse representation of reflectance//Proceedings of 2011 IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs, USA: IEEE: 697-704[DOI: 10.1109/CVPR.2011.5995738]
Shi J, Dong Y, Su H and Yu S X. 2017. Learning non-Lambertian object intrinsics across ShapeNet categories//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5844-5853[DOI: 10.1109/CVPR.2017.619]
Shi J, Dong Y, Tong X and Chen Y Y. 2015. Efficient intrinsic image decomposition for RGBD images//Proceedings of the 21st ACM Symposium on Virtual Reality Software and Technology. Beijing, China: ACM: 17-25[DOI: 10.1145/2821592.2821601]
Simonovsky M and Komodakis N. 2017. Dynamic edge-conditioned filters in convolutional neural networks on graphs//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 29-38[DOI: 10.1109/CVPR.2017.11]
Sinha P and Adelson E. 1993. Recovering reflectance and illumination in a world of painted polyhedra//Proceedings of the 4th International Conference on Computer Vision. Berlin, Germany: IEEE: 156-163[DOI: 10.1109/ICCV.1993.378224]
Song S R, Yu F, Zeng A, Chang A X, Savva M and Funkhouser T. 2017. Semantic scene completion from a single depth image//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 190-198[DOI: 10.1109/CVPR.2017.28]
Wang N Y, Zhang Y D, Li Z W, Fu Y W, Liu W and Jiang Y G. 2018. Pixel2Mesh: generating 3D mesh models from single RGB images//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 55-71[DOI: 10.1007/978-3-030-01252-6_4]
Wang Y J, Li K, Yang J Y and Ye X C. 2017. Intrinsic decomposition from a single RGB-D image with sparse and non-local priors//Proceedings of 2017 IEEE International Conference on Multimedia and Expo (ICME). Hong Kong, China: IEEE: 1201-1206[DOI: 10.1109/ICME.2017.8019390]
Yi L, Su H, Guo X W and Guibas L. 2017. SyncSpecCNN: synchronized spectral CNN for 3D shape segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 6584-6592[DOI: 10.1109/CVPR.2017.697]
Zhao Q, Tan P, Dai Q, Shen L, Wu E and Lin S. 2012. A closed-form solution to retinex with nonlocal texture constraints. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(7): 1437-1444[DOI: 10.1109/TPAMI.2012.77]
Zhou T H, Krahenbuhl P and Efros A A. 2015. Learning data-driven reflectance priors for intrinsic image decomposition//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE: 3469-3477[DOI: 10.1109/ICCV.2015.396]