NTU Theses and Dissertations Repository

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70884

Full metadata record (DC field: value [language])
dc.contributor.advisor: 莊永裕 (Yung-Yu Chuang)
dc.contributor.author: Yu Tseng [en]
dc.contributor.author: 曾瑀 [zh_TW]
dc.date.accessioned: 2021-06-17T04:42:20Z
dc.date.available: 2021-08-10
dc.date.copyright: 2018-08-10
dc.date.issued: 2018
dc.date.submitted: 2018-08-05
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70884
dc.description.abstract: This thesis presents a face relighting system that reproduces the lighting pattern of a reference face image on a target face image. We propose a deep learning framework based on a U-Net module with two input streams, which lets the model learn from the reference face image and the target face image simultaneously. Rather than setting the learning target in color space, we have the model learn the Laplacian pyramid of the face. Under the assumption that the high-frequency levels of the pyramid retain more texture information, we set the learning target on the low-frequency part of the pyramid and keep the original high-frequency part. Experimental results show that this design adapts better to lighting of different colors and preserves more detail. We also concatenate the face depth map and the face parsing segmentation map with the face image as input, providing the model with additional facial geometry information.
We conducted a user study to evaluate our system, asking participants to identify the direction of the incoming light in the relit images; 82.4% of the responses were correct. [zh_TW]
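As a concrete illustration of the pyramid split-and-merge step the abstract describes, here is a minimal Python sketch assuming OpenCV's pyrDown/pyrUp; the function predicting the relit low-frequency residual is a hypothetical placeholder for the trained network, not the thesis's actual code:

    import cv2
    import numpy as np

    def build_laplacian_pyramid(img, levels):
        # Decompose an image into high-frequency bands plus a low-frequency residual.
        bands, current = [], img.astype(np.float32)
        for _ in range(levels):
            down = cv2.pyrDown(current)
            up = cv2.pyrUp(down, dstsize=(current.shape[1], current.shape[0]))
            bands.append(current - up)  # high-frequency band at this scale
            current = down
        bands.append(current)           # low-frequency residual
        return bands

    def merge_laplacian_pyramid(bands):
        # Invert the decomposition: upsample and add the bands back, coarse to fine.
        current = bands[-1]
        for band in reversed(bands[:-1]):
            current = cv2.pyrUp(current, dstsize=(band.shape[1], band.shape[0])) + band
        return current

    def predict_low_frequency(target, reference, levels=4):
        # Hypothetical stand-in for the trained relighting model; a real network
        # would output the relit low-frequency residual. Here it returns the
        # target's own residual so the pipeline runs end to end.
        return build_laplacian_pyramid(target, levels)[-1]

    target_face = cv2.imread('target.png')        # placeholder file names
    reference_face = cv2.imread('reference.png')
    bands = build_laplacian_pyramid(target_face, levels=4)
    bands[-1] = predict_low_frequency(target_face, reference_face)  # relit low frequency
    relit = np.clip(merge_laplacian_pyramid(bands), 0, 255).astype(np.uint8)

The design point this sketch reflects: only bands[-1] is ever replaced, so the target's high-frequency texture survives relighting untouched.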
dc.description.abstract: This thesis presents a human face relighting system that relights a target face image with the lighting pattern of a reference face image. We propose a 2-Input-Stream U-Net architecture that learns from the target face and the reference face simultaneously. Instead of learning the face in color space, we operate in frequency space by learning the Laplacian pyramid of the face. Under the assumption that the high-frequency part preserves more texture information, we learn only the low-frequency part and leave the high-frequency part unchanged. Experimental results show that the model adapts better to lighting color and preserves more detail. We also concatenate the depth map and the face parsing segmentation map to the input stack to provide facial geometry information.
To evaluate our model, we conducted a user study asking users to identify the lighting direction of the relit image, obtaining 82.4% accuracy. [en]
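The 2-Input-Stream U-Net can be pictured as two encoders whose features are fused before a shared decoder. Below is a minimal PyTorch sketch under that assumption; the layer sizes, the fusion point, and the 5-channel input stack (RGB + depth + parsing) are illustrative guesses, not the thesis's exact architecture:

    import torch
    import torch.nn as nn

    class TwoStreamUNet(nn.Module):
        # Two encoders (target and reference), features fused at the bottleneck,
        # one decoder with skip connections taken from the target stream.
        def __init__(self, in_ch=5, base=32):
            super().__init__()
            def block(i, o):
                return nn.Sequential(
                    nn.Conv2d(i, o, 3, padding=1),
                    nn.BatchNorm2d(o),
                    nn.ReLU(inplace=True))
            self.enc_t1, self.enc_t2 = block(in_ch, base), block(base, base * 2)
            self.enc_r1, self.enc_r2 = block(in_ch, base), block(base, base * 2)
            self.pool = nn.MaxPool2d(2)
            self.bottleneck = block(base * 4, base * 4)       # fused target+reference features
            self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
            self.dec1 = block(base * 4 + base * 2, base * 2)  # skip from target stream
            self.dec2 = block(base * 2 + base, base)
            self.out = nn.Conv2d(base, 3, 1)                  # predicted low-frequency RGB image

        def forward(self, target, reference):
            t1 = self.enc_t1(target)
            t2 = self.enc_t2(self.pool(t1))
            r1 = self.enc_r1(reference)
            r2 = self.enc_r2(self.pool(r1))
            fused = self.bottleneck(self.pool(torch.cat([t2, r2], dim=1)))
            d1 = self.dec1(torch.cat([self.up(fused), t2], dim=1))
            d2 = self.dec2(torch.cat([self.up(d1), t1], dim=1))
            return self.out(d2)

    # Each input stack: RGB (3) + depth map (1) + parsing map (1) = 5 channels.
    net = TwoStreamUNet(in_ch=5)
    low_freq = net(torch.randn(1, 5, 128, 128), torch.randn(1, 5, 128, 128))

Feeding both faces through separate encoders and fusing at the bottleneck lets the decoder condition the target's relit low frequency on the reference's lighting, which matches the two-input learning setup the abstract describes.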
dc.description.provenance: Made available in DSpace on 2021-06-17T04:42:20Z (GMT). No. of bitstreams: 1; ntu-107-R05944002-1.pdf: 5054303 bytes, checksum: 724585d66561720d9545aff81120c857 (MD5). Previous issue date: 2018 [en]
dc.description.tableofcontents:
致謝 (Acknowledgements) i
摘要 (Chinese abstract) ii
Abstract iii
List of Figures vii
List of Tables ix
Chapter 1 Introduction 1
Chapter 2 Related Work 3
2.1 Classical Inverse Rendering 3
2.2 Face Intrinsic Decomposition using Neural Network 4
2.3 Face Editing by Ratio Image 4
Chapter 3 Methodology 5
3.1 Lighting Patterns 5
3.2 Dataset 6
3.2.1 Training Data Triplet 6
3.2.2 Data Augmentation 8
3.3 Model 9
3.3.1 2-Input-Stream U-Net Architecture 9
3.3.2 Multiple Input 11
3.3.3 Loss Function 13
3.4 Training Details 17
3.4.1 Training Data 18
3.4.2 Implementation Details 19
Chapter 4 Experiment and Result 20
4.1 Evaluation on Synthetic Data 20
4.2 Result on Flickr2 Dataset 21
4.3 Comparison over Different Output Types 22
4.4 Comparison over Different Input Types 23
4.5 User Study 24
Chapter 5 Discussion 26
5.1 Failure Cases 26
5.2 Limitation 27
Chapter 6 Conclusion 28
Bibliography 29
dc.language.iso: en
dc.subject: 深度影像 (depth map) [zh_TW]
dc.subject: 拉普拉斯金字塔 (Laplacian pyramid) [zh_TW]
dc.subject: 打光模式 (lighting pattern) [zh_TW]
dc.subject: 人臉光影重建 (face relighting) [zh_TW]
dc.subject: 人臉剖析 (face parsing) [zh_TW]
dc.subject: face relighting [en]
dc.subject: face parsing [en]
dc.subject: depth map [en]
dc.subject: Laplacian pyramid [en]
dc.subject: lighting pattern [en]
dc.title: 基於拉普拉斯金字塔的深度人臉影像光影重建 [zh_TW]
dc.title: Deep Face Relighting using Laplacian Pyramid [en]
dc.type: Thesis
dc.date.schoolyear: 106-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 林彥宇 (Yen-Yu Lin), 陳煥宗 (Hwann-Tzong Chen)
dc.subject.keyword: 人臉光影重建, 打光模式, 拉普拉斯金字塔, 深度影像, 人臉剖析 [zh_TW]
dc.subject.keyword: face relighting, lighting pattern, Laplacian pyramid, depth map, face parsing [en]
dc.relation.page: 31
dc.identifier.doi: 10.6342/NTU201802521
dc.rights.note: 有償授權 (paid licensing)
dc.date.accepted: 2018-08-06
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia) [zh_TW]
Appears in collections: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)

Files in this item:
File: ntu-107-1.pdf (restricted, not authorized for public access) | Size: 4.94 MB | Format: Adobe PDF