Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86127
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳祝嵩(Chu-Song Chen) | |
dc.contributor.author | Kai-Cheng Chang | en |
dc.contributor.author | 張凱程 | zh_TW |
dc.date.accessioned | 2023-03-19T23:38:07Z | - |
dc.date.copyright | 2022-09-14 | |
dc.date.issued | 2022 | |
dc.date.submitted | 2022-09-07 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86127 | - |
dc.description.abstract | 3D 智慧風格轉換於最近一兩年備受關注,而現有的 3D 風格轉換方法主要是透過預估影像中的場景在三維空間中的位置後再進行 3D 場景的風格轉換,並結合多視角來預估目標視角的成像。然而目前的 3D 風格轉換方法需要依賴耗時比較久的全局最佳化方法來預估深度與相機座標,因此,我們的目標是藉由較有效率的深度學習模型所生成的深度與相機座標,並且結合現有 3D 風格轉換模型來完成多視角影像估測。其中這個改變方法需要面臨的問題是不同時間點所產生的 3D 座標系統會不一致,而使得現有的 3D 風格轉換方法無法產生正常的轉換成果。而我們以建構局部且隨時間增加和刪減的 3D 點雲,加上改良現有的 3D 風格轉換方法,來產生既沒有破壞 3D 場景架構也不會因點雲變換而不停閃爍的風格轉換成果。這個方法最大的好處在於我們的架構不需要依賴特定的 3D 建構方法也能產生與現有方法相似的成果,減少了許多時間成本。我們同時也提供了 VR 體驗讓使用者可以觀賞 3D 風格轉換影片。 | zh_TW
dc.description.abstract | 3D style transfer has only been investigated for about two years. Recently developed methods use either a neural radiance field or point-cloud generation to stylize a 3D scene, and combine it with novel view synthesis to render target views. However, these works are time-consuming because they rely on globally consistent optimization. We aim to speed this up by feeding depth and camera poses from a learning-based structure-from-motion (SfM) module into a state-of-the-art 3D style transfer pipeline. The naive combination fails because the 3D coordinates produced for different views by locally optimized SfM learners are inconsistent. To overcome this issue, we maintain a sliding-window point cloud set that adds points close to the current view and removes points far away from it, so that the results in 3D space are not affected by the coordinate inconsistency. Stylizing different point cloud sets may produce flickering results; we therefore modify the style transfer module to address the flickering problem. Experiments show that our method achieves visual results comparable to the original style transfer module while using a much more efficient SfM constructor. In addition, we implement novel view synthesis applications such as stereoscopic video in a virtual reality system for the viewing experience. (An illustrative sketch of the sliding-window update appears after this metadata table.) | en
dc.description.provenance | Made available in DSpace on 2023-03-19T23:38:07Z (GMT). No. of bitstreams: 1 U0001-0109202216190200.pdf: 3976117 bytes, checksum: cada5b67903f28ac362c5e43c06fa057 (MD5) Previous issue date: 2022 | en |
dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i; Acknowledgements iii; 摘要 v; Abstract vi; Contents viii; List of Figures x; List of Tables xiii; Denotation xiv; Chapter 1 Introduction 1; 1.1 Introduction of Style Transfer 1; 1.2 Motivations 2; 1.3 Contribution 6; Chapter 2 Related Work 7; 2.1 Image and Video Stylization 7; 2.2 Structure-from-Motion 8; 2.3 Novel View Synthesis 9; 2.4 3D Scene Stylization 10; 2.5 PointLSTM 12; Chapter 3 Methodology 13; 3.1 Problem Definition 13; 3.2 3D Point Cloud Construction 14; 3.3 3D Scene Stylization 14; 3.4 Novel View Rendering 18; 3.5 Training 19; 3.6 Implementation Details 21; Chapter 4 Experiments 22; 4.1 Qualitative Results 22; 4.2 Quantitative Results 23; 4.3 User Preference 25; 4.4 Efficiency 26; 4.5 Discussion 27; Chapter 5 Conclusion 29; References 31 | |
dc.language.iso | en | |
dc.title | 利用深度學習之影片深度估測建構 3D 多視角智慧風格轉換 | zh_TW |
dc.title | 3D Novel View Style Transfer via Learning-based Video Depth Estimation | en |
dc.type | Thesis | |
dc.date.schoolyear | 110-2 | |
dc.description.degree | 碩士 (Master) | |
dc.contributor.coadvisor | 洪一平(Yi-Ping Hung) | |
dc.contributor.oralexamcommittee | 傅立成(Li-Chen Fu) | |
dc.subject.keyword | 多視角、風格轉換、立體影片、虛擬實境 | zh_TW
dc.subject.keyword | Novel View Synthesis, Style Transfer, Stereoscopic Video, Virtual Reality | en
dc.relation.page | 37 | |
dc.identifier.doi | 10.6342/NTU202203070 | |
dc.rights.note | 同意授權(全球公開) (Authorized, open access worldwide) | |
dc.date.accepted | 2022-09-07 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
dc.date.embargo-lift | 2022-09-14 | - |
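The English abstract above mentions a sliding-window point cloud set that adds points close to the current view and removes points far away from it. Below is a minimal illustrative sketch of such an update step, assuming a Euclidean distance threshold around the current camera; the function name `update_sliding_window`, the `radius` parameter, and the NumPy representation are assumptions made for this sketch and are not taken from the thesis.

```python
import numpy as np

def update_sliding_window(points, new_points, cam_center, radius=5.0):
    """Illustrative sliding-window point cloud update (not the thesis code).

    points:     (N, 3) array, the current windowed point cloud (may be empty).
    new_points: (M, 3) array, points recovered from the current view.
    cam_center: (3,)   array, camera position of the current view.
    radius:     hypothetical distance threshold for dropping far-away points.
    """
    points = np.asarray(points, dtype=np.float64).reshape(-1, 3)
    new_points = np.asarray(new_points, dtype=np.float64).reshape(-1, 3)
    cam_center = np.asarray(cam_center, dtype=np.float64).reshape(3)

    # Drop points that fall outside the window centred on the current camera.
    if points.shape[0] > 0:
        keep = np.linalg.norm(points - cam_center, axis=1) <= radius
        points = points[keep]

    # Append the points observed from the current view.
    return np.concatenate([points, new_points], axis=0)
```

Per the abstract, stylizing different point cloud sets can still produce flickering, which is why the thesis also modifies the style transfer module; the sketch above only illustrates the windowing step.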
Appears in Collections: | 資訊工程學系
Files in this item:
File | Size | Format |
---|---|---|
U0001-0109202216190200.pdf | 3.88 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.