Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85052
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 王鈺強(Yu-Chiang Frank Wang) | |
dc.contributor.author | Yu-Shan Huang | en |
dc.contributor.author | 黃郁珊 | zh_TW |
dc.date.accessioned | 2023-03-19T22:40:35Z | - |
dc.date.copyright | 2022-09-30 | |
dc.date.issued | 2022 | |
dc.date.submitted | 2022-09-28 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85052 | - |
dc.description.abstract | 控制三維物體的形變一直以來都是三維視覺領域中備受討論的領域。近年來,全新的三維表示法「類神經輻射場(NeRF)」被提出且蓬勃發展,在對場景的建模上取得很大的成功。使用此種表示法來進行物體合成或控制建模內容逐漸為人們所關注。此篇論文中,我們採用了一個能夠感知語義的生成型類神經輻射場,透過探索與詮釋為特定類別建模的生成型類神經輻射場所學習到的隱變量,得以對該類別之特定區域進行編輯。以預先訓練的生成型類神經輻射場為基礎,我們加入一個語義分割器,用來對每種物體作內部的區域分割,使得此類神經輻射場能夠同時渲染出所選視角的二維圖像與相對應的語義分割結果。我們提出的架構能對生成型類神經輻射所學到的隱變量做操縱,除了可以針對特定部位做編輯,編輯效果也能有不同程度的變化。我們將此架構以不同的生成型類神經輻射場和不同物體類別的資料集做實驗,結果成功地驗證此方法的有效性和實用性。 | zh_TW |
dc.description.abstract | Manipulating 3D objects has long been an active research topic in 3D vision. With the development and success of neural radiance fields (NeRF) for scene modeling, synthesizing and manipulating 3D objects with such a representation has become desirable. In this thesis, we introduce a semantic-aware generative NeRF, which interprets the latent representation learned by category-specific generative NeRFs and enables editing of particular part attributes. On top of a pre-trained generative NeRF, we deploy a semantic segmenter that performs part segmentation on the object category, allowing the model to render the 2D image at a selected viewpoint together with the corresponding segmentation mask. Our proposed scheme learns to manipulate the resulting latent representation and is optimized to edit the object part of interest to varying degrees. We conduct experiments with different generative NeRFs on benchmark datasets of various object categories, and the results verify the effectiveness and practicality of our proposed model. (A minimal illustrative sketch of this pipeline follows the metadata table below.) | en |
dc.description.provenance | Made available in DSpace on 2023-03-19T22:40:35Z (GMT). No. of bitstreams: 1 U0001-2309202205145500.pdf: 21533858 bytes, checksum: 446ff66fedd2a7f7f98b6f77ffd7f153 (MD5) Previous issue date: 2022 | en |
dc.description.tableofcontents | 中文摘要 i Abstract ii List of Figures v List of Tables viii 1 Introduction 1 2 Related work 4 2.1 Latent Representation Manipulation for 2D Images 4 2.2 Generative NeRF 5 2.3 3D Model Editing via NeRF 5 3 Method 7 3.1 Problem Formulation and Model Overview 7 3.2 Semantic-Aware Generative NeRF for Manipulating Object Semantics 8 3.2.1 Brief review of generative NeRFs 8 3.2.2 Interpreting object semantics 9 3.2.3 Manipulating object semantics 10 3.3 Training and Inference 12 3.3.1 Training and Optimization 12 3.3.2 Inference 12 4 Experiments 14 4.1 Datasets and Settings 14 4.2 Implementation Details 16 4.2.1 GRAF 16 4.2.2 pi-GAN 16 4.3 Evaluation 17 4.3.1 Comparisons with 2D Manipulation Models 18 4.3.2 Comparisons with NeRF Editing Models 19 4.4 Real Image Editing 21 4.5 COLMAP 21 4.6 Additional Experimental Results 23 5 Conclusion 29 Reference 30 | |
dc.language.iso | en | |
dc.title | 類神經輻射場之可解釋隱變量於三維物件操控 | zh_TW |
dc.title | Interpreting Latent Representation in Neural Radiance Fields for Manipulating Object Semantics | en |
dc.type | Thesis | |
dc.date.schoolyear | 110-2 | |
dc.description.degree | 碩士 (Master's) | |
dc.contributor.oralexamcommittee | 陳駿丞 (Jun-Cheng Chen), 孫紹華 (Shao-Hua Sun) | |
dc.subject.keyword | 深度學習, 電腦視覺, 類神經輻射場, 生成對抗網路, 語義 | zh_TW |
dc.subject.keyword | deep learning, computer vision, 3D computer vision, generative adversarial networks, semantics | en |
dc.relation.page | 36 | |
dc.identifier.doi | 10.6342/NTU202203877 | |
dc.rights.note | Authorized (access restricted to campus) | |
dc.date.accepted | 2022-09-28 | |
dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | zh_TW |
dc.contributor.author-dept | 電信工程學研究所 (Graduate Institute of Communication Engineering) | zh_TW |
dc.date.embargo-lift | 2022-09-30 | - |
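
The abstract above describes a semantic-aware editing pipeline: a frozen, pre-trained generative NeRF renders an image, a semantic segmenter predicts a part mask for it, and a latent direction is then optimized so that moving the latent code along it changes the chosen part while leaving the rest of the object intact. The thesis PDF itself is access-restricted, so the PyTorch sketch below is only a minimal toy illustration of that idea under stated assumptions, not the author's implementation; `ToyGenerativeNeRF`, `SegHead`, the loss weighting, and every hyperparameter are hypothetical stand-ins.

```python
# Minimal toy sketch of the pipeline described in the abstract, NOT the
# thesis code. ToyGenerativeNeRF, SegHead, the loss weights, and all sizes
# below are hypothetical stand-ins chosen only for illustration.
import torch
import torch.nn as nn

LATENT_DIM, IMG = 64, 32  # toy latent size and image resolution

class ToyGenerativeNeRF(nn.Module):
    """Stand-in for a pre-trained generative NeRF (e.g. GRAF or pi-GAN):
    maps a latent code to an RGB rendering at a fixed viewpoint."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, 3 * IMG * IMG), nn.Tanh())

    def forward(self, z):
        return self.net(z).view(-1, 3, IMG, IMG)

class SegHead(nn.Module):
    """Stand-in semantic segmenter predicting a soft mask for one part."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 1, 3, padding=1)

    def forward(self, img):
        return torch.sigmoid(self.net(img))

G, S = ToyGenerativeNeRF(), SegHead()
for p in list(G.parameters()) + list(S.parameters()):
    p.requires_grad_(False)  # both networks are treated as frozen / pre-trained

# Learn one latent direction whose effect is confined to the target part.
direction = nn.Parameter(0.01 * torch.randn(LATENT_DIM))
opt = torch.optim.Adam([direction], lr=1e-2)

for step in range(200):
    z = torch.randn(8, LATENT_DIM)    # sample latent codes
    img = G(z)                        # original rendering
    mask = S(img)                     # soft mask of the part of interest
    edited = G(z + direction)         # rendering after a unit-strength edit
    change = (edited - img).abs().mean(dim=1, keepdim=True)
    # Reward change inside the part, penalize change everywhere else.
    loss = -(mask * change).mean() + 10.0 * ((1.0 - mask) * change).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# At inference, a scalar strength gives edits "of varying degrees".
with torch.no_grad():
    alpha = 0.5
    edited_img = G(torch.randn(1, LATENT_DIM) + alpha * direction)
```

Scaling `alpha` at inference is one simple way to realize the editing effects "of varying degrees" mentioned in the abstract; the optimization objective and architecture in the actual thesis may differ.
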
Appears in Collections: | Graduate Institute of Communication Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
U0001-2309202205145500.pdf (access restricted to NTU campus IPs; use the VPN service for off-campus access) | 21.03 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.