NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92674
Full metadata record
DC Field: Value (Language)
dc.contributor.advisor: 王鈺強 (zh_TW)
dc.contributor.advisor: Yu-Chiang Frank Wang (en)
dc.contributor.author: 周子庭 (zh_TW)
dc.contributor.author: Zi-Ting Chou (en)
dc.date.accessioned: 2024-06-04T16:06:03Z
dc.date.available: 2024-06-05
dc.date.copyright: 2024-06-04
dc.date.issued: 2024
dc.date.submitted: 2024-05-30
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92674
dc.description.abstract: 利用多視角輸入合成新視角圖像,神經輻射場(Neural Radiance Fields,簡稱NeRF)已成為三維視覺領域中的熱門研究課題。在本文中,我們引入了一種名為通用語義神經輻射場(Generalizable Semantic Neural Radiance Fields,簡稱GSNeRF)的方法,它在合成過程中獨特地考慮了圖像語義,從而可以為未見過的場景生成新視角圖像及其相關的語義地圖。我們的GSNeRF由兩個階段組成:語義幾何推理和深度引導的視覺渲染。前者能夠觀察多視角圖像輸入,從場景中提取語義和幾何特徵。後者在圖像幾何信息的指導下,執行圖像和語義渲染,具有更好的性能。我們的實驗不僅證實了GSNeRF在新視角圖像和語義分割合成方面優於先前的工作,而且進一步驗證了我們的採樣策略對視覺渲染的有效性。 (zh_TW)
dc.description.abstract: Utilizing multi-view inputs to synthesize novel-view images, Neural Radiance Fields (NeRF) have emerged as a popular research topic in 3D vision. In this work, we introduce Generalizable Semantic Neural Radiance Fields (GSNeRF), which uniquely takes image semantics into account during the synthesis process, so that both novel-view images and the associated semantic maps can be produced for unseen scenes. Our GSNeRF is composed of two stages: Semantic Geo-Reasoning and Depth-Guided Visual Rendering. The former observes the multi-view image inputs and extracts semantic and geometry features from the scene. Guided by the resulting geometry information, the latter performs both image and semantic rendering with improved performance. Our experiments not only confirm that GSNeRF performs favorably against prior works on both novel-view image synthesis and semantic segmentation, but also verify the effectiveness of our sampling strategy for visual rendering. (en)
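The abstract above outlines a two-stage pipeline; the sketch below illustrates the general idea in a minimal, NumPy-only form. It is a hedged mock-up, not the thesis implementation: the names (depth_guided_samples, composite), the per-ray depth estimate d_hat, and the random field outputs are all assumptions made for illustration. The premise is that stage one supplies a surface depth estimate per ray, and stage two samples near that depth and alpha-composites color and semantic logits with the same rendering weights.

```python
# Hedged sketch: depth-guided sampling + joint color/semantic compositing.
# All names and values below are illustrative assumptions, not thesis code.
import numpy as np

def depth_guided_samples(d_hat, sigma, n_samples=32):
    """Sample depths concentrated around a predicted surface depth d_hat,
    instead of uniformly along the whole ray."""
    t = np.linspace(-1.0, 1.0, n_samples)
    return np.sort(d_hat + sigma * t)  # depths clustered around the surface

def composite(depths, densities, colors, sem_logits):
    """Standard alpha compositing, applied to color and semantics alike."""
    deltas = np.diff(depths, append=depths[-1] + 1e10)   # sample spacing
    alpha = 1.0 - np.exp(-densities * deltas)            # per-sample opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))  # transmittance
    w = alpha * trans                                    # rendering weights
    rgb = (w[:, None] * colors).sum(axis=0)              # rendered pixel color
    sem = (w[:, None] * sem_logits).sum(axis=0)          # rendered class logits
    return rgb, sem

# Toy usage for one ray, with a made-up predicted depth of 2.0 and sigma 0.1.
rng = np.random.default_rng(0)
depths = depth_guided_samples(d_hat=2.0, sigma=0.1)
densities = rng.uniform(0.0, 5.0, size=depths.shape)     # stand-in field outputs
colors = rng.uniform(0.0, 1.0, size=(depths.size, 3))
sem_logits = rng.normal(size=(depths.size, 20))          # e.g., 20 classes
rgb, sem = composite(depths, densities, colors, sem_logits)
print("pixel:", rgb, "predicted class:", int(sem.argmax()))
```

Concentrating samples around the estimated surface lets the same weights render both the RGB value and the semantic label from far fewer samples than uniform ray sampling would need, which appears to be the motivation for the sampling-efficiency analysis listed in the table of contents (Section 5.2.4).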
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-06-04T16:06:03Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2024-06-04T16:06:03Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Acknowledgements i
摘要 iii
Abstract v
Contents vii
List of Figures xi
List of Tables xiii
Chapter 1 Introduction 1
Chapter 2 Related Work 5
2.1 Neural Radiance Fields 5
2.2 Generalizable Novel View Synthesis 6
2.3 Multi-tasking NeRF 7
Chapter 3 Brief Review of Generalizable NeRFs 9
Chapter 4 Method 11
4.1 Problem Formulation and Model Overview 11
4.2 Generalizable Semantic NeRF 12
4.2.1 Semantic Geo-Reasoning 12
4.2.2 Depth-Guided Visual Rendering 14
4.3 Training and Inference 18
4.3.1 Training 18
4.3.2 Inference 18
Chapter 5 Experiments 21
5.1 Datasets 21
5.2 Results and Analysis 22
5.2.1 Quantitative Results 22
5.2.2 Qualitative Results 23
5.2.3 Ablation Studies 23
5.2.4 Sampling Efficiency 24
Chapter 6 Conclusion 27
References 29
Appendix A — Additional Implementation Details 37
A.1 Self-Supervised Depth Loss 37
A.2 Target View Depth Estimation 38
A.3 Masking Unrelated Features for Depth-Guided Visual Rendering 39
A.4 Training Strategy for Depth-Guided Volume Rendering 39
A.5 More Training Details 40
A.6 Evaluation Metrics 40
Appendix B — Additional Experiments and Analysis 41
B.1 Analysis of the Depth-Guided Sampling Strategy 41
B.2 Finetuning on Unseen Scenes 42
B.3 Observations on Different Numbers of Source Views 43
B.4 Comparison with GeoNeRF + semhead 45
B.5 More Qualitative Evaluation 45
B.6 Limitations 46
dc.language.iso: en
dc.subject: 深度學習 (zh_TW)
dc.subject: 三維電腦視覺 (zh_TW)
dc.subject: 神經輻射場 (zh_TW)
dc.subject: 語意分割 (zh_TW)
dc.subject: 3D Computer Vision (en)
dc.subject: Deep Learning (en)
dc.subject: Semantic Segmentation (en)
dc.subject: Neural Radiance Field (en)
dc.title: 增強 3D 場景理解的通用語意神經輻射場 (zh_TW)
dc.title: GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding (en)
dc.type: Thesis
dc.date.schoolyear: 112-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 陳祝嵩;楊福恩 (zh_TW)
dc.contributor.oralexamcommittee: Chu-Song Chen;Fu-En Yang (en)
dc.subject.keyword: 深度學習,三維電腦視覺,神經輻射場,語意分割 (zh_TW)
dc.subject.keyword: Deep Learning,3D Computer Vision,Neural Radiance Field,Semantic Segmentation (en)
dc.relation.page: 46
dc.identifier.doi: 10.6342/NTU202400922
dc.rights.note: 同意授權(限校園內公開) (release authorized; campus-only access)
dc.date.accepted: 2024-05-30
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 電信工程學研究所 (Graduate Institute of Communication Engineering)
Appears in collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in this item:
File: ntu-112-2.pdf (1.22 MB, Adobe PDF)
Access: restricted to NTU campus IP addresses (from off campus, please use the library's VPN service)


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
