NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92674
Full metadata record
DC Field: Value (Language)
dc.contributor.advisor: 王鈺強 (zh_TW)
dc.contributor.advisor: Yu-Chiang Frank Wang (en)
dc.contributor.author: 周子庭 (zh_TW)
dc.contributor.author: Zi-Ting Chou (en)
dc.date.accessioned: 2024-06-04T16:06:03Z
dc.date.available: 2024-06-05
dc.date.copyright: 2024-06-04
dc.date.issued: 2024
dc.date.submitted: 2024-05-30
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92674
dc.description.abstract: 利用多視角輸入合成新視角圖像,神經輻射場(Neural Radiance Fields,簡稱NeRF)已成為三維視覺領域中的熱門研究課題。在本文中,我們引入了一種名為通用語義神經輻射場(Generalizable Semantic Neural Radiance Fields,簡稱GSNeRF)的方法,它在合成過程中獨特地考慮了圖像語義,從而可以為未見過的場景生成新視角圖像及其相關的語義地圖。我們的GSNeRF由兩個階段組成:語義幾何推理和深度引導的視覺渲染。前者能夠觀察多視角圖像輸入,從場景中提取語義和幾何特徵。後者在圖像幾何信息的指導下,執行圖像和語義渲染,具有更好的性能。我們的實驗不僅證實了GSNeRF在新視角圖像和語義分割合成方面優於先前的工作,而且進一步驗證了我們的採樣策略對視覺渲染的有效性。 (zh_TW)
dc.description.abstract: Utilizing multi-view inputs to synthesize novel-view images, Neural Radiance Fields (NeRF) have emerged as a popular research topic in 3D vision. In this work, we introduce Generalizable Semantic Neural Radiance Fields (GSNeRF), which uniquely takes image semantics into account during the synthesis process, so that both novel-view images and the associated semantic maps can be produced for unseen scenes. Our GSNeRF is composed of two stages: Semantic Geo-Reasoning and Depth-Guided Visual Rendering. The former observes the multi-view image inputs and extracts semantic and geometry features from the scene. Guided by the resulting geometry information, the latter performs both image and semantic rendering with improved performance. Our experiments not only confirm that GSNeRF performs favorably against prior works on both novel-view image synthesis and semantic segmentation, but also verify the effectiveness of our sampling strategy for visual rendering. (en)
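The abstract above outlines a two-stage pipeline; the sketch below illustrates the general idea in a minimal, NumPy-only form. It is a hedged mock-up, not the thesis implementation: the names (depth_guided_samples, composite), the per-ray depth estimate d_hat, and the random field outputs are all assumptions made for illustration. The premise is that stage one supplies a surface depth estimate per ray, and stage two samples near that depth and alpha-composites color and semantic logits with the same rendering weights.

```python
# Hedged sketch: depth-guided sampling + joint color/semantic compositing.
# All names and values below are illustrative assumptions, not thesis code.
import numpy as np

def depth_guided_samples(d_hat, sigma, n_samples=32):
    """Sample depths concentrated around a predicted surface depth d_hat,
    instead of uniformly along the whole ray."""
    t = np.linspace(-1.0, 1.0, n_samples)
    return np.sort(d_hat + sigma * t)  # depths clustered around the surface

def composite(depths, densities, colors, sem_logits):
    """Standard alpha compositing, applied to color and semantics alike."""
    deltas = np.diff(depths, append=depths[-1] + 1e10)   # sample spacing
    alpha = 1.0 - np.exp(-densities * deltas)            # per-sample opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))  # transmittance
    w = alpha * trans                                    # rendering weights
    rgb = (w[:, None] * colors).sum(axis=0)              # rendered pixel color
    sem = (w[:, None] * sem_logits).sum(axis=0)          # rendered class logits
    return rgb, sem

# Toy usage for one ray, with a made-up predicted depth of 2.0 and sigma 0.1.
rng = np.random.default_rng(0)
depths = depth_guided_samples(d_hat=2.0, sigma=0.1)
densities = rng.uniform(0.0, 5.0, size=depths.shape)     # stand-in field outputs
colors = rng.uniform(0.0, 1.0, size=(depths.size, 3))
sem_logits = rng.normal(size=(depths.size, 20))          # e.g., 20 classes
rgb, sem = composite(depths, densities, colors, sem_logits)
print("pixel:", rgb, "predicted class:", int(sem.argmax()))
```

Concentrating samples around the estimated surface lets the same weights render both the RGB value and the semantic label from far fewer samples than uniform ray sampling would need, which appears to be the motivation for the sampling-efficiency analysis listed in the table of contents (Section 5.2.4).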
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-06-04T16:06:03Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2024-06-04T16:06:03Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Acknowledgements i
摘要 iii
Abstract v
Contents vii
List of Figures xi
List of Tables xiii
Chapter 1 Introduction 1
Chapter 2 Related Work 5
2.1 Neural Radiance Fields 5
2.2 Generalizable Novel View Synthesis 6
2.3 Multi-tasking NeRF 7
Chapter 3 Brief Review of Generalizable NeRFs 9
Chapter 4 Method 11
4.1 Problem Formulation and Model Overview 11
4.2 Generalizable Semantic NeRF 12
4.2.1 Semantic Geo-Reasoning 12
4.2.2 Depth-Guided Visual Rendering 14
4.3 Training and Inference 18
4.3.1 Training 18
4.3.2 Inference 18
Chapter 5 Experiments 21
5.1 Datasets 21
5.2 Results and Analysis 22
5.2.1 Quantitative Results 22
5.2.2 Qualitative Results 23
5.2.3 Ablation Studies 23
5.2.4 Sampling Efficiency 24
Chapter 6 Conclusion 27
References 29
Appendix A — Additional Implementation Details 37
A.1 Self-Supervised Depth Loss 37
A.2 Target View Depth Estimation 38
A.3 Masking Unrelated Features for Depth-Guided Visual Rendering 39
A.4 Training Strategy for Depth-Guided Volume Rendering 39
A.5 More Training Details 40
A.6 Evaluation Metrics 40
Appendix B — Additional Experiments and Analysis 41
B.1 Analysis of the Depth-Guided Sampling Strategy 41
B.2 Finetuning on Unseen Scenes 42
B.3 Observations on Different Numbers of Source Views 43
B.4 Comparison with GeoNeRF + semhead 45
B.5 More Qualitative Evaluation 45
B.6 Limitations 46
dc.language.iso: en
dc.subject: 深度學習 (zh_TW)
dc.subject: 三維電腦視覺 (zh_TW)
dc.subject: 神經輻射場 (zh_TW)
dc.subject: 語意分割 (zh_TW)
dc.subject: 3D Computer Vision (en)
dc.subject: Deep Learning (en)
dc.subject: Semantic Segmentation (en)
dc.subject: Neural Radiance Field (en)
dc.title: 增強 3D 場景理解的通用語意神經輻射場 (zh_TW)
dc.title: GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding (en)
dc.type: Thesis
dc.date.schoolyear: 112-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 陳祝嵩;楊福恩 (zh_TW)
dc.contributor.oralexamcommittee: Chu-Song Chen;Fu-En Yang (en)
dc.subject.keyword: 深度學習,三維電腦視覺,神經輻射場,語意分割 (zh_TW)
dc.subject.keyword: Deep Learning,3D Computer Vision,Neural Radiance Field,Semantic Segmentation (en)
dc.relation.page: 46
dc.identifier.doi: 10.6342/NTU202400922
dc.rights.note: 同意授權(限校園內公開) (release authorized; campus-only access)
dc.date.accepted: 2024-05-30
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 電信工程學研究所 (Graduate Institute of Communication Engineering)
Appears in collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in this item:
File: ntu-112-2.pdf (1.22 MB, Adobe PDF)
Access: restricted to NTU campus IP addresses (from off campus, please use the library's VPN service)


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
