NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91908
Full metadata record (DC field: value [language])
dc.contributor.advisor: 陳文進 [zh_TW]
dc.contributor.advisor: Wen-Chin Chen [en]
dc.contributor.author: 鍾起鳴 [zh_TW]
dc.contributor.author: Chi-Ming Chung [en]
dc.date.accessioned: 2024-02-26T16:24:01Z
dc.date.available: 2024-02-27
dc.date.copyright: 2024-02-26
dc.date.issued: 2022
dc.date.submitted: 2002-01-01
dc.identifier.citation:
[1] 14 Lectures on Visual SLAM: From Theory to Practice (视觉SLAM十四讲: 从理论到实践). Publishing House of Electronics Industry, 2017.
[2] T. D. Barfoot. State Estimation for Robotics. Cambridge University Press, USA, 1st edition, 2017.
[3] J. T. Barron, B. Mildenhall, M. Tancik, P. Hedman, R. Martin-Brualla, and P. P. Srinivasan. Mip-NeRF: A multiscale representation for anti-aliasing neural radiance fields. In ICCV, 2021.
[4] J. T. Barron, B. Mildenhall, D. Verbin, P. P. Srinivasan, and P. Hedman. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. In CVPR, 2022.
[5] G. Bradski. The OpenCV Library. Dr. Dobb's Journal of Software Tools, 2000.
[6] C. Campos, R. Elvira, J. J. G. Rodríguez, J. M. M. Montiel, and J. D. Tardós. ORB-SLAM3: An accurate open-source library for visual, visual-inertial, and multimap SLAM. IEEE Transactions on Robotics, 37(6):1874–1890, 2021.
[7] A. Chen, Z. Xu, F. Zhao, X. Zhang, F. Xiang, J. Yu, and H. Su. MVSNeRF: Fast generalizable radiance field reconstruction from multi-view stereo. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14124–14133, 2021.
[8] Y. Chen, Y. Chen, and G. Wang. Bundle adjustment revisited. CoRR, abs/1912.03858, 2019.
[9] R. Clark. Volumetric bundle adjustment for online photorealistic scene capture. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6124–6132, June 2022.
[10] A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2017.
[11] J. Engel, T. Schöps, and D. Cremers. LSD-SLAM: Large-scale direct monocular SLAM. In European Conference on Computer Vision (ECCV), September 2014.
[12] G. Guennebaud, B. Jacob, et al. Eigen v3. http://eigen.tuxfamily.org, 2010.
[13] J. Huang, S.-S. Huang, H. Song, and S.-M. Hu. DI-Fusion: Online implicit 3D reconstruction with deep priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.
[14] Y. Jeong, S. Ahn, C. Choy, A. Anandkumar, M. Cho, and J. Park. Self-calibrating neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 5846–5854, October 2021.
[15] L. Koestler, N. Yang, N. Zeller, and D. Cremers. TANDEM: Tracking and dense mapping in real-time using deep multi-view stereo. In Conference on Robot Learning (CoRL), 2021.
[16] J. Kopf, X. Rong, and J.-B. Huang. Robust consistent video depth estimation. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1611–1621, 2021.
[17] R. Kümmerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard. g2o: A general framework for graph optimization. In 2011 IEEE International Conference on Robotics and Automation, pages 3607–3613, 2011.
[18] C.-H. Lin, W.-C. Ma, A. Torralba, and S. Lucey. BARF: Bundle-adjusting neural radiance fields. In IEEE International Conference on Computer Vision (ICCV), 2021.
[19] L. Liu, J. Gu, K. Z. Lin, T.-S. Chua, and C. Theobalt. Neural sparse voxel fields. In NeurIPS, 2020.
[20] X. Luo, J. Huang, R. Szeliski, K. Matzen, and J. Kopf. Consistent video depth estimation. ACM Transactions on Graphics, 39(4), 2020.
[21] R. Martin-Brualla, N. Radwan, M. S. M. Sajjadi, J. T. Barron, A. Dosovitskiy, and D. Duckworth. NeRF in the Wild: Neural radiance fields for unconstrained photo collections. In CVPR, 2021.
[22] N. Max. Optical models for direct volume rendering. IEEE Transactions on Visualization and Computer Graphics, 1(2):99–108, 1995.
[23] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
[24] T. Müller. Tiny CUDA neural network framework, 2021. https://github.com/nvlabs/tiny-cuda-nn.
[25] T. Müller, A. Evans, C. Schied, and A. Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph., 41(4):102:1–102:15, July 2022.
[26] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 31(5):1147–1163, 2015.
[27] R. Mur-Artal and J. D. Tardós. ORB-SLAM2: An open-source SLAM system for monocular, stereo and RGB-D cameras. IEEE Transactions on Robotics, 33(5):1255–1262, 2017.
[28] J. Ortiz, A. Clegg, J. Dong, E. Sucar, D. Novotny, M. Zollhoefer, and M. Mukadam. iSDF: Real-time neural signed distance fields for robot perception. In Robotics: Science and Systems, 2022.
[29] J. J. Park, P. Florence, J. Straub, R. Newcombe, and S. Lovegrove. DeepSDF: Learning continuous signed distance functions for shape representation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[30] S. Fridovich-Keil, A. Yu, M. Tancik, Q. Chen, B. Recht, and A. Kanazawa. Plenoxels: Radiance fields without neural networks. In CVPR, 2022.
[31] J. L. Schönberger and J.-M. Frahm. Structure-from-motion revisited. In Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[32] J. L. Schönberger, E. Zheng, M. Pollefeys, and J.-M. Frahm. Pixelwise view selection for unstructured multi-view stereo. In European Conference on Computer Vision (ECCV), 2016.
[33] T. Schöps, T. Sattler, and M. Pollefeys. BAD SLAM: Bundle adjusted direct RGB-D SLAM. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 134–144, 2019.
[34] J. Straub, T. Whelan, L. Ma, Y. Chen, E. Wijmans, S. Green, J. J. Engel, R. Mur-Artal, C. Ren, S. Verma, A. Clarkson, M. Yan, B. Budge, Y. Yan, X. Pan, J. Yon, Y. Zou, K. Leon, N. Carter, J. Briales, T. Gillingham, E. Mueggler, L. Pesqueira, M. Savva, D. Batra, H. M. Strasdat, R. D. Nardi, M. Goesele, S. Lovegrove, and R. Newcombe. The Replica dataset: A digital replica of indoor spaces. arXiv preprint arXiv:1906.05797, 2019.
[35] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers. A benchmark for the evaluation of RGB-D SLAM systems. In Proc. of the International Conference on Intelligent Robot Systems (IROS), Oct. 2012.
[36] E. Sucar, S. Liu, J. Ortiz, and A. Davison. iMAP: Implicit mapping and positioning in real-time. In Proceedings of the International Conference on Computer Vision (ICCV), 2021.
[37] C. Sun, M. Sun, and H. Chen. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In CVPR, 2022.
[38] C. Sun, M. Sun, and H.-T. Chen. Improved direct voxel grid optimization for radiance fields reconstruction, 2022.
[39] T. Takikawa, J. Litalien, K. Yin, K. Kreis, C. Loop, D. Nowrouzezahrai, A. Jacobson, M. McGuire, and S. Fidler. Neural geometric level of detail: Real-time rendering with implicit 3D shapes. 2021.
[40] M. Tancik, V. Casser, X. Yan, S. Pradhan, B. Mildenhall, P. P. Srinivasan, J. T. Barron, and H. Kretzschmar. Block-NeRF: Scalable large scene neural view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8248–8258, June 2022.
[41] Z. Teed and J. Deng. Tangent space backpropagation for 3D transformation groups. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[42] I. Ueda, Y. Fukuhara, H. Kataoka, H. Aizawa, H. Shishido, and I. Kitahara. Neural density-distance fields. In Proceedings of the European Conference on Computer Vision, 2022.
[43] Z. Wang, S. Wu, W. Xie, M. Chen, and V. A. Prisacariu. NeRF−−: Neural radiance fields without known camera parameters. arXiv preprint arXiv:2102.07064, 2021.
[44] Y. Wei, S. Liu, Y. Rao, W. Zhao, J. Lu, and J. Zhou. NerfingMVS: Guided optimization of neural radiance fields for indoor multi-view stereo. In ICCV, 2021.
[45] T. Whelan, M. Kaess, M. Fallon, H. Johannsson, J. Leonard, and J. McDonald. Kintinuous: Spatially extended KinectFusion. 2012.
[46] Y. Yao, Z. Luo, S. Li, T. Fang, and L. Quan. MVSNet: Depth inference for unstructured multi-view stereo. In European Conference on Computer Vision (ECCV), 2018.
[47] X. Zhang, S. Bi, K. Sunkavalli, H. Su, and Z. Xu. NeRFusion: Fusing radiance fields for large-scale scene reconstruction. In CVPR, 2022.
[48] S. Zhi, E. Sucar, A. Mouton, I. Haughton, T. Laidlow, and A. J. Davison. iLabel: Interactive neural scene labelling, 2021.
[49] T. Zhou, M. Brown, N. Snavely, and D. G. Lowe. Unsupervised learning of depth and ego-motion from video. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6612–6619, 2017.
[50] Z. Zhu, S. Peng, V. Larsson, W. Xu, H. Bao, Z. Cui, M. R. Oswald, and M. Pollefeys. NICE-SLAM: Neural implicit scalable encoding for SLAM. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91908
dc.description.abstract [zh_TW]: This thesis collects my experiences and insights on Neural Radiance Fields (NeRF) and Simultaneous Localization and Mapping (SLAM), and covers a paper submitted to a robotics conference. We propose Orbeez-SLAM, which is applicable to spatial AI.

A spatial AI that can perform complex tasks through visual signals and cooperate with humans is highly anticipated. To achieve this, we need a visual SLAM that adapts easily to new scenes without pre-training and generates dense maps for downstream tasks in real time. In this work, we develop a visual SLAM named Orbeez-SLAM, which successfully combines NeRF with visual odometry to achieve our goals. Moreover, Orbeez-SLAM can work with a monocular camera, since it requires only RGB input, making it widely applicable to the real world.
dc.description.abstract [en]: This thesis collects my experiences and insights about Neural Radiance Fields (NeRF) and Simultaneous Localization and Mapping (SLAM) and covers a paper submitted to a robotics conference. We propose Orbeez-SLAM, which is applicable to spatial AI.

A spatial AI that can perform complex tasks through visual signals and cooperate with humans is anticipated. To achieve this, we need a visual SLAM that easily adapts to new scenes without pre-training and generates dense maps for downstream tasks in real time. In this work, we develop a visual SLAM named Orbeez-SLAM, which successfully collaborates with NeRF and visual odometry to achieve our goals. Moreover, Orbeez-SLAM can work with a monocular camera, since it only needs RGB inputs, making it widely applicable to the real world.
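The abstract describes the system only at a high level: a real-time visual odometry front end estimates camera poses from RGB frames, and the posed frames are used to train a NeRF map online. The Python sketch below illustrates that data flow under stated assumptions; the Tracker and NerfMapper classes and all of their methods are hypothetical placeholders for exposition, not the Orbeez-SLAM API (the actual system builds on ORB-SLAM2 and instant-ngp, per the table of contents below).

```python
# A minimal, runnable sketch of the tracking-and-mapping loop suggested by
# the abstract. Every class and method here is a hypothetical stand-in;
# the real system couples ORB-SLAM2-style tracking with instant-ngp-style
# NeRF training.
import numpy as np

class Tracker:
    """Stand-in for a feature-based visual odometry front end."""
    def __init__(self):
        self.frame_id = 0

    def track(self, image):
        """Estimate a camera-to-world pose for one RGB frame."""
        self.frame_id += 1
        return np.eye(4)  # placeholder: real VO returns the tracked pose

    def is_new_keyframe(self):
        """Toy keyframe policy: promote every 5th frame."""
        return self.frame_id % 5 == 0

class NerfMapper:
    """Stand-in for online NeRF training on posed keyframes."""
    def __init__(self):
        self.keyframes = []

    def add_keyframe(self, image, pose):
        self.keyframes.append((image, pose))

    def train_step(self):
        # Real system: sample rays from the keyframes, volume-render them,
        # and take a gradient step on the photometric loss.
        pass

def run(rgb_stream):
    tracker, mapper = Tracker(), NerfMapper()
    for image in rgb_stream:            # monocular input: RGB only
        pose = tracker.track(image)     # real-time pose estimate
        if tracker.is_new_keyframe():   # dense map grows from keyframes
            mapper.add_keyframe(image, pose)
        mapper.train_step()             # NeRF map improves online
    return mapper

# Usage with 30 dummy 48x64 RGB frames:
mapper = run(np.zeros((30, 48, 64, 3), dtype=np.uint8))
print(len(mapper.keyframes))  # -> 6
```

Swapping the toy bodies for ORB feature tracking and an instant-ngp training step would recover the architecture the abstract claims: real-time poses from classical visual odometry, a dense photorealistic map from NeRF, and no pre-training on the target scene.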
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-02-26T16:24:01Z. No. of bitstreams: 0 [en]
dc.description.provenance: Made available in DSpace on 2024-02-26T16:24:01Z (GMT). No. of bitstreams: 0 [en]
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee
Acknowledgements
摘要 (Chinese Abstract)
Abstract
Contents
List of Figures
List of Tables
Denotation
Chapter 1 Introduction
  1.1 ORB-SLAM2
  1.2 instant-ngp
  1.3 Contributions
    1.3.1 Imitation Learning
    1.3.2 Our NeRF-SLAM
  1.4 Organization of This Thesis
  1.5 Source Code Management
Chapter 2 NeRF
  2.1 Implicit Neural Representation
  2.2 Differentiable Volume Rendering
  2.3 Density Grid
  2.4 NeRF Related Works
Chapter 3 vSLAM
  3.1 Traditional vSLAM
    3.1.1 Reprojection Error
    3.1.2 Photometric Error
  3.2 NeRF-SLAM Works
Chapter 4 Orbeez-SLAM
  4.1 System Overview
  4.2 Optimization
    4.2.1 Pose Estimation
    4.2.2 Bundle Adjustment
    4.2.3 NeRF Regression
  4.3 Optimize Camera Pose from NeRF
    4.3.1 Lie Group
    4.3.2 Lie Algebra
    4.3.3 Perturbation Model of Lie Algebra (Left Multiplication)
    4.3.4 Backpropagate from Loss to Camera Pose
  4.4 Ray-casting Triangulation
Chapter 5 Experiments
  5.1 Experiment Setup
    5.1.1 Implementation Details
    5.1.2 Datasets
    5.1.3 Baselines
    5.1.4 Evaluation Settings
    5.1.5 Metrics
  5.2 Quantitative Results
    5.2.1 Evaluation on TUM RGB-D
    5.2.2 Evaluation on ScanNet
    5.2.3 Evaluation on Replica
    5.2.4 Runtime Comparison
  5.3 Qualitative Results
  5.4 Ablation Study
    5.4.1 Photometric Loss
    5.4.2 Distortion Loss
  5.5 Failure Cases
    5.5.1 Ray-casting Triangulation
    5.5.2 Depth Supervision
    5.5.3 Summary
Chapter 6 Conclusion
Chapter 7 Future Works
  7.1 Depth Estimation
  7.2 Minimizing Photometric Error from Direct SLAM with Ray-casting Triangulation
  7.3 Second-order Optimizer
  7.4 NeDDF Reprojection Error
References

List of Figures
2.1 Visualization of the Continuous Formula
2.2 Visualization of the Discrete Formula
2.3 Visualization of the Weight Formula
2.4 Skip Voxel Strategy
3.1 ORB Features
4.1 Orbeez-SLAM Process
4.2 System Pipeline
4.3 NeRF Model
4.4 Ray-casting Triangulation in NeRF
4.5 Point Clouds
5.1 Comparison of Rendering Results
5.2 NeRF Results across Time
5.3 Ablation Study for Photometric Loss
5.4 Visualization of w/o and w/ Distortion Loss
5.5 Density Grid Failure with NeRF

List of Tables
5.1 Tracking Results on TUM RGB-D
5.2 Tracking Results on ScanNet
5.3 Reconstruction Results on Replica
5.4 Runtime Comparison
5.5 Ablation Study for Photometric Loss
5.6 Distortion Loss on Replica office0
dc.language.iso: en
dc.title: 使用神經輻射場進行同時定位與地圖建構 (Simultaneous Localization and Mapping Using Neural Radiance Fields) [zh_TW]
dc.title: SLAM system with NeRF mapping [en]
dc.type: Thesis
dc.date.schoolyear: 110-2
dc.description.degree: 碩士 (Master)
dc.contributor.coadvisor: 徐宏民 [zh_TW]
dc.contributor.coadvisor: Winston Hsu [en]
dc.contributor.oralexamcommittee: 葉梅珍; 陳奕廷 [zh_TW]
dc.contributor.oralexamcommittee: Mei-Chen Yeh; Yi-Ting Chen [en]
dc.subject.keyword: 神經輻射場, 同時定位與地圖建構, 電腦視覺, 機器人學, 深度學習 [zh_TW]
dc.subject.keyword: Neural Radiance Field (NeRF), Simultaneous Localization and Mapping (SLAM), Computer Vision, Robotics, Deep Learning [en]
dc.relation.page: 61
dc.identifier.doi: 10.6342/NTU202204136
dc.rights.note: 同意授權 (Authorized; open access worldwide)
dc.date.accepted: 2022-09-30
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)
Appears in Collections: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)

Files in this item:
File: ntu-110-2.pdf (16.12 MB, Adobe PDF)


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
