Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91908
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳文進 | zh_TW |
dc.contributor.advisor | Wen-Chin Chen | en |
dc.contributor.author | 鍾起鳴 | zh_TW |
dc.contributor.author | Chi-Ming Chung | en |
dc.date.accessioned | 2024-02-26T16:24:01Z | - |
dc.date.available | 2024-02-27 | - |
dc.date.copyright | 2024-02-26 | - |
dc.date.issued | 2022 | - |
dc.date.submitted | 2002-01-01 | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91908 | - |
dc.description.abstract | 這篇論文收集了我對神經輻射場 (NeRF) 和同步定位與映射 (SLAM) 的經驗和見解,並涵蓋了一篇提交給機器人會議的論文。我們提出了適用於 spatial AI 的 Orbeez-SLAM。一種可以通過視覺信號執行複雜任務並與人類合作的 spatial AI 是值得期待的。為了實現這一點,我們需要一個無需預訓練即可輕鬆適應新場景並實時為下游任務生成密集地圖的視覺 SLAM。在這項工作中,我們開發了一個名為 Orbeez-SLAM 的視覺 SLAM,它成功地與 NeRF 和視覺里程計合作來實現我們的目標。此外,Orbeez-SLAM 可以與單目相機配合使用,因為它只需要 RGB 輸入,使其廣泛適用於真實世界。 | zh_TW |
dc.description.abstract | This thesis collects my experiences with and insights into the Neural Radiance Field (NeRF) and Simultaneous Localization and Mapping (SLAM) and covers a paper submitted to a robotics conference. We propose Orbeez-SLAM, a system applicable to spatial AI. A spatial AI that can perform complex tasks through visual signals and cooperate with humans is anticipated. To achieve this, we need a visual SLAM that easily adapts to new scenes without pre-training and generates dense maps for downstream tasks in real time. In this work, we develop a visual SLAM named Orbeez-SLAM, which successfully collaborates with NeRF and visual odometry to achieve our goals. Moreover, Orbeez-SLAM can work with a monocular camera since it only needs RGB inputs, making it widely applicable to real-world settings. | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-02-26T16:24:01Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2024-02-26T16:24:01Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Verification Letter from the Oral Examination Committee; Acknowledgements; 摘要; Abstract; Contents; List of Figures; List of Tables; Denotation; Chapter 1 Introduction (1.1 ORB-SLAM2; 1.2 instant-ngp; 1.3 Contributions: 1.3.1 Imitation Learning, 1.3.2 Our NeRF-SLAM; 1.4 Organization of This Thesis; 1.5 Source Code Management); Chapter 2 NeRF (2.1 Implicit Neural Representation; 2.2 Differentiable Volume Rendering; 2.3 Density Grid; 2.4 NeRF Related Works); Chapter 3 vSLAM (3.1 Traditional vSLAM: 3.1.1 Reprojection Error, 3.1.2 Photometric Error; 3.2 NeRF-SLAM Works); Chapter 4 Orbeez-SLAM (4.1 System Overview; 4.2 Optimization: 4.2.1 Pose Estimation, 4.2.2 Bundle Adjustment, 4.2.3 NeRF Regression; 4.3 Optimize Camera Pose from NeRF: 4.3.1 Lie Group, 4.3.2 Lie Algebra, 4.3.3 Perturbation Model of Lie Algebra (Left Multiplication), 4.3.4 Backpropagate from Loss to Camera Pose; 4.4 Ray-casting Triangulation); Chapter 5 Experiments (5.1 Experiment Setup: 5.1.1 Implementation Details, 5.1.2 Datasets, 5.1.3 Baselines, 5.1.4 Evaluation Settings, 5.1.5 Metrics; 5.2 Quantitative Results: 5.2.1 Evaluation on TUM RGB-D, 5.2.2 Evaluation on ScanNet, 5.2.3 Evaluation on Replica, 5.2.4 Runtime Comparison; 5.3 Qualitative Results; 5.4 Ablation Study: 5.4.1 Photometric Loss, 5.4.2 Distortion Loss; 5.5 Failure Cases: 5.5.1 Ray-casting Triangulation, 5.5.2 Depth Supervision, 5.5.3 Summary); Chapter 6 Conclusion; Chapter 7 Future Works (7.1 Depth Estimation; 7.2 Minimizing Photometric Error from Direct SLAM with Ray-casting Triangulation; 7.3 Second-order Optimizer; 7.4 NeDDF Reprojection Error); References. List of Figures: 2.1 Visualization of the Continuous Formula; 2.2 Visualization of the Discrete Formula; 2.3 Visualization of the Weight Formula; 2.4 Skip Voxel Strategy; 3.1 ORB Features; 4.1 Orbeez-SLAM Process; 4.2 System Pipeline; 4.3 NeRF Model; 4.4 Ray-casting Triangulation in NeRF; 4.5 Point Clouds; 5.1 Comparison of Rendering Results; 5.2 NeRF Results across Time; 5.3 Ablation Study for Photometric Loss; 5.4 Visualization of w/o and w/ Distortion Loss; 5.5 Density Grid Failure with NeRF. List of Tables: 5.1 Tracking Results on TUM RGB-D; 5.2 Tracking Results on ScanNet; 5.3 Reconstruction Results on Replica; 5.4 Runtime Comparison; 5.5 Ablation Study for Photometric Loss; 5.6 Distortion Loss on Replica office0. | - |
dc.language.iso | en | - |
dc.title | 使用神經輻射場進行同時定位與地圖建構 | zh_TW |
dc.title | SLAM system with NeRF mapping | en |
dc.type | Thesis | - |
dc.date.schoolyear | 110-2 | - |
dc.description.degree | Master's | - |
dc.contributor.coadvisor | 徐宏民 | zh_TW |
dc.contributor.coadvisor | Winston Hsu | en |
dc.contributor.oralexamcommittee | 葉梅珍;陳奕廷 | zh_TW |
dc.contributor.oralexamcommittee | Mei-Chen Yeh;Yi-Ting Chen | en |
dc.subject.keyword | 神經輻射場,同時定位與地圖建構,電腦視覺,機器人學,深度學習 | zh_TW |
dc.subject.keyword | Neural Radiance Field (NeRF),Simultaneous Localization and Mapping (SLAM),Computer Vision,Robotics,Deep Learning | en |
dc.relation.page | 61 | - |
dc.identifier.doi | 10.6342/NTU202204136 | - |
dc.rights.note | Authorized (open access worldwide) | - |
dc.date.accepted | 2022-09-30 | - |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
dc.contributor.author-dept | Graduate Institute of Networking and Multimedia | - |
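The abstract above describes a visual SLAM whose dense mapping back end is a NeRF trained online from RGB inputs. For background, here is a minimal NumPy sketch of the discrete volume rendering quadrature from the NeRF literature (Mildenhall et al., ECCV 2020), which such a back end uses to composite per-sample densities and colors into a rendered pixel. The function and variable names are illustrative only, not code from the thesis, whose implementation, per the table of contents, builds on ORB-SLAM2 and instant-ngp.

```python
import numpy as np

def render_ray(sigmas, rgbs, deltas):
    """Composite one camera ray (illustrative sketch, not thesis code).

    sigmas: (N,) volume densities at samples along the ray
    rgbs:   (N, 3) predicted colors at the same samples
    deltas: (N,) distances between consecutive samples
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)       # opacity of each ray segment
    trans = np.cumprod(1.0 - alphas + 1e-10)      # accumulated transmittance
    trans = np.concatenate(([1.0], trans[:-1]))   # shift so T_1 = 1 at the first sample
    weights = trans * alphas                      # w_i = T_i * (1 - exp(-sigma_i * delta_i))
    return (weights[:, None] * rgbs).sum(axis=0)  # composited pixel color

# Usage with stand-in predictions for a 64-sample ray:
sigmas = np.random.rand(64)
rgbs = np.random.rand(64, 3)
deltas = np.full(64, 0.02)
print(render_ray(sigmas, rgbs, deltas))
```

During mapping, the photometric loss between such rendered colors and the observed pixels is backpropagated to the NeRF parameters and, as the table of contents indicates (Section 4.3.4), also to the camera poses via the Lie algebra perturbation model.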
Appears in Collections: Graduate Institute of Networking and Multimedia
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-110-2.pdf | 16.12 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.