NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91908
Full metadata record (DC field: value [language])
dc.contributor.advisor: 陳文進 [zh_TW]
dc.contributor.advisor: Wen-Chin Chen [en]
dc.contributor.author: 鍾起鳴 [zh_TW]
dc.contributor.author: Chi-Ming Chung [en]
dc.date.accessioned: 2024-02-26T16:24:01Z
dc.date.available: 2024-02-27
dc.date.copyright: 2024-02-26
dc.date.issued: 2022
dc.date.submitted: 2002-01-01
dc.identifier.citation:
[1] 14 Lectures on Visual SLAM: From Theory to Practice (视觉SLAM十四讲: 从理论到实践). Publishing House of Electronics Industry, 2017.
[2] T. D. Barfoot. State Estimation for Robotics. Cambridge University Press, USA, 1st edition, 2017.
[3] J. T. Barron, B. Mildenhall, M. Tancik, P. Hedman, R. Martin-Brualla, and P. P. Srinivasan. Mip-NeRF: A multiscale representation for anti-aliasing neural radiance fields. In ICCV, 2021.
[4] J. T. Barron, B. Mildenhall, D. Verbin, P. P. Srinivasan, and P. Hedman. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. In CVPR, 2022.
[5] G. Bradski. The OpenCV Library. Dr. Dobb's Journal of Software Tools, 2000.
[6] C. Campos, R. Elvira, J. J. G. Rodríguez, J. M. M. Montiel, and J. D. Tardós. ORB-SLAM3: An accurate open-source library for visual, visual-inertial, and multimap SLAM. IEEE Transactions on Robotics, 37(6):1874–1890, 2021.
[7] A. Chen, Z. Xu, F. Zhao, X. Zhang, F. Xiang, J. Yu, and H. Su. MVSNeRF: Fast generalizable radiance field reconstruction from multi-view stereo. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14124–14133, 2021.
[8] Y. Chen, Y. Chen, and G. Wang. Bundle adjustment revisited. CoRR, abs/1912.03858, 2019.
[9] R. Clark. Volumetric bundle adjustment for online photorealistic scene capture. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6124–6132, June 2022.
[10] A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2017.
[11] J. Engel, T. Schöps, and D. Cremers. LSD-SLAM: Large-scale direct monocular SLAM. In European Conference on Computer Vision (ECCV), September 2014.
[12] G. Guennebaud, B. Jacob, et al. Eigen v3. http://eigen.tuxfamily.org, 2010.
[13] J. Huang, S.-S. Huang, H. Song, and S.-M. Hu. DI-Fusion: Online implicit 3D reconstruction with deep priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.
[14] Y. Jeong, S. Ahn, C. Choy, A. Anandkumar, M. Cho, and J. Park. Self-calibrating neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 5846–5854, October 2021.
[15] L. Koestler, N. Yang, N. Zeller, and D. Cremers. TANDEM: Tracking and dense mapping in real-time using deep multi-view stereo. In Conference on Robot Learning (CoRL), 2021.
[16] J. Kopf, X. Rong, and J.-B. Huang. Robust consistent video depth estimation. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1611–1621, 2021.
[17] R. Kümmerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard. g2o: A general framework for graph optimization. In 2011 IEEE International Conference on Robotics and Automation, pages 3607–3613, 2011.
[18] C.-H. Lin, W.-C. Ma, A. Torralba, and S. Lucey. BARF: Bundle-adjusting neural radiance fields. In IEEE International Conference on Computer Vision (ICCV), 2021.
[19] L. Liu, J. Gu, K. Z. Lin, T.-S. Chua, and C. Theobalt. Neural sparse voxel fields. In NeurIPS, 2020.
[20] X. Luo, J. Huang, R. Szeliski, K. Matzen, and J. Kopf. Consistent video depth estimation. ACM Transactions on Graphics, 39(4), 2020.
[21] R. Martin-Brualla, N. Radwan, M. S. M. Sajjadi, J. T. Barron, A. Dosovitskiy, and D. Duckworth. NeRF in the Wild: Neural radiance fields for unconstrained photo collections. In CVPR, 2021.
[22] N. Max. Optical models for direct volume rendering. IEEE Transactions on Visualization and Computer Graphics, 1(2):99–108, 1995.
[23] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
[24] T. Müller. Tiny CUDA neural network framework, 2021. https://github.com/nvlabs/tiny-cuda-nn.
[25] T. Müller, A. Evans, C. Schied, and A. Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph., 41(4):102:1–102:15, July 2022.
[26] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 31(5):1147–1163, 2015.
[27] R. Mur-Artal and J. D. Tardós. ORB-SLAM2: An open-source SLAM system for monocular, stereo and RGB-D cameras. IEEE Transactions on Robotics, 33(5):1255–1262, 2017.
[28] J. Ortiz, A. Clegg, J. Dong, E. Sucar, D. Novotny, M. Zollhoefer, and M. Mukadam. iSDF: Real-time neural signed distance fields for robot perception. In Robotics: Science and Systems, 2022.
[29] J. J. Park, P. Florence, J. Straub, R. Newcombe, and S. Lovegrove. DeepSDF: Learning continuous signed distance functions for shape representation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[30] S. Fridovich-Keil, A. Yu, M. Tancik, Q. Chen, B. Recht, and A. Kanazawa. Plenoxels: Radiance fields without neural networks. In CVPR, 2022.
[31] J. L. Schönberger and J.-M. Frahm. Structure-from-motion revisited. In Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[32] J. L. Schönberger, E. Zheng, M. Pollefeys, and J.-M. Frahm. Pixelwise view selection for unstructured multi-view stereo. In European Conference on Computer Vision (ECCV), 2016.
[33] T. Schöps, T. Sattler, and M. Pollefeys. BAD SLAM: Bundle adjusted direct RGB-D SLAM. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 134–144, 2019.
[34] J. Straub, T. Whelan, L. Ma, Y. Chen, E. Wijmans, S. Green, J. J. Engel, R. Mur-Artal, C. Ren, S. Verma, A. Clarkson, M. Yan, B. Budge, Y. Yan, X. Pan, J. Yon, Y. Zou, K. Leon, N. Carter, J. Briales, T. Gillingham, E. Mueggler, L. Pesqueira, M. Savva, D. Batra, H. M. Strasdat, R. D. Nardi, M. Goesele, S. Lovegrove, and R. Newcombe. The Replica dataset: A digital replica of indoor spaces. arXiv preprint arXiv:1906.05797, 2019.
[35] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers. A benchmark for the evaluation of RGB-D SLAM systems. In Proc. of the International Conference on Intelligent Robot Systems (IROS), Oct. 2012.
[36] E. Sucar, S. Liu, J. Ortiz, and A. Davison. iMAP: Implicit mapping and positioning in real-time. In Proceedings of the International Conference on Computer Vision (ICCV), 2021.
[37] C. Sun, M. Sun, and H. Chen. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In CVPR, 2022.
[38] C. Sun, M. Sun, and H.-T. Chen. Improved direct voxel grid optimization for radiance fields reconstruction, 2022.
[39] T. Takikawa, J. Litalien, K. Yin, K. Kreis, C. Loop, D. Nowrouzezahrai, A. Jacobson, M. McGuire, and S. Fidler. Neural geometric level of detail: Real-time rendering with implicit 3D shapes. 2021.
[40] M. Tancik, V. Casser, X. Yan, S. Pradhan, B. Mildenhall, P. P. Srinivasan, J. T. Barron, and H. Kretzschmar. Block-NeRF: Scalable large scene neural view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8248–8258, June 2022.
[41] Z. Teed and J. Deng. Tangent space backpropagation for 3D transformation groups. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[42] I. Ueda, Y. Fukuhara, H. Kataoka, H. Aizawa, H. Shishido, and I. Kitahara. Neural density-distance fields. In Proceedings of the European Conference on Computer Vision, 2022.
[43] Z. Wang, S. Wu, W. Xie, M. Chen, and V. A. Prisacariu. NeRF−−: Neural radiance fields without known camera parameters. arXiv preprint arXiv:2102.07064, 2021.
[44] Y. Wei, S. Liu, Y. Rao, W. Zhao, J. Lu, and J. Zhou. NerfingMVS: Guided optimization of neural radiance fields for indoor multi-view stereo. In ICCV, 2021.
[45] T. Whelan, M. Kaess, M. Fallon, H. Johannsson, J. Leonard, and J. McDonald. Kintinuous: Spatially extended KinectFusion. 2012.
[46] Y. Yao, Z. Luo, S. Li, T. Fang, and L. Quan. MVSNet: Depth inference for unstructured multi-view stereo. In European Conference on Computer Vision (ECCV), 2018.
[47] X. Zhang, S. Bi, K. Sunkavalli, H. Su, and Z. Xu. NeRFusion: Fusing radiance fields for large-scale scene reconstruction. In CVPR, 2022.
[48] S. Zhi, E. Sucar, A. Mouton, I. Haughton, T. Laidlow, and A. J. Davison. iLabel: Interactive neural scene labelling, 2021.
[49] T. Zhou, M. Brown, N. Snavely, and D. G. Lowe. Unsupervised learning of depth and ego-motion from video. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6612–6619, 2017.
[50] Z. Zhu, S. Peng, V. Larsson, W. Xu, H. Bao, Z. Cui, M. R. Oswald, and M. Pollefeys. NICE-SLAM: Neural implicit scalable encoding for SLAM. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91908
dc.description.abstract [zh_TW]: This thesis collects my experiences and insights on Neural Radiance Fields (NeRF) and Simultaneous Localization and Mapping (SLAM), and covers a paper submitted to a robotics conference. We propose Orbeez-SLAM, which is applicable to spatial AI.

A spatial AI that can perform complex tasks through visual signals and cooperate with humans is highly anticipated. To achieve this, we need a visual SLAM that adapts easily to new scenes without pre-training and generates dense maps for downstream tasks in real time. In this work, we develop a visual SLAM named Orbeez-SLAM, which successfully combines NeRF with visual odometry to achieve our goals. Moreover, Orbeez-SLAM can work with a monocular camera, since it requires only RGB input, making it widely applicable to the real world.
dc.description.abstract [en]: This thesis collects my experiences and insights about Neural Radiance Fields (NeRF) and Simultaneous Localization and Mapping (SLAM) and covers a paper submitted to a robotics conference. We propose Orbeez-SLAM, which is applicable to spatial AI.

A spatial AI that can perform complex tasks through visual signals and cooperate with humans is anticipated. To achieve this, we need a visual SLAM that easily adapts to new scenes without pre-training and generates dense maps for downstream tasks in real time. In this work, we develop a visual SLAM named Orbeez-SLAM, which successfully collaborates with NeRF and visual odometry to achieve our goals. Moreover, Orbeez-SLAM can work with a monocular camera, since it only needs RGB inputs, making it widely applicable to the real world.
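The abstract describes the system only at a high level: a real-time visual odometry front end estimates camera poses from RGB frames, and the posed frames are used to train a NeRF map online. The Python sketch below illustrates that data flow under stated assumptions; the Tracker and NerfMapper classes and all of their methods are hypothetical placeholders for exposition, not the Orbeez-SLAM API (the actual system builds on ORB-SLAM2 and instant-ngp, per the table of contents below).

```python
# A minimal, runnable sketch of the tracking-and-mapping loop suggested by
# the abstract. Every class and method here is a hypothetical stand-in;
# the real system couples ORB-SLAM2-style tracking with instant-ngp-style
# NeRF training.
import numpy as np

class Tracker:
    """Stand-in for a feature-based visual odometry front end."""
    def __init__(self):
        self.frame_id = 0

    def track(self, image):
        """Estimate a camera-to-world pose for one RGB frame."""
        self.frame_id += 1
        return np.eye(4)  # placeholder: real VO returns the tracked pose

    def is_new_keyframe(self):
        """Toy keyframe policy: promote every 5th frame."""
        return self.frame_id % 5 == 0

class NerfMapper:
    """Stand-in for online NeRF training on posed keyframes."""
    def __init__(self):
        self.keyframes = []

    def add_keyframe(self, image, pose):
        self.keyframes.append((image, pose))

    def train_step(self):
        # Real system: sample rays from the keyframes, volume-render them,
        # and take a gradient step on the photometric loss.
        pass

def run(rgb_stream):
    tracker, mapper = Tracker(), NerfMapper()
    for image in rgb_stream:            # monocular input: RGB only
        pose = tracker.track(image)     # real-time pose estimate
        if tracker.is_new_keyframe():   # dense map grows from keyframes
            mapper.add_keyframe(image, pose)
        mapper.train_step()             # NeRF map improves online
    return mapper

# Usage with 30 dummy 48x64 RGB frames:
mapper = run(np.zeros((30, 48, 64, 3), dtype=np.uint8))
print(len(mapper.keyframes))  # -> 6
```

Swapping the toy bodies for ORB feature tracking and an instant-ngp training step would recover the architecture the abstract claims: real-time poses from classical visual odometry, a dense photorealistic map from NeRF, and no pre-training on the target scene.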
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-02-26T16:24:01Z. No. of bitstreams: 0 [en]
dc.description.provenance: Made available in DSpace on 2024-02-26T16:24:01Z (GMT). No. of bitstreams: 0 [en]
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee
Acknowledgements
摘要 (Chinese Abstract)
Abstract
Contents
List of Figures
List of Tables
Denotation
Chapter 1 Introduction
  1.1 ORB-SLAM2
  1.2 instant-ngp
  1.3 Contributions
    1.3.1 Imitation Learning
    1.3.2 Our NeRF-SLAM
  1.4 Organization of This Thesis
  1.5 Source Code Management
Chapter 2 NeRF
  2.1 Implicit Neural Representation
  2.2 Differentiable Volume Rendering
  2.3 Density Grid
  2.4 NeRF Related Works
Chapter 3 vSLAM
  3.1 Traditional vSLAM
    3.1.1 Reprojection Error
    3.1.2 Photometric Error
  3.2 NeRF-SLAM Works
Chapter 4 Orbeez-SLAM
  4.1 System Overview
  4.2 Optimization
    4.2.1 Pose Estimation
    4.2.2 Bundle Adjustment
    4.2.3 NeRF Regression
  4.3 Optimize Camera Pose from NeRF
    4.3.1 Lie Group
    4.3.2 Lie Algebra
    4.3.3 Perturbation Model of Lie Algebra (Left Multiplication)
    4.3.4 Backpropagate from Loss to Camera Pose
  4.4 Ray-casting Triangulation
Chapter 5 Experiments
  5.1 Experiment Setup
    5.1.1 Implementation Details
    5.1.2 Datasets
    5.1.3 Baselines
    5.1.4 Evaluation Settings
    5.1.5 Metrics
  5.2 Quantitative Results
    5.2.1 Evaluation on TUM RGB-D
    5.2.2 Evaluation on ScanNet
    5.2.3 Evaluation on Replica
    5.2.4 Runtime Comparison
  5.3 Qualitative Results
  5.4 Ablation Study
    5.4.1 Photometric Loss
    5.4.2 Distortion Loss
  5.5 Failure Cases
    5.5.1 Ray-casting Triangulation
    5.5.2 Depth Supervision
    5.5.3 Summary
Chapter 6 Conclusion
Chapter 7 Future Works
  7.1 Depth Estimation
  7.2 Minimizing Photometric Error from Direct SLAM with Ray-casting Triangulation
  7.3 Second-order Optimizer
  7.4 NeDDF Reprojection Error
References

List of Figures
2.1 Visualization of the Continuous Formula
2.2 Visualization of the Discrete Formula
2.3 Visualization of the Weight Formula
2.4 Skip Voxel Strategy
3.1 ORB Features
4.1 Orbeez-SLAM Process
4.2 System Pipeline
4.3 NeRF Model
4.4 Ray-casting Triangulation in NeRF
4.5 Point Clouds
5.1 Comparison of Rendering Results
5.2 NeRF Results across Time
5.3 Ablation Study for Photometric Loss
5.4 Visualization of w/o and w/ Distortion Loss
5.5 Density Grid Failure with NeRF

List of Tables
5.1 Tracking Results on TUM RGB-D
5.2 Tracking Results on ScanNet
5.3 Reconstruction Results on Replica
5.4 Runtime Comparison
5.5 Ablation Study for Photometric Loss
5.6 Distortion Loss on Replica office0
dc.language.iso: en
dc.title: 使用神經輻射場進行同時定位與地圖建構 (Simultaneous Localization and Mapping Using Neural Radiance Fields) [zh_TW]
dc.title: SLAM system with NeRF mapping [en]
dc.type: Thesis
dc.date.schoolyear: 110-2
dc.description.degree: 碩士 (Master)
dc.contributor.coadvisor: 徐宏民 [zh_TW]
dc.contributor.coadvisor: Winston Hsu [en]
dc.contributor.oralexamcommittee: 葉梅珍; 陳奕廷 [zh_TW]
dc.contributor.oralexamcommittee: Mei-Chen Yeh; Yi-Ting Chen [en]
dc.subject.keyword: 神經輻射場, 同時定位與地圖建構, 電腦視覺, 機器人學, 深度學習 [zh_TW]
dc.subject.keyword: Neural Radiance Field (NeRF), Simultaneous Localization and Mapping (SLAM), Computer Vision, Robotics, Deep Learning [en]
dc.relation.page: 61
dc.identifier.doi: 10.6342/NTU202204136
dc.rights.note: 同意授權 (Authorized; open access worldwide)
dc.date.accepted: 2022-09-30
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)
Appears in Collections: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)

Files in this item:
File: ntu-110-2.pdf (16.12 MB, Adobe PDF)


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
