NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98411

Full metadata record (DC field: value (language)):
dc.contributor.advisor: 簡韶逸 (zh_TW)
dc.contributor.advisor: Shao-Yi Chien (en)
dc.contributor.author: 陳欣妤 (zh_TW)
dc.contributor.author: Hsin-Yu Chen (en)
dc.date.accessioned: 2025-08-05T16:16:02Z
dc.date.available: 2025-08-06
dc.date.copyright: 2025-08-05
dc.date.issued: 2025
dc.date.submitted: 2025-07-16
dc.identifier.citation:
[1] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, pp. 91–110, 2004.
[2] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: An efficient alternative to SIFT or SURF,” in 2011 International Conference on Computer Vision. IEEE, 2011, pp. 2564–2571.
[3] H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: Speeded up robust features,” in Computer Vision–ECCV 2006. Springer, 2006, pp. 404–417.
[4] D. DeTone, T. Malisiewicz, and A. Rabinovich, “SuperPoint: Self-supervised interest point detection and description,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018, pp. 224–236.
[5] M. Dusmanu, I. Rocco, T. Pajdla, M. Pollefeys, J. Sivic, A. Torii, and T. Sattler, “D2-Net: A trainable CNN for joint description and detection of local features,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 8092–8101.
[6] S. Gauglitz, T. Höllerer, and M. Turk, “Evaluation of interest point detectors and feature descriptors for visual tracking,” International Journal of Computer Vision, vol. 94, pp. 335–360, 2011.
[7] T. Tuytelaars, K. Mikolajczyk et al., “Local invariant feature detectors: A survey,” Foundations and Trends® in Computer Graphics and Vision, vol. 3, no. 3, pp. 177–280, 2008.
[8] Y. Tian, V. Balntas, T. Ng, A. Barroso-Laguna, Y. Demiris, and K. Mikolajczyk, “D2D: Keypoint extraction with describe to detect approach,” in Proceedings of the Asian Conference on Computer Vision, 2020.
[9] T. Sattler, W. Maddern, C. Toft, A. Torii, L. Hammarstrand, E. Stenborg, D. Safari, M. Okutomi, M. Pollefeys, J. Sivic et al., “Benchmarking 6DOF outdoor visual localization in changing conditions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8601–8610.
[10] K. M. Yi, E. Trulls, V. Lepetit, and P. Fua, “LIFT: Learned invariant feature transform,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VI. Springer, 2016, pp. 467–483.
[11] J. Revaud, C. De Souza, M. Humenberger, and P. Weinzaepfel, “R2D2: Reliable and repeatable detector and descriptor,” Advances in Neural Information Processing Systems, vol. 32, 2019.
[12] M. Tyszkiewicz, P. Fua, and E. Trulls, “DISK: Learning local features with policy gradient,” Advances in Neural Information Processing Systems, vol. 33, pp. 14254–14265, 2020.
[13] P.-E. Sarlin, D. DeTone, T. Malisiewicz, and A. Rabinovich, “SuperGlue: Learning feature matching with graph neural networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 4938–4947.
[14] T. Dao, D. Fu, S. Ermon, A. Rudra, and C. Ré, “FlashAttention: Fast and memory-efficient exact attention with IO-awareness,” Advances in Neural Information Processing Systems, vol. 35, pp. 16344–16359, 2022.
[15] A. Seki and M. Pollefeys, “SGM-Nets: Semi-global matching with neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 231–240.
[16] P. Lindenberger, P.-E. Sarlin, and M. Pollefeys, “LightGlue: Local feature matching at light speed,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 17627–17638.
[17] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.
[18] B. Triggs, P. F. McLauchlan, R. I. Hartley, and A. W. Fitzgibbon, “Bundle adjustment—a modern synthesis,” in Vision Algorithms: Theory and Practice: International Workshop on Vision Algorithms, Corfu, Greece, September 21–22, 1999, Proceedings. Springer, 2000, pp. 298–372.
[19] C. Wu, “VisualSFM: A visual structure from motion system,” http://www.cs.washington.edu/homes/ccwu/vsfm, 2011.
[20] J. L. Schönberger and J.-M. Frahm, “Structure-from-motion revisited,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4104–4113.
[21] V. Lepetit, F. Moreno-Noguer, and P. Fua, “EPnP: An accurate O(n) solution to the PnP problem,” International Journal of Computer Vision, vol. 81, pp. 155–166, 2009.
[22] E. Brachmann and C. Rother, “Expert sample consensus applied to camera re-localization,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 7525–7534.
[23] E. Brachmann, A. Krull, S. Nowozin, J. Shotton, F. Michel, S. Gumhold, and C. Rother, “DSAC–Differentiable RANSAC for camera localization,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6684–6692.
[24] E. Brachmann and C. Rother, “Visual camera re-localization from RGB and RGB-D images using DSAC,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, pp. 5847–5865, 2021.
[25] J. Shotton, B. Glocker, C. Zach, S. Izadi, A. Criminisi, and A. Fitzgibbon, “Scene coordinate regression forests for camera relocalization in RGB-D images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 2930–2937.
[26] W. Cheng, W. Lin, K. Chen, and X. Zhang, “Cascaded parallel filtering for memory-efficient image-based localization,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 1032–1041.
[27] P.-E. Sarlin, C. Cadena, R. Siegwart, and M. Dymczyk, “From coarse to fine: Robust hierarchical localization at large scale,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 12716–12725.
[28] T. Sattler, B. Leibe, and L. Kobbelt, “Improving image-based localization by active correspondence search,” in Computer Vision–ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7–13, 2012, Proceedings, Part I. Springer, 2012, pp. 752–765.
[29] L. Yang, R. Shrestha, W. Li, S. Liu, G. Zhang, Z. Cui, and P. Tan, “SceneSqueezer: Learning to compress scene for camera relocalization,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 8259–8268.
[30] X. Li, S. Wang, Y. Zhao, J. Verbeek, and J. Kannala, “Hierarchical scene coordinate classification and regression for visual localization,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11983–11992.
[31] S. Tang, C. Tang, R. Huang, S. Zhu, and P. Tan, “Learning camera localization via dense scene matching,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1831–1841.
[32] M. Mera-Trujillo, B. Smith, and V. Fragoso, “Efficient scene compression for visual-based localization,” in 2020 International Conference on 3D Vision (3DV). IEEE, 2020, pp. 1–10.
[33] Z. Liu, J. Li, Z. Shen, G. Huang, S. Yan, and C. Zhang, “Learning efficient convolutional networks through network slimming,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2736–2744.
[34] J. Sun, Z. Shen, Y. Wang, H. Bao, and X. Zhou, “LoFTR: Detector-free local feature matching with transformers,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 8922–8931.
[35] R. A. Newcombe, S. J. Lovegrove, and A. J. Davison, “DTAM: Dense tracking and mapping in real-time,” in 2011 International Conference on Computer Vision. IEEE, 2011, pp. 2320–2327.
[36] J. Engel, T. Schöps, and D. Cremers, “LSD-SLAM: Large-scale direct monocular SLAM,” in European Conference on Computer Vision. Springer, 2014, pp. 834–849.
[37] C. Forster, M. Pizzoli, and D. Scaramuzza, “SVO: Fast semi-direct monocular visual odometry,” in 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 15–22.
[38] R. Wang, M. Schwörer, and D. Cremers, “Stereo DSO: Large-scale direct sparse visual odometry with stereo cameras,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 3903–3911.
[39] T. Pire, T. Fischer, G. Castro, P. De Cristóforis, J. Civera, and J. J. Berlles, “S-PTAM: Stereo parallel tracking and mapping,” Robotics and Autonomous Systems, vol. 93, pp. 27–42, 2017.
[40] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, “ORB-SLAM: A versatile and accurate monocular SLAM system,” IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147–1163, 2015.
[41] P.-J. Duh, Y.-C. Sung, L.-Y. F. Chiang, Y.-J. Chang, and K.-W. Chen, “V-Eye: A vision-based navigation system for the visually impaired,” IEEE Transactions on Multimedia, vol. 23, pp. 1567–1580, 2020.
[42] K.-W. Chen, C.-H. Wang, X. Wei, Q. Liang, C.-S. Chen, M.-H. Yang, and Y.-P. Hung, “Vision-based positioning for Internet-of-Vehicles,” IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 2, pp. 364–376, 2016.
[43] E. Stenborg, T. Sattler, and L. Hammarstrand, “Using image sequences for long-term visual localization,” in 2020 International Conference on 3D Vision (3DV). IEEE, 2020, pp. 938–948.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98411
dc.description.abstract: Precise six-degree-of-freedom (6-DoF) camera localization is essential for applications such as autonomous driving, mobile robotics, and augmented reality, since it keeps the system operating reliably and improves overall performance.
Single-image localization aims to estimate the 6-DoF camera pose of a given query image within a known scene. In large-scale environments with drastic appearance changes, however, highly accurate localization is very challenging, and the methods that can cope with such conditions usually demand heavy computation, which makes them hard to run on edge devices. Visual odometry can estimate relative poses from consecutive frames in real time; although traditional methods achieve real-time operation, they still depend on loop-closure detection and bundle adjustment to correct accumulated error.
This thesis proposes a feature-based monocular visual localization system that combines single-image localization on a server with visual odometry on an edge device. By exploiting the consecutive frames supplied by the edge device, the system lowers the risk of single-image localization failures and uses a global map to resolve the drift produced by visual odometry. The system is computationally efficient while maintaining accuracy comparable to existing state-of-the-art localization methods. Moreover, deep learning models are used only for feature detection and matching, so no retraining is required for new scenes; this gives the system good scene adaptability and allows rapid deployment to new environments.
(zh_TW)
dc.description.abstract: Precise 6-Degree-of-Freedom (6-DoF) camera localization is crucial for a wide range of applications, including autonomous driving, mobile robotics, and augmented reality, as it ensures reliable operation and enhances overall system effectiveness.
Single-image localization determines the 6-DoF camera pose for a given query image within a known scene. However, achieving accurate localization in large-scale environments with significant appearance changes is particularly challenging. Methods that provide robust results often require intensive computational resources, making them difficult to run on edge devices. Visual odometry can recover relative camera poses from consecutive frames in real-time. While traditional methods can achieve real-time relative pose estimation, they require loop closure and bundle adjustment to mitigate drift problems.
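(As a rough illustration of the 2D-2D step just described, and not the thesis's actual implementation, the sketch below estimates the relative pose between consecutive frames with OpenCV: classical ORB features stand in for the learned detectors and matchers the thesis uses, grayscale input frames are assumed, and the intrinsics matrix `K` is an assumed input.)

```python
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    """Feature-based VO step between two consecutive grayscale frames.

    Detect and describe keypoints, match them, fit an essential matrix
    with RANSAC, and decompose it into rotation R and translation t.
    """
    orb = cv2.ORB_create(2000)                     # illustrative detector
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Epipolar geometry: essential matrix from calibrated correspondences.
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t  # t has unit norm: scale is unobservable from 2D-2D
```

Because the recovered translation has unit norm, consecutive estimates accumulate scale and pose error; this is the drift that loop closure and bundle adjustment, or, in this thesis, the server-side global map, must correct.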
In this thesis, we propose a feature-based monocular visual localization system that combines single-image localization on the server with visual odometry on an edge device. By leveraging the continuity of consecutive frames provided by the edge device, our system mitigates the risk of single-image localization failures and uses the global map to prevent drift issues associated with visual odometry. Our system demonstrates efficiency while maintaining accuracy comparable to state-of-the-art localization algorithms. Furthermore, deep learning models are employed exclusively for feature detection and matching, eliminating the need to train new models for different scenes. This characteristic enhances the system's adaptability, enabling easy deployment in new environments.
(en)
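(Similarly, a minimal sketch of the server-side single-image step: once a query image's keypoints are matched against the pre-built SfM point cloud, the absolute 6-DoF pose can be solved with EPnP [21] inside a RANSAC loop [17]. This is one standard recipe rather than the thesis's exact pipeline; the correspondence arrays `pts3d`/`pts2d`, the intrinsics `K`, and the distortion coefficients `dist` are hypothetical inputs produced by the matching stage.)

```python
import cv2
import numpy as np

def absolute_pose(pts3d, pts2d, K, dist=None):
    """Absolute 6-DoF pose from 2D-3D correspondences against a map.

    pts3d: (N, 3) map points from the pre-built SfM model.
    pts2d: (N, 2) matched keypoints in the query image.
    EPnP inside RANSAC rejects outlier matches.
    """
    obj = np.asarray(pts3d, dtype=np.float64).reshape(-1, 1, 3)
    img = np.asarray(pts2d, dtype=np.float64).reshape(-1, 1, 2)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj, img, K, dist,
        flags=cv2.SOLVEPNP_EPNP,
        reprojectionError=4.0, iterationsCount=1000)
    if not ok:
        return None                      # localization failure
    R, _ = cv2.Rodrigues(rvec)           # rotation vector -> 3x3 matrix
    C = -R.T @ tvec                      # camera center in world frame
    return R, tvec, C, inliers
```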
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-05T16:16:02Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2025-08-05T16:16:02Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Master's Thesis Acceptance Certificate i
Acknowledgement iii
Chinese Abstract v
Abstract vii
Contents ix
List of Figures xiii
List of Tables xv
1 Introduction 1
1.1 Introduction of Image-based Localization 1
1.1.1 Taxonomy 1
1.1.2 Visual Localization 2
1.1.3 Visual Odometry and Epipolar Geometry 4
1.2 Challenges 6
1.3 Contribution 6
1.4 Thesis Organization 7
2 Related Work 9
2.1 Learning Based Feature Detection and Matching 9
2.1.1 Learning Feature Detection and Extraction 10
2.1.2 Feature Matching 11
2.2 Image-Based Localization 12
2.2.1 Structure-from-Motion (SfM) 12
2.2.2 Large-Scale Visual Localization 13
2.2.3 Visual Odometry (VO) and Visual Simultaneous Localization and Mapping (vSLAM) 16
2.3 Hybrid visual-based localization and odometry 17
3 Proposed Method 19
3.1 System Overview 19
3.2 Server-Side Visual Localization Pipeline 22
3.2.1 Offline SfM for Pre-built 3D Map 22
3.2.2 Online Visual Localization Pipeline 23
3.3 Edge-Side Integration of Local Visual Localization and Odometry 25
3.3.1 2D-3D Local Localization 25
3.3.2 2D-2D Visual Odometry 30
4 Experimental Results 33
4.1 Description of Dataset 33
4.2 Implementation Details 34
4.3 Comparisons With Existing Methods on Dataset 35
4.4 Component-wise System Analysis 37
4.4.1 Utilization Ratio of System Components 37
4.4.2 Comparisons With Different Matcher 38
4.4.3 Effectiveness of 2D-3D Local Localization 40
4.4.4 Comparative Analysis of 2D-2D Visual Odometry with Varying Thresholds 40
4.4.5 Runtime & Memory Utilization 42
4.4.6 Localization Trajectory 43
5 Conclusion 45
Reference 47
dc.language.iso: en
dc.subject: 視覺里程計 [Visual odometry] (zh_TW)
dc.subject: 視覺定位 [Visual localization] (zh_TW)
dc.subject: Visual odometry (en)
dc.subject: Visual localization (en)
dc.title: 伺服器與邊緣設備於實境環境中基於特徵的分佈式單目視覺定位系統 [A distributed feature-based monocular visual localization system using server and edge devices in real-world environments] (zh_TW)
dc.title: Distributed Feature-Based Monocular Visual Localization with Server-Edge Collaboration in the Wild (en)
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: 碩士 [Master's]
dc.contributor.oralexamcommittee: 施吉昇;莊永裕;陳冠文 (zh_TW)
dc.contributor.oralexamcommittee: Chi-Sheng Shih; Yung-Yu Chuang; Kuan-Wen Chen (en)
dc.subject.keyword: 視覺定位, 視覺里程計 [Visual localization, Visual odometry] (zh_TW)
dc.subject.keyword: Visual localization, Visual odometry (en)
dc.relation.page: 52
dc.identifier.doi: 10.6342/NTU202501939
dc.rights.note: 未授權 [Not authorized]
dc.date.accepted: 2025-07-18
dc.contributor.author-college: 電機資訊學院 [College of Electrical Engineering and Computer Science]
dc.contributor.author-dept: 電子工程學研究所 [Graduate Institute of Electronics Engineering]
dc.date.embargo-lift: N/A
Appears in collections: 電子工程學研究所 [Graduate Institute of Electronics Engineering]

Files in this item:
File: ntu-113-2.pdf (未授權公開取用 [not authorized for public access])
Size: 29.61 MB
Format: Adobe PDF