NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98411

Full metadata record (DC field: value (language)):
dc.contributor.advisor: 簡韶逸 (zh_TW)
dc.contributor.advisor: Shao-Yi Chien (en)
dc.contributor.author: 陳欣妤 (zh_TW)
dc.contributor.author: Hsin-Yu Chen (en)
dc.date.accessioned: 2025-08-05T16:16:02Z
dc.date.available: 2025-08-06
dc.date.copyright: 2025-08-05
dc.date.issued: 2025
dc.date.submitted: 2025-07-16
dc.identifier.citation:
[1] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, pp. 91–110, 2004.
[2] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: An efficient alternative to SIFT or SURF,” in 2011 International Conference on Computer Vision. IEEE, 2011, pp. 2564–2571.
[3] H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: Speeded up robust features,” in Computer Vision–ECCV 2006. Springer, 2006, pp. 404–417.
[4] D. DeTone, T. Malisiewicz, and A. Rabinovich, “SuperPoint: Self-supervised interest point detection and description,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018, pp. 224–236.
[5] M. Dusmanu, I. Rocco, T. Pajdla, M. Pollefeys, J. Sivic, A. Torii, and T. Sattler, “D2-Net: A trainable CNN for joint description and detection of local features,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 8092–8101.
[6] S. Gauglitz, T. Höllerer, and M. Turk, “Evaluation of interest point detectors and feature descriptors for visual tracking,” International Journal of Computer Vision, vol. 94, pp. 335–360, 2011.
[7] T. Tuytelaars, K. Mikolajczyk et al., “Local invariant feature detectors: A survey,” Foundations and Trends® in Computer Graphics and Vision, vol. 3, no. 3, pp. 177–280, 2008.
[8] Y. Tian, V. Balntas, T. Ng, A. Barroso-Laguna, Y. Demiris, and K. Mikolajczyk, “D2D: Keypoint extraction with describe to detect approach,” in Proceedings of the Asian Conference on Computer Vision, 2020.
[9] T. Sattler, W. Maddern, C. Toft, A. Torii, L. Hammarstrand, E. Stenborg, D. Safari, M. Okutomi, M. Pollefeys, J. Sivic et al., “Benchmarking 6DOF outdoor visual localization in changing conditions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8601–8610.
[10] K. M. Yi, E. Trulls, V. Lepetit, and P. Fua, “LIFT: Learned invariant feature transform,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VI. Springer, 2016, pp. 467–483.
[11] J. Revaud, C. De Souza, M. Humenberger, and P. Weinzaepfel, “R2D2: Reliable and repeatable detector and descriptor,” Advances in Neural Information Processing Systems, vol. 32, 2019.
[12] M. Tyszkiewicz, P. Fua, and E. Trulls, “DISK: Learning local features with policy gradient,” Advances in Neural Information Processing Systems, vol. 33, pp. 14254–14265, 2020.
[13] P.-E. Sarlin, D. DeTone, T. Malisiewicz, and A. Rabinovich, “SuperGlue: Learning feature matching with graph neural networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 4938–4947.
[14] T. Dao, D. Fu, S. Ermon, A. Rudra, and C. Ré, “FlashAttention: Fast and memory-efficient exact attention with IO-awareness,” Advances in Neural Information Processing Systems, vol. 35, pp. 16344–16359, 2022.
[15] A. Seki and M. Pollefeys, “SGM-Nets: Semi-global matching with neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 231–240.
[16] P. Lindenberger, P.-E. Sarlin, and M. Pollefeys, “LightGlue: Local feature matching at light speed,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 17627–17638.
[17] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.
[18] B. Triggs, P. F. McLauchlan, R. I. Hartley, and A. W. Fitzgibbon, “Bundle adjustment—a modern synthesis,” in Vision Algorithms: Theory and Practice: International Workshop on Vision Algorithms, Corfu, Greece, September 21–22, 1999, Proceedings. Springer, 2000, pp. 298–372.
[19] C. Wu, “VisualSFM: A visual structure from motion system,” http://www.cs.washington.edu/homes/ccwu/vsfm, 2011.
[20] J. L. Schönberger and J.-M. Frahm, “Structure-from-motion revisited,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4104–4113.
[21] V. Lepetit, F. Moreno-Noguer, and P. Fua, “EPnP: An accurate O(n) solution to the PnP problem,” International Journal of Computer Vision, vol. 81, pp. 155–166, 2009.
[22] E. Brachmann and C. Rother, “Expert sample consensus applied to camera re-localization,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 7525–7534.
[23] E. Brachmann, A. Krull, S. Nowozin, J. Shotton, F. Michel, S. Gumhold, and C. Rother, “DSAC–Differentiable RANSAC for camera localization,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6684–6692.
[24] E. Brachmann and C. Rother, “Visual camera re-localization from RGB and RGB-D images using DSAC,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, pp. 5847–5865, 2021.
[25] J. Shotton, B. Glocker, C. Zach, S. Izadi, A. Criminisi, and A. Fitzgibbon, “Scene coordinate regression forests for camera relocalization in RGB-D images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 2930–2937.
[26] W. Cheng, W. Lin, K. Chen, and X. Zhang, “Cascaded parallel filtering for memory-efficient image-based localization,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 1032–1041.
[27] P.-E. Sarlin, C. Cadena, R. Siegwart, and M. Dymczyk, “From coarse to fine: Robust hierarchical localization at large scale,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 12716–12725.
[28] T. Sattler, B. Leibe, and L. Kobbelt, “Improving image-based localization by active correspondence search,” in Computer Vision–ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7–13, 2012, Proceedings, Part I. Springer, 2012, pp. 752–765.
[29] L. Yang, R. Shrestha, W. Li, S. Liu, G. Zhang, Z. Cui, and P. Tan, “SceneSqueezer: Learning to compress scene for camera relocalization,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 8259–8268.
[30] X. Li, S. Wang, Y. Zhao, J. Verbeek, and J. Kannala, “Hierarchical scene coordinate classification and regression for visual localization,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11983–11992.
[31] S. Tang, C. Tang, R. Huang, S. Zhu, and P. Tan, “Learning camera localization via dense scene matching,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1831–1841.
[32] M. Mera-Trujillo, B. Smith, and V. Fragoso, “Efficient scene compression for visual-based localization,” in 2020 International Conference on 3D Vision (3DV). IEEE, 2020, pp. 1–10.
[33] Z. Liu, J. Li, Z. Shen, G. Huang, S. Yan, and C. Zhang, “Learning efficient convolutional networks through network slimming,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2736–2744.
[34] J. Sun, Z. Shen, Y. Wang, H. Bao, and X. Zhou, “LoFTR: Detector-free local feature matching with transformers,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 8922–8931.
[35] R. A. Newcombe, S. J. Lovegrove, and A. J. Davison, “DTAM: Dense tracking and mapping in real-time,” in 2011 International Conference on Computer Vision. IEEE, 2011, pp. 2320–2327.
[36] J. Engel, T. Schöps, and D. Cremers, “LSD-SLAM: Large-scale direct monocular SLAM,” in European Conference on Computer Vision. Springer, 2014, pp. 834–849.
[37] C. Forster, M. Pizzoli, and D. Scaramuzza, “SVO: Fast semi-direct monocular visual odometry,” in 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 15–22.
[38] R. Wang, M. Schwörer, and D. Cremers, “Stereo DSO: Large-scale direct sparse visual odometry with stereo cameras,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 3903–3911.
[39] T. Pire, T. Fischer, G. Castro, P. De Cristóforis, J. Civera, and J. J. Berlles, “S-PTAM: Stereo parallel tracking and mapping,” Robotics and Autonomous Systems, vol. 93, pp. 27–42, 2017.
[40] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, “ORB-SLAM: A versatile and accurate monocular SLAM system,” IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147–1163, 2015.
[41] P.-J. Duh, Y.-C. Sung, L.-Y. F. Chiang, Y.-J. Chang, and K.-W. Chen, “V-Eye: A vision-based navigation system for the visually impaired,” IEEE Transactions on Multimedia, vol. 23, pp. 1567–1580, 2020.
[42] K.-W. Chen, C.-H. Wang, X. Wei, Q. Liang, C.-S. Chen, M.-H. Yang, and Y.-P. Hung, “Vision-based positioning for Internet-of-Vehicles,” IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 2, pp. 364–376, 2016.
[43] E. Stenborg, T. Sattler, and L. Hammarstrand, “Using image sequences for long-term visual localization,” in 2020 International Conference on 3D Vision (3DV). IEEE, 2020, pp. 938–948.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98411
dc.description.abstract: Precise six-degree-of-freedom (6-DoF) camera localization is essential for applications such as autonomous driving, mobile robotics, and augmented reality, since it keeps the system operating reliably and improves overall performance.
Single-image localization aims to estimate the 6-DoF camera pose of a given query image within a known scene. In large-scale environments with drastic appearance changes, however, highly accurate localization is very challenging, and the methods that can cope with such conditions usually demand heavy computation, which makes them hard to run on edge devices. Visual odometry can estimate relative poses from consecutive frames in real time; although traditional methods achieve real-time operation, they still depend on loop-closure detection and bundle adjustment to correct accumulated error.
This thesis proposes a feature-based monocular visual localization system that combines single-image localization on a server with visual odometry on an edge device. By exploiting the consecutive frames supplied by the edge device, the system lowers the risk of single-image localization failures and uses a global map to resolve the drift produced by visual odometry. The system is computationally efficient while maintaining accuracy comparable to existing state-of-the-art localization methods. Moreover, deep learning models are used only for feature detection and matching, so no retraining is required for new scenes; this gives the system good scene adaptability and allows rapid deployment to new environments.
(zh_TW)
dc.description.abstract: Precise 6-Degree-of-Freedom (6-DoF) camera localization is crucial for a wide range of applications, including autonomous driving, mobile robotics, and augmented reality, as it ensures reliable operation and enhances overall system effectiveness.
Single-image localization determines the 6-DoF camera pose for a given query image within a known scene. However, achieving accurate localization in large-scale environments with significant appearance changes is particularly challenging. Methods that provide robust results often require intensive computational resources, making them difficult to run on edge devices. Visual odometry can recover relative camera poses from consecutive frames in real-time. While traditional methods can achieve real-time relative pose estimation, they require loop closure and bundle adjustment to mitigate drift problems.
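(As a rough illustration of the 2D-2D step just described, and not the thesis's actual implementation, the sketch below estimates the relative pose between consecutive frames with OpenCV: classical ORB features stand in for the learned detectors and matchers the thesis uses, grayscale input frames are assumed, and the intrinsics matrix `K` is an assumed input.)

```python
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    """Feature-based VO step between two consecutive grayscale frames.

    Detect and describe keypoints, match them, fit an essential matrix
    with RANSAC, and decompose it into rotation R and translation t.
    """
    orb = cv2.ORB_create(2000)                     # illustrative detector
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Epipolar geometry: essential matrix from calibrated correspondences.
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t  # t has unit norm: scale is unobservable from 2D-2D
```

Because the recovered translation has unit norm, consecutive estimates accumulate scale and pose error; this is the drift that loop closure and bundle adjustment, or, in this thesis, the server-side global map, must correct.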
In this thesis, we propose a feature-based monocular visual localization system that combines single-image localization on the server with visual odometry on an edge device. By leveraging the continuity of consecutive frames provided by the edge device, our system mitigates the risk of single-image localization failures and uses the global map to prevent drift issues associated with visual odometry. Our system demonstrates efficiency while maintaining accuracy comparable to state-of-the-art localization algorithms. Furthermore, deep learning models are employed exclusively for feature detection and matching, eliminating the need to train new models for different scenes. This characteristic enhances the system's adaptability, enabling easy deployment in new environments.
(en)
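(Similarly, a minimal sketch of the server-side single-image step: once a query image's keypoints are matched against the pre-built SfM point cloud, the absolute 6-DoF pose can be solved with EPnP [21] inside a RANSAC loop [17]. This is one standard recipe rather than the thesis's exact pipeline; the correspondence arrays `pts3d`/`pts2d`, the intrinsics `K`, and the distortion coefficients `dist` are hypothetical inputs produced by the matching stage.)

```python
import cv2
import numpy as np

def absolute_pose(pts3d, pts2d, K, dist=None):
    """Absolute 6-DoF pose from 2D-3D correspondences against a map.

    pts3d: (N, 3) map points from the pre-built SfM model.
    pts2d: (N, 2) matched keypoints in the query image.
    EPnP inside RANSAC rejects outlier matches.
    """
    obj = np.asarray(pts3d, dtype=np.float64).reshape(-1, 1, 3)
    img = np.asarray(pts2d, dtype=np.float64).reshape(-1, 1, 2)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj, img, K, dist,
        flags=cv2.SOLVEPNP_EPNP,
        reprojectionError=4.0, iterationsCount=1000)
    if not ok:
        return None                      # localization failure
    R, _ = cv2.Rodrigues(rvec)           # rotation vector -> 3x3 matrix
    C = -R.T @ tvec                      # camera center in world frame
    return R, tvec, C, inliers
```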
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-05T16:16:02Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2025-08-05T16:16:02Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Master's Thesis Acceptance Certificate i
Acknowledgement iii
Chinese Abstract v
Abstract vii
Contents ix
List of Figures xiii
List of Tables xv
1 Introduction 1
1.1 Introduction of Image-based Localization 1
1.1.1 Taxonomy 1
1.1.2 Visual Localization 2
1.1.3 Visual Odometry and Epipolar Geometry 4
1.2 Challenges 6
1.3 Contribution 6
1.4 Thesis Organization 7
2 Related Work 9
2.1 Learning Based Feature Detection and Matching 9
2.1.1 Learning Feature Detection and Extraction 10
2.1.2 Feature Matching 11
2.2 Image-Based Localization 12
2.2.1 Structure-from-Motion (SfM) 12
2.2.2 Large-Scale Visual Localization 13
2.2.3 Visual Odometry (VO) and Visual Simultaneous Localization and Mapping (vSLAM) 16
2.3 Hybrid visual-based localization and odometry 17
3 Proposed Method 19
3.1 System Overview 19
3.2 Server-Side Visual Localization Pipeline 22
3.2.1 Offline SfM for Pre-built 3D Map 22
3.2.2 Online Visual Localization Pipeline 23
3.3 Edge-Side Integration of Local Visual Localization and Odometry 25
3.3.1 2D-3D Local Localization 25
3.3.2 2D-2D Visual Odometry 30
4 Experimental Results 33
4.1 Description of Dataset 33
4.2 Implementation Details 34
4.3 Comparisons With Existing Methods on Dataset 35
4.4 Component-wise System Analysis 37
4.4.1 Utilization Ratio of System Components 37
4.4.2 Comparisons With Different Matcher 38
4.4.3 Effectiveness of 2D-3D Local Localization 40
4.4.4 Comparative Analysis of 2D-2D Visual Odometry with Varying Thresholds 40
4.4.5 Runtime & Memory Utilization 42
4.4.6 Localization Trajectory 43
5 Conclusion 45
Reference 47
dc.language.iso: en
dc.subject: 視覺里程計 [Visual odometry] (zh_TW)
dc.subject: 視覺定位 [Visual localization] (zh_TW)
dc.subject: Visual odometry (en)
dc.subject: Visual localization (en)
dc.title: 伺服器與邊緣設備於實境環境中基於特徵的分佈式單目視覺定位系統 [A distributed feature-based monocular visual localization system using server and edge devices in real-world environments] (zh_TW)
dc.title: Distributed Feature-Based Monocular Visual Localization with Server-Edge Collaboration in the Wild (en)
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: 碩士 [Master's]
dc.contributor.oralexamcommittee: 施吉昇;莊永裕;陳冠文 (zh_TW)
dc.contributor.oralexamcommittee: Chi-Sheng Shih; Yung-Yu Chuang; Kuan-Wen Chen (en)
dc.subject.keyword: 視覺定位, 視覺里程計 [Visual localization, Visual odometry] (zh_TW)
dc.subject.keyword: Visual localization, Visual odometry (en)
dc.relation.page: 52
dc.identifier.doi: 10.6342/NTU202501939
dc.rights.note: 未授權 [Not authorized]
dc.date.accepted: 2025-07-18
dc.contributor.author-college: 電機資訊學院 [College of Electrical Engineering and Computer Science]
dc.contributor.author-dept: 電子工程學研究所 [Graduate Institute of Electronics Engineering]
dc.date.embargo-lift: N/A
Appears in collections: 電子工程學研究所 [Graduate Institute of Electronics Engineering]

Files in this item:
File: ntu-113-2.pdf (未授權公開取用 [not authorized for public access])
Size: 29.61 MB
Format: Adobe PDF