Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/52975
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 李綱(Kang Li) | |
dc.contributor.author | Jun-Han Chen | en |
dc.contributor.author | 陳俊翰 | zh_TW |
dc.date.accessioned | 2021-06-15T16:37:05Z | - |
dc.date.available | 2025-08-05 | |
dc.date.copyright | 2020-09-16 | |
dc.date.issued | 2020 | |
dc.date.submitted | 2020-08-06 | |
dc.identifier.citation | [1] A. Valada, N. Radwan, and W. Burgard, “Deep auxiliary learning for visual localization and odometry,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 6939–6946, 2018. [2] N. Radwan, A. Valada, and W. Burgard, “VLocNet++: Deep multitask learning for semantic visual localization and odometry,” IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 4407–4414, 2018. [3] Y. Lin, Z. Liu, J. Huang, C. Wang, G. Du, J. Bai, and S. Lian, “Deep global-relative networks for end-to-end 6-DoF visual localization and odometry,” in Lecture Notes in Computer Science, 2019. [4] A. Kendall, M. Grimes, and R. Cipolla, “PoseNet: A convolutional network for real-time 6-DoF camera relocalization,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015. [5] F. Walch, C. Hazirbas, L. Leal-Taixé, T. Sattler, S. Hilsenbeck, and D. Cremers, “Image-based localization using LSTMs for structured feature correlation,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 627–637, 2017. [6] I. Melekhov, J. Ylioinas, J. Kannala, and E. Rahtu, “Image-based localization using hourglass networks,” in Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW), pp. 870–877, 2017. [7] T. Naseer and W. Burgard, “Deep regression for monocular camera-based 6-DoF global localization in outdoor environments,” in IEEE International Conference on Intelligent Robots and Systems (IROS), 2017. [8] J. Wu, L. Ma, and X. Hu, “Delving deeper into convolutional neural networks for camera relocalization,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 5644–5651, 2017. [9] R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla, and J. Sivic, “NetVLAD: CNN architecture for weakly supervised place recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. [10] A. Gordo, J. Almazán, J. Revaud, and D. Larlus, “Deep image retrieval: Learning global representations for image search,” in Lecture Notes in Computer Science, 2016. [11] T. Weyand, I. Kostrikov, and J. Philbin, “PlaNet: Photo geolocation with convolutional neural networks,” in Lecture Notes in Computer Science, 2016. [12] P. E. Sarlin, C. Cadena, R. Siegwart, and M. Dymczyk, “From coarse to fine: Robust hierarchical localization at large scale,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12708–12717, 2019. [13] T. Sattler, Q. Zhou, M. Pollefeys, and L. Leal-Taixé, “Understanding the limitations of CNN-based absolute camera pose regression,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3302–3312, 2019. [14] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, 2004. [15] D. DeTone, T. Malisiewicz, and A. Rabinovich, “SuperPoint: Self-supervised interest point detection and description,” in IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018. [16] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, 1981. [17] J. Wang, H. Zha, and R. Cipolla, “Coarse-to-fine vision-based localization by indexing scale-invariant features,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2006. [18] Y. Li, N. Snavely, D. Huttenlocher, and P. Fua, “Worldwide pose estimation using 3D point clouds,” in Lecture Notes in Computer Science, 2012. [19] Q. Hao, R. Cai, Z. Li, L. Zhang, Y. Pang, and F. Wu, “3D visual phrases for landmark recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012. [20] A. Bergamo, S. N. Sinha, and L. Torresani, “Leveraging structure from motion to learn discriminative codebooks for scalable landmark classification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013. [21] G. Klein and D. Murray, “Parallel tracking and mapping for small AR workspaces,” in 6th IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR), 2007. [22] M. Kaess, H. Johannsson, R. Roberts, V. Ila, J. J. Leonard, and F. Dellaert, “iSAM2: Incremental smoothing and mapping using the Bayes tree,” International Journal of Robotics Research, 2012. [23] R. A. Newcombe, S. J. Lovegrove, and A. J. Davison, “DTAM: Dense tracking and mapping in real-time,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2011. [24] J. Engel, T. Schöps, and D. Cremers, “LSD-SLAM: Large-scale direct monocular SLAM,” in Lecture Notes in Computer Science, 2014. [25] M. Cummins and P. Newman, “FAB-MAP: Probabilistic localization and mapping in the space of appearance,” International Journal of Robotics Research, 2008. [26] N. Sünderhauf, S. Shirazi, F. Dayoub, B. Upcroft, and M. Milford, “On the performance of ConvNet features for place recognition,” in IEEE International Conference on Intelligent Robots and Systems (IROS), 2015. [27] E. Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, P. Fua, and F. Moreno-Noguer, “Discriminative learning of deep convolutional feature point descriptors,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 118–126, 2015. [28] B. Glocker, S. Izadi, J. Shotton, and A. Criminisi, “Real-time RGB-D camera relocalization,” in IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pp. 173–179, 2013. [29] A. Kendall, M. Grimes, and R. Cipolla, “PoseNet: A convolutional network for real-time 6-DoF camera relocalization,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2938–2946, 2015. [30] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9, 2015. [31] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016. [32] T. Naseer and W. Burgard, “Deep regression for monocular camera-based 6-DoF global localization in outdoor environments,” in IEEE International Conference on Intelligent Robots and Systems (IROS), pp. 1525–1530, 2017. [33] X. Zhang, J. Zou, K. He, and J. Sun, “Accelerating very deep convolutional networks for classification and detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 10, pp. 1943–1955, 2015. [34] S. Brahmbhatt, J. Gu, K. Kim, J. Hays, and J. Kautz, “Geometry-aware learning of maps for camera localization (MapNet),” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2616–2625, 2018. [35] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML), pp. 807–814, 2010. [36] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and accurate deep network learning by exponential linear units (ELUs),” arXiv preprint arXiv:1511.07289, 2015. [37] R. Clark, S. Wang, A. Markham, N. Trigoni, and H. Wen, “VidLoc: A deep spatio-temporal model for 6-DoF video-clip relocalization,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6856–6864, 2017. [38] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015. [39] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014. [40] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, 2015. [41] R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recurrent neural networks,” in Proceedings of the International Conference on Machine Learning (ICML), pp. 1310–1318, 2013. [42] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, “Improving neural networks by preventing co-adaptation of feature detectors,” arXiv preprint arXiv:1207.0580, 2012. [43] M. Lin, Q. Chen, and S. Yan, “Network in network,” arXiv preprint arXiv:1312.4400, 2013. [44] S. L. Altmann, Rotations, Quaternions, and Double Groups. Courier Corporation, 2005. [45] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017. [46] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4510–4520, 2018. [47] Y. Gal and Z. Ghahramani, “Bayesian convolutional neural networks with Bernoulli approximate variational inference,” arXiv preprint arXiv:1506.02158, 2015. [48] D. van Ravenzwaaij, P. Cassey, and S. D. Brown, “A simple introduction to Markov chain Monte Carlo sampling,” Psychonomic Bulletin & Review, vol. 25, no. 1, pp. 143–154, 2018. [49] Y. Gal and Z. Ghahramani, “Dropout as a Bayesian approximation: Representing model uncertainty in deep learning,” in Proceedings of the International Conference on Machine Learning (ICML), pp. 1050–1059, 2016. [50] E. Brachmann and C. Rother, “Learning less is more: 6D camera localization via 3D surface regression,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4654–4662, 2018. [51] N. Snavely, S. M. Seitz, and R. Szeliski, “Modeling the world from Internet photo collections,” International Journal of Computer Vision, 2008. [52] F. Dellaert, D. Fox, W. Burgard, and S. Thrun, “Monte Carlo localization for mobile robots,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 1999. [53] B. Zeisl, T. Sattler, and M. Pollefeys, “Camera pose voting for large-scale image-based localization,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015. [54] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng, “ROS: An open-source robot operating system,” in ICRA Workshop on Open Source Software, vol. 3, no. 3.2, p. 5, 2009. [55] N. Koenig and A. Howard, “Design and use paradigms for Gazebo, an open-source multi-robot simulator,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), vol. 3, pp. 2149–2154, 2004. [56] G. Grisetti, C. Stachniss, and W. Burgard, “Improved techniques for grid mapping with Rao-Blackwellized particle filters,” IEEE Transactions on Robotics, vol. 23, no. 1, pp. 34–46, 2007. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/52975 | - |
dc.description.abstract | This thesis proposes a real-time visual localization system for intelligent vehicles. The idea comes from how humans determine their position using visible landmarks. Intelligent vehicles include self-driving cars and autonomous mobile robots, both of which require a real-time, accurate, and robust localization system. Autonomous mobile robots usually rely on 2D LiDAR, but in many scenarios, such as corridors, 2D LiDAR cannot obtain enough features or landmarks for localization. A camera, by contrast, can capture diverse features such as notice boards, pipes, and lights, and even distant vanishing points. These features are inherent to the building and are useful for localization, so adding a camera is a reasonable way to improve localization performance. Self-driving cars, in turn, usually localize with 3D LiDAR or GPS/GNSS. Although 3D LiDAR-based localization algorithms provide accurate results, in real applications LiDAR suffers from the initial pose problem (also called the global localization problem), the kidnapped-robot problem, high computational cost, and high hardware cost.
The visual localization system in this study aims to provide a low-cost, robust, real-time, and accurate solution for indoor and outdoor applications. The system trains a convolutional neural network to estimate the intelligent vehicle’s pose and its uncertainty from a single RGB image in an end-to-end manner, with no additional feature engineering or graph optimization. Unlike particle filters and conditional random fields, most deep learning regression models have no way to evaluate uncertainty, yet this uncertainty is important for preventing intelligent-vehicle failures, so this thesis proposes a method to address the problem. The system is implemented with an efficient, lightweight deep convolutional neural network for embedded vision applications, verifying that convolutional neural networks can solve the complicated visual localization problem. Experimental results show that our visual localization system can relocalize globally within a given environment, solving the lost, kidnapped, and initial pose problems. The system is further validated experimentally on our indoor autonomous mobile robot and on public indoor and outdoor datasets. It achieves approximately 1.74 m and 7.01° accuracy in large outdoor scenes and 2.7 m and 9.27° indoors. It also runs in real time on an embedded system (Nvidia Jetson Xavier), taking 49 ms per frame (about 20.27 fps). | zh_TW |
dc.description.abstract | This thesis proposes a real-time visual localization system for autonomous intelligent vehicles. The idea comes from how humans determine their location using visible landmarks. Autonomous intelligent vehicles include self-driving cars and autonomous mobile robots (AMRs), both of which require a real-time, accurate, and robust localization system. AMRs often rely on two-dimensional light detection and ranging (2D LiDAR). However, in many scenarios, such as corridors, 2D LiDAR cannot get enough features or landmarks to localize. Alternatively, a camera can obtain different features, such as notice boards, pipes, ceiling lights, and even vanishing lines. These features are inherent to the building and are useful for localization. Thus, adding camera features is a reasonable way to improve localization performance. Self-driving cars often depend on three-dimensional light detection and ranging (3D LiDAR) or GPS/GNSS. Although 3D LiDAR-based localization algorithms offer accurate localization results, they suffer from the initial pose problem (or global localization problem), the kidnapped-robot problem, and high computational and hardware costs in real applications.
The proposed visual localization system aims to provide a low-cost, robust, real-time, and accurate solution for indoor and outdoor applications. The system trains a convolutional neural network to estimate the intelligent vehicle’s pose and its uncertainty from a single RGB image in end-to-end learning, with no need for additional feature engineering or graph optimization. Unlike particle filtering and conditional random fields, most deep learning regression models have no way of evaluating uncertainty, yet uncertainty is significant for preventing an intelligent vehicle from failing, so this thesis proposes a new method to address the issue. The system is built on an efficient and lightweight deep convolutional neural network for embedded vision applications, demonstrating that a convolutional neural network can solve complicated visual localization problems (an illustrative sketch of this kind of pipeline follows the metadata record below). The experimental results show that our visual localization system can relocalize globally within a given environment, which solves the lost, kidnapped, and initial pose problems, and that it operates in real time both indoors and outdoors. The system is verified through experiments, including our indoor AMR and public indoor and outdoor datasets. It achieves approximately 1.74 m and 7.02° accuracy for large-scale outdoor scenes and 0.31 m and 9.27° indoors, taking 49 ms per frame (20.27 fps) on the embedded system (Nvidia Jetson Xavier). | en |
dc.description.provenance | Made available in DSpace on 2021-06-15T16:37:05Z (GMT). No. of bitstreams: 1 U0001-0508202016224300.pdf: 12944876 bytes, checksum: 6124dc8993c109868a7a37baedba1c38 (MD5) Previous issue date: 2020 | en |
dc.description.tableofcontents | Abstract; List of Figures; List of Tables; 1 Introduction (1.1 Motivation; 1.2 Sensors for Localization; 1.3 Visual Localization; 1.4 Uncertainty in Deep Learning for Visual Localization; 1.5 Contributions); 2 Related Work (2.1 Traditional Method: 2.1.1 Image Retrieval, 2.1.2 Structure-Based Approaches, 2.1.3 Visual SLAM; 2.2 Deep Learning-based Method: 2.2.1 Absolute Pose Regression, 2.2.2 Auxiliary Learning); 3 Deep Learning Model for Intelligent Visual Localization (3.1 Convolutional Neural Network (CNN): 3.1.1 Convolution Layer, 3.1.2 Pooling Layer, 3.1.3 Activation Function, 3.1.4 Batch Normalization, 3.1.5 Dropout, 3.1.6 Fully Connected Layer, 3.1.7 Global Average Pooling; 3.2 Pose Representation: 3.2.1 Euler Angles, 3.2.2 Rotation Matrices, 3.2.3 Quaternions; 3.3 Problem Formulation: 3.3.1 Evaluation Metrics; 3.4 System Architecture: 3.4.1 Depthwise Separable Convolution, 3.4.2 MobileNetV2 Feature Extractor, 3.4.3 Localizer, 3.4.4 Uncertainty in Deep Learning, 3.4.5 Loss Function); 4 Experiment Results (4.1 Analysis of Model Parameters and Computation Cost; 4.2 Localization Simulation Results: 4.2.1 7-Scenes, 4.2.2 7-Scenes Localization Results, 4.2.3 Cambridge Landmarks, 4.2.4 Cambridge Landmarks Localization Results; 4.3 Hardware Platform: 4.3.1 Autonomous Mobile Robot (AMR); 4.4 Localization Experiment Results: 4.4.1 YL Corridor, 4.4.2 YL Corridor Localization Results, 4.4.3 Uncertainty Evaluation); 5 Conclusion and Future Work (5.1 Conclusion; 5.2 Future Work); References | |
dc.language.iso | en | |
dc.title | 自主移動機器人之實時視覺定位與不確定性估測系統 | zh_TW |
dc.title | Autonomous Mobile Robot Real-time Visual Localization and Uncertainty Estimation System | en |
dc.type | Thesis | |
dc.date.schoolyear | 108-2 | |
dc.description.degree | Master | |
dc.contributor.oralexamcommittee | 鄭榮和(Jung-Ho Cheng),蘇偉儁(Wei-Jiun Su) | |
dc.subject.keyword | 視覺定位系統、深度卷積神經網路、不確定性估測 | zh_TW |
dc.subject.keyword | Visual Localization System, Deep Convolutional Neural Network, Uncertainty Estimation | en |
dc.relation.page | 67 | |
dc.identifier.doi | 10.6342/NTU202002479 | |
dc.rights.note | Paid authorization | |
dc.date.accepted | 2020-08-06 | |
dc.contributor.author-college | College of Engineering | zh_TW |
dc.contributor.author-dept | Graduate Institute of Mechanical Engineering | zh_TW |
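The abstracts above outline the method: a lightweight MobileNetV2-class feature extractor [46], a localizer head that regresses position and orientation (a quaternion) from a single RGB image, a weighted pose loss, and a dropout-based uncertainty estimate [47], [49]. The PyTorch sketch below is a minimal, hypothetical illustration of such a pipeline, not the thesis's actual implementation: the names `VisualLocalizer`, `pose_loss`, `enable_mc_dropout`, and `predict_with_uncertainty`, the layer sizes, the dropout rate, the sample count, and the PoseNet-style loss weight β = 250 [4] are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import mobilenet_v2


class VisualLocalizer(nn.Module):
    """Absolute-pose regressor: a MobileNetV2 feature extractor, global
    average pooling, dropout, and two heads for translation (x, y, z)
    and orientation (unit quaternion)."""

    def __init__(self, dropout_p: float = 0.2):
        super().__init__()
        # MobileNetV2's depthwise separable convolutions keep the model
        # light enough for embedded real-time inference; pretrained
        # weights would normally be loaded here.
        self.backbone = mobilenet_v2().features  # outputs 1280 channels
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.dropout = nn.Dropout(p=dropout_p)
        self.fc_xyz = nn.Linear(1280, 3)
        self.fc_quat = nn.Linear(1280, 4)

    def forward(self, img: torch.Tensor):
        z = self.pool(self.backbone(img)).flatten(1)
        z = self.dropout(z)
        return self.fc_xyz(z), F.normalize(self.fc_quat(z), dim=1)


def pose_loss(xyz_hat, q_hat, xyz, q, beta: float = 250.0):
    # PoseNet-style weighted loss [4]: position error plus a
    # beta-weighted quaternion error; beta = 250 is only a typical
    # outdoor value from the PoseNet paper, not the thesis's setting.
    return (torch.norm(xyz_hat - xyz, dim=1).mean()
            + beta * torch.norm(q_hat - F.normalize(q, dim=1), dim=1).mean())


def enable_mc_dropout(model: nn.Module) -> None:
    # Keep only the dropout layers stochastic at test time, so that
    # BatchNorm still uses its frozen running statistics.
    model.eval()
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()


@torch.no_grad()
def predict_with_uncertainty(model: nn.Module, img: torch.Tensor,
                             n_samples: int = 20):
    # Monte Carlo dropout [49]: several stochastic forward passes give a
    # mean pose plus a per-dimension variance as an uncertainty estimate.
    enable_mc_dropout(model)
    xyz_s, q_s = zip(*(model(img) for _ in range(n_samples)))
    xyz_s, q_s = torch.stack(xyz_s), torch.stack(q_s)
    # Averaging unit quaternions only approximates the rotation mean,
    # so re-normalize the result.
    return (xyz_s.mean(0), F.normalize(q_s.mean(0), dim=1),
            xyz_s.var(0), q_s.var(0))
```

In use, a large predicted variance would flag an unreliable pose estimate, which is the failure-prevention role the abstract assigns to the uncertainty output; note that Monte Carlo sampling multiplies inference cost by the number of stochastic passes, so a real-time system would bound the sample count accordingly.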
Appears in Collections: | Department of Mechanical Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
U0001-0508202016224300.pdf (currently not authorized for public access) | 12.64 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.