Image-based Localization using Feature-Weighted Mechanism for Adjusting Feature Correlation (使用特徵加權機制調整特徵相關性的影像定位)

Chun-Hsien Li; 李俊賢

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/58938

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	洪一平(Yi-Ping Hung)
dc.contributor.author	Chun-Hsien Li	en
dc.contributor.author	李俊賢	zh_TW
dc.date.accessioned	2021-06-16T08:39:53Z	-
dc.date.available	2020-07-17
dc.date.copyright	2020-07-17
dc.date.issued	2020
dc.date.submitted	2020-07-09
dc.identifier.citation	[1] A. Kendall, M. Grimes, and R. Cipolla, 'Posenet: A convolutional network for real-time 6-dof camera relocalization,' in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2938-2946. [2] F. Walch, C. Hazirbas, L. Leal-Taixe, T. Sattler, S. Hilsenbeck, and D. Cremers, 'Image-based localization using lstms for structured feature correlation,' in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 627-637. [3] S. Hochreiter and J. Schmidhuber, 'Long short-term memory,' Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997. [4] S. Brahmbhatt, J. Gu, K. Kim, J. Hays, and J. Kautz, 'Geometry-aware learning of maps for camera localization,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2616-2625. [5] Z. Laskar, I. Melekhov, S. Kalia, and J. Kannala, 'Camera relocalization by computing pairwise relative poses using convolutional neural network,' in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2017, pp. 929-938. [6] S. Saha, G. Varma, and C. Jawahar, 'Improved visual relocalization by discovering anchor points,' arXiv preprint arXiv:1811.04370, 2018. [7] A. Kendall and R. Cipolla, 'Geometric loss functions for camera pose regression with deep learning,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5974-5983. [8] N. Snavely, S. M. Seitz, and R. Szeliski, 'Photo tourism: exploring photo collections in 3D,' in ACM Siggraph 2006 Papers, 2006, pp. 835-846. [9] T. Sattler, M. Havlena, F. Radenovic, K. Schindler, and M. Pollefeys, 'Hyperpoints and fine vocabularies for large-scale location recognition,' in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2102-2110. [10] T. Sattler, B. Leibe, and L. Kobbelt, 'Efficient & effective prioritized matching for large-scale image-based localization,' IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 9, pp. 1744-1756, 2016. [11] Y. Li, N. Snavely, D. Huttenlocher, and P. Fua, 'Worldwide pose estimation using 3d point clouds,' in European Conference on Computer Vision, 2012: Springer, pp. 15-29. [12] L. Svärm, O. Enqvist, F. Kahl, and M. Oskarsson, 'City-scale localization for cameras with known vertical direction,' IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 7, pp. 1455-1461, 2016. [13] B. Zeisl, T. Sattler, and M. Pollefeys, 'Camera pose voting for large-scale image-based localization,' in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2704-2712. [14] S. Cao and N. Snavely, 'Graph-based discriminative learning for location recognition,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 700-707. [15] A. Irschara, C. Zach, J.-M. Frahm, and H. Bischof, 'From structure-from-motion point clouds to fast location recognition,' in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009: IEEE, pp. 2599-2606. [16] R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla, and J. Sivic, 'NetVLAD: CNN architecture for weakly supervised place recognition,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 5297-5307. [17] F. Radenović, G. Tolias, and O. Chum, 'Fine-tuning CNN image retrieval with no human annotation,' IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 7, pp. 1655-1668, 2018. [18] A. Torii, R. Arandjelovic, J. Sivic, M. Okutomi, and T. Pajdla, '24/7 place recognition by view synthesis,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1808-1817. [19] C. Szegedy et al., 'Going deeper with convolutions,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1-9. [20] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva, 'Learning deep features for scene recognition using places database,' in Advances in Neural Information Processing Systems, 2014, pp. 487-495. [21] A. Paszke et al., 'Automatic differentiation in pytorch,' 2017. [22] S. Izadi et al., 'KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera,' in Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, 2011, pp. 559-568.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/58938	-
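dc.description.abstract	Using deep learning to achieve image-based localization has been a trend in localization research in recent years. A deep learning architecture can run quickly once parallelized on a GPU, reaching real-time performance, and it does not consume more memory as time goes on: a fixed-size deep model only needs to learn the content of a scene, and the convolutional layers of the model can then emulate the geometric computations of traditional localization. In this thesis, we develop an end-to-end localization system that integrates a feature-weighted mechanism with a long short-term memory (LSTM) model, paired with a loss function that automatically learns the relative scale weighting between position and orientation, to achieve good localization results. We then present the localization performance of this system on outdoor and indoor datasets and compare the results with two classic deep learning localization systems. The results show that, in terms of median error, our system improves on both of them on the outdoor and indoor datasets alike, and our error trajectory plots also show a clear improvement. Finally, our system maintains an average speed above ninety frames per second, which is sufficient for real-time operation.	zh_TW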
dc.description.abstract	Using deep learning for image-based localization has been a trend in localization research in recent years. A deep learning architecture runs quickly once parallelized on a GPU, which allows real-time operation, and it does not consume more memory over time. It needs only a fixed-size model to learn the contents of a scene, and the convolutional layers of the model emulate the geometric operations of traditional localization. In this thesis, we develop an end-to-end localization system that integrates a feature-weighted mechanism with a long short-term memory (LSTM) model and uses a loss function that automatically learns the scale weights between position and rotation. We present the localization performance of this system on an outdoor dataset and an indoor dataset, and compare the results with two classic deep learning localization systems. In terms of median error, our system outperforms both of them on the outdoor and indoor datasets, and our error trajectory plots also show an improvement. Finally, our system runs at more than ninety fps on average, which is sufficient for real-time operation.	en
dc.description.provenance	Made available in DSpace on 2021-06-16T08:39:53Z (GMT). No. of bitstreams: 1 U0001-0807202021152700.pdf: 3520570 bytes, checksum: fd941aaf8729849fd02130c8d0659c3b (MD5) Previous issue date: 2020	en
dc.description.tableofcontents	Oral Examination Committee Certification; Acknowledgements; Chinese Abstract; ABSTRACT; CONTENTS; LIST OF FIGURES; LIST OF TABLES; Chapter 1 Introduction; Chapter 2 Related Works; 2.1 Structure-Based Localization; 2.2 Image Retrieval Localization; 2.3 Deep Learning Localization; 2.3.1 Absolute Camera Pose Regression; 2.3.2 Relative Camera Pose Regression; Chapter 3 Proposed Feature-Weighted Model; 3.1 Congenital Defects of LSTM; 3.2 Network Architecture; 3.2.1 Edited PoseLSTM; 3.2.2 Feature-Weighted Mechanism; 3.3 Loss Function; 3.3.1 Shortcoming of the Loss Function used by PoseNet; 3.3.2 Learn Weight Loss Function; 3.4 Data Augmentation and Hyper-parameters; 3.4.1 Data Augmentation; 3.4.2 Hyper-parameters; Chapter 4 Experiments; 4.1 Datasets; 4.1.1 Outdoor Dataset; 4.1.2 Indoor Dataset; 4.2 Outdoor Analysis; 4.2.1 Median Error; 4.2.2 Cumulative Error; 4.2.3 Trajectory Visualization; 4.2.4 Ablation Study; 4.3 Indoor Analysis; 4.3.1 Median Error; 4.3.2 Cumulative Error; 4.3.3 Trajectory Visualization; 4.3.4 Ablation Study; 4.4 Time Analysis; Chapter 5 Conclusions; Chapter 6 Future Works; APPENDICES; APPENDIX A Cumulative Error; APPENDIX B Trajectory Visualization; REFERENCES
dc.language.iso	en
dc.subject	Camera pose estimation (相機姿態估計)	zh_TW
dc.subject	Feature-weighted mechanism (特徵加權機制)	zh_TW
dc.subject	Deep learning (深度學習)	zh_TW
dc.subject	Image-based localization (基於影像的定位)	zh_TW
dc.subject	Image-based localization	en
dc.subject	Deep learning	en
dc.subject	Feature-weighted mechanism	en
dc.subject	Camera pose estimation	en
dc.title	使用特徵加權機制調整特徵相關性的影像定位	zh_TW
dc.title	Image-based Localization using Feature-Weighted Mechanism for Adjusting Feature Correlation	en
dc.type	Thesis
dc.date.schoolyear	108-2
dc.description.degree	Master's (碩士)
dc.contributor.oralexamcommittee	莊永裕(Yung-Yu Chuang),王鈺強(Yu-Chiang Wang),陳祝嵩(Chu-Song Chen),陳冠文(Kuan-Wen Chen)
dc.subject.keyword	Image-based localization (基於影像的定位), Deep learning (深度學習), Feature-weighted mechanism (特徵加權機制), Camera pose estimation (相機姿態估計)	zh_TW
dc.subject.keyword	Image-based localization, Deep learning, Feature-weighted mechanism, Camera pose estimation	en
dc.relation.page	42
dc.identifier.doi	10.6342/NTU202001396
dc.rights.note	Paid license (有償授權)
dc.date.accepted	2020-07-10
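dc.contributor.author-college	College of Electrical Engineering and Computer Science (電機資訊學院)	zh_TW
dc.contributor.author-dept	Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所)	zh_TW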
Appears in Collections:	Department of Computer Science and Information Engineering (資訊工程學系)

Files in This Item:

File	Size	Format
U0001-0807202021152700.pdf (not authorized for public access)	3.44 MB	Adobe PDF

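Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.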