NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/8215

Full metadata record (DC field | value | language):
dc.contributor.advisor: 洪一平 (Yi-Ping Hung)
dc.contributor.author: Yu-Lin Tsai [en]
dc.contributor.author: 蔡侑霖 [zh_TW]
dc.date.accessioned: 2021-05-20T00:50:12Z
dc.date.available: 2020-08-24
dc.date.available: 2021-05-20T00:50:12Z
dc.date.copyright: 2020-08-24
dc.date.issued: 2020
dc.date.submitted: 2020-08-14
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/8215
dc.description.abstract: Image-based localization is the problem of estimating a camera's own pose from image information; it is a key, foundational technology for autonomous driving, augmented reality, and intelligent robots. In recent years, with growing computing power and advances in deep learning, many studies have tried to exploit the strong feature-description capability of convolutional networks to aid camera localization. However, when the deployment scene changes, these methods must spend considerable effort and time retraining their models, showing that their generalization ability is quite limited. Localization frameworks based on image retrieval improve generalization across scenes, but remain constrained by the scene when predicting the relative camera pose. Following the image-retrieval concept, we propose a camera-localization framework that incorporates more classical spatial geometry when computing relative camera poses. We also use deep learning to predict image depth, which further improves our localization accuracy. Experiments show that our method matches the localization accuracy of current state-of-the-art approaches; in addition, model compression lets our pipeline run in near real time. We therefore consider the fusion of classical camera geometry and deep learning a promising direction. [zh_TW]
dc.description.abstract: Image-based localization estimates camera poses within a specific scene coordinate system, a fundamental technology for augmented reality, autonomous driving, and mobile robotics. With the advancement of deep learning, end-to-end approaches based on convolutional neural networks have been well developed. However, these methods suffer from the overhead of retraining their models when applied to unseen scenes. Image retrieval-based localization approaches have therefore been proposed for their generalization capability. In this thesis, we follow the image retrieval-based paradigm and adopt traditional geometric computation for relative pose estimation. We also use depth information predicted by deep learning methods to enhance localization performance. Experimental results on an indoor dataset show state-of-the-art accuracy. Furthermore, by distilling and sharing the encoder for global and local features, we make our system suitable for real-time applications. Our method shows great potential in combining traditional geometric knowledge with deep learning methods. [en]
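The pipeline the abstract describes retrieves a database image with a known pose, matches local features, recovers the query's pose relative to it, and, in the 2D-3D case, lifts matched keypoints into 3D using predicted depth. Below is a minimal NumPy sketch of the two geometric building blocks this relies on, back-projection with depth and pose chaining; the intrinsics, helper names (`backproject`, `chain_pose`), and sample values are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def backproject(uv, depth, K):
    """Lift 2D pixels to 3D camera-frame points: X = d * K^-1 [u, v, 1]^T."""
    uv1 = np.column_stack([uv, np.ones(len(uv))])   # homogeneous pixels, (N, 3)
    rays = (np.linalg.inv(K) @ uv1.T).T             # viewing rays, (N, 3)
    return rays * depth[:, None]                    # scale each ray by its depth

def chain_pose(T_db, T_rel):
    """Absolute query pose from the retrieved image's pose and the relative pose (4x4)."""
    return T_db @ T_rel

# Toy intrinsics (illustrative values, not the thesis's calibration).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Two matched keypoints with predicted depths.
uv = np.array([[320.0, 240.0], [420.0, 240.0]])
d = np.array([2.0, 4.0])
pts3d = backproject(uv, d, K)
# The principal-point pixel lifts straight along the optical axis: pts3d[0] == [0, 0, 2].
```

Given the 3D points, the 2D-3D case would estimate the query pose with a PnP solver inside RANSAC, then compose it onto the retrieved image's absolute pose via `chain_pose`.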
dc.description.provenance: Made available in DSpace on 2021-05-20T00:50:12Z (GMT). No. of bitstreams: 1. U0001-1308202017410300.pdf: 11315996 bytes, checksum: 54aef6834ee65bf379679fb988579353 (MD5). Previous issue date: 2020 [en]
dc.description.tableofcontents:
口試委員會審定書 (Oral Examination Committee Certification) #
誌謝 (Acknowledgements) i
中文摘要 (Chinese Abstract) ii
ABSTRACT iii
CONTENTS iv
LIST OF FIGURES vi
LIST OF TABLES vii
Chapter 1 Introduction 1
Chapter 2 Related Work 4
2.1 6-DoF Visual Localization 4
2.1.1 Structure-based Localization 4
2.1.2 Image Retrieval Localization 4
2.1.3 Absolute Camera Pose Regression 6
2.2 Local Features 6
2.3 Depth Estimation 7
Chapter 3 Method 9
3.1 Pipeline 9
3.2 Image Retrieval 10
3.3 Feature Extraction and Matching 11
3.4 Relative Pose Estimation 12
3.4.1 2D-2D case 12
3.4.2 2D-3D case 16
3.5 Depth Estimation 17
3.5.1 Depth Estimation Framework 17
3.5.2 Loss Term 18
3.6 Model Distillation 20
Chapter 4 Experiments 21
4.1 Dataset 21
4.2 Localization Result 21
4.2.1 Image Retrieval 22
4.2.2 2D-2D case 22
4.2.3 2D-3D case 24
4.3 Computational Cost 26
4.4 Depth Estimation 27
4.5 Generalization 28
Chapter 5 Conclusion 29
Chapter 6 Future Work 30
REFERENCES 31
dc.language.iso: en
dc.title: 基於深度學習方法之相機定位 [zh_TW]
dc.title: Camera Re-Localization with Deep Learning Methods [en]
dc.type: Thesis
dc.date.schoolyear: 108-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 莊仁輝 (Jen-Hui Chuang), 郭景明 (Jing-Ming Guo), 歐陽明 (Ming Ouhyoung), 李明穗 (Ming-Sui Lee)
dc.subject.keyword: 基於影像的定位, 相機定位, 深度學習, 擴增實境 [zh_TW]
dc.subject.keyword: Image-based localization, Camera pose estimation, Deep learning, Augmented Reality [en]
dc.relation.page: 36
dc.identifier.doi: 10.6342/NTU202003304
dc.rights.note: 同意授權 (authorized, open access worldwide)
dc.date.accepted: 2020-08-14
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia) [zh_TW]
Appears in collections: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)

Files in this item:
File: U0001-1308202017410300.pdf | Size: 11.05 MB | Format: Adobe PDF


All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated in their license terms.
