NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80929
Full metadata record (DC field: value [language])
dc.contributor.advisor: 洪一平 (Yi-Ping Hung)
dc.contributor.author: Rong-Rong Zhang [en]
dc.contributor.author: 張蓉蓉 [zh_TW]
dc.date.accessioned: 2022-11-24T03:22:22Z
dc.date.available: 2021-11-08
dc.date.available: 2022-11-24T03:22:22Z
dc.date.copyright: 2021-11-08
dc.date.issued: 2021
dc.date.submitted: 2021-10-03
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80929
dc.description.abstract: Absolute camera pose regression is a class of single-shot camera localization methods. It encodes the information of a 3D scene in an end-to-end neural network, so it needs less time to estimate a pose than structure-based localization methods. In this thesis, I propose an absolute pose regression method that uses RGB-D images, aiming to fuse color and depth information for more accurate localization. I use a dual-stream network architecture to process the color image and the depth image separately, combined with handcrafted base poses to reduce the network's dependence on the motion trajectories in the training data. To compare with existing absolute pose regression methods, I evaluate the localization performance of this method on indoor and outdoor datasets; the results show improved performance. [translated from zh_TW]
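The abstract describes the architecture only at a high level. Purely as an illustration, the sketch below shows one plausible way to wire an RGB-D dual-stream regressor whose output pose is a learned weighted combination of fixed, handcrafted base poses. The ResNet-34 backbones, the softmax blending, and every name in the snippet are assumptions made for this sketch, not the thesis's actual implementation (see Chapter 3 of the PDF for that).

```python
# A minimal sketch, assuming PyTorch/torchvision; all names here
# (DualStreamAPR, coeff_head, the ResNet-34 backbones) are illustrative
# assumptions, not the thesis's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class DualStreamAPR(nn.Module):
    """Absolute pose regression from an RGB image plus a depth map, with the
    output pose expressed as a weighted combination of fixed base poses."""

    def __init__(self, base_t: torch.Tensor, base_q: torch.Tensor):
        # base_t: (B, 3) handcrafted base translations.
        # base_q: (B, 4) handcrafted base rotations as unit quaternions.
        super().__init__()
        self.rgb_stream = models.resnet34(weights=None)
        self.depth_stream = models.resnet34(weights=None)
        # Depth input has a single channel; swap in a 1-channel first conv.
        self.depth_stream.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2,
                                            padding=3, bias=False)
        feat_dim = self.rgb_stream.fc.in_features  # 512 for ResNet-34
        self.rgb_stream.fc = nn.Identity()
        self.depth_stream.fc = nn.Identity()
        # One mixing coefficient per base pose, from the fused features.
        self.coeff_head = nn.Linear(2 * feat_dim, base_t.shape[0])
        self.register_buffer("base_t", base_t)  # fixed, not learned
        self.register_buffer("base_q", base_q)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor):
        fused = torch.cat([self.rgb_stream(rgb),
                           self.depth_stream(depth)], dim=1)
        w = torch.softmax(self.coeff_head(fused), dim=1)  # (N, B) weights
        t = w @ self.base_t                          # blended translation
        q = F.normalize(w @ self.base_q, dim=1)      # approx. rotation blend
        return t, q


# Hypothetical usage with four randomly generated base poses:
bases_t = torch.randn(4, 3)
bases_q = F.normalize(torch.randn(4, 4), dim=1)
model = DualStreamAPR(bases_t, bases_q)
t, q = model(torch.randn(2, 3, 224, 224), torch.randn(2, 1, 224, 224))
```

Constraining the output to a combination of a few scene-covering base poses is one way to keep predictions from being limited to the training trajectories, which is the role the abstract assigns to the handcrafted base poses.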
dc.description.provenance: Made available in DSpace on 2022-11-24T03:22:22Z (GMT). No. of bitstreams: 1; U0001-1409202121575900.pdf: 16081195 bytes, checksum: 274d80b4cb458cd112aa721bd03a4edb (MD5). Previous issue date: 2021 [en]
dc.description.tableofcontents:
  摘要 (Chinese abstract); Abstract; Contents; List of Figures; List of Tables
  Chapter 1  Introduction
  Chapter 2  Related Works
    2.1  End-to-End Approaches
      2.1.1  Absolute Camera Pose Regression
      2.1.2  Multi-task Regression
    2.2  Step-by-Step Approaches
      2.2.1  Relative Camera Pose Regression with Image Retrieval
      2.2.2  Structure-based Localization with Scene Coordinate Regression
      2.2.3  Structure-based Localization with Image Retrieval
  Chapter 3  Proposed Method
    3.1  A Theory of Absolute Camera Pose Regression
      3.1.1  Theoretical Model
      3.1.2  Rotation Formalisms
      3.1.3  Base Poses
    3.2  Network Architecture
    3.3  Loss Function
    3.4  Training Mechanism
    3.5  Implementation Details
      3.5.1  Depth Completion
      3.5.2  Selection of Handcrafted Base Poses
  Chapter 4  Experiments
    4.1  Datasets
    4.2  Comparison with Prior Methods
    4.3  Ablation Studies
      4.3.1  Effect of Depth Completion
      4.3.2  Comparison of Different Base Poses
      4.3.3  Comparison of Different Network Architectures
  Chapter 5  Conclusions
  Chapter 6  Future Work
  References
dc.language.iso: en
dc.subject: 攝影機位姿估計; 攝影機絕對位姿回歸; 單一影像攝影機定位; 雙流網路; 人工設計的基底位姿 [zh_TW] (Chinese equivalents of the English subjects below)
dc.subject: Camera pose estimation; Absolute camera pose regression; Single-shot camera localization; Dual-stream network; Handcrafted base poses [en]
dc.title: 使用 RGB-D 雙流網路與人工設計的基底位姿之攝影機絕對位姿回歸 [zh_TW]
dc.title: Absolute Camera Pose Regression Using RGB-D Dual-Stream Network and Handcrafted Base Poses [en]
dc.date.schoolyear: 109-2
dc.description.degree: Master (碩士)
dc.contributor.oralexamcommittee: 陳祝嵩 (Chu-Song Chen), 陳冠文 (Kuan-Wen Chen)
dc.subject.keyword: 攝影機絕對位姿回歸; 單一影像攝影機定位; 雙流網路; 人工設計的基底位姿; 攝影機位姿估計 [zh_TW]
dc.subject.keyword: Absolute camera pose regression; Single-shot camera localization; Dual-stream network; Handcrafted base poses; Camera pose estimation [en]
dc.relation.page: 43
dc.identifier.doi: 10.6342/NTU202103180
dc.rights.note: Authorized for campus-only access (同意授權,限校園內公開)
dc.date.accepted: 2021-10-05
dc.contributor.author-college: College of Electrical Engineering and Computer Science (電機資訊學院) [zh_TW]
dc.contributor.author-dept: Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所) [zh_TW]
Appears in collections: Department of Computer Science and Information Engineering (資訊工程學系)

Files in this item:
  U0001-1409202121575900.pdf — 15.7 MB, Adobe PDF (access restricted to NTU campus IPs; use the library's VPN service from off campus)


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
