Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92576

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 簡韶逸 | zh_TW |
| dc.contributor.advisor | Shao-Yi Chien | en |
| dc.contributor.author | 陳昱仁 | zh_TW |
| dc.contributor.author | Yu-Jen Chen | en |
| dc.date.accessioned | 2024-04-22T16:14:58Z | - |
| dc.date.available | 2024-04-23 | - |
| dc.date.copyright | 2024-04-22 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-03-22 | - |
| dc.identifier.citation | [1] C. Forster, M. Pizzoli, and D. Scaramuzza, “SVO: Fast semi-direct monocular visual odometry,” IEEE International Conference on Robotics and Automation (ICRA), pp. 15–22, 2014.
[2] C. Forster, L. Carlone, F. Dellaert, and D. Scaramuzza, “IMU preintegration on manifold for efficient visual-inertial maximum-a-posteriori estimation,” Robotics: Science and Systems XI, 2015.
[3] E. Jang, S. Gu, and B. Poole, “Categorical reparameterization with Gumbel-Softmax,” arXiv preprint, 2016.
[4] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The KITTI dataset,” The International Journal of Robotics Research, vol. 32, no. 11, pp. 1231–1237, 2013.
[5] A. Geiger and P. Lenz, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3354–3361, 2012.
[6] R. Mur-Artal and J. D. Tardós, “ORB-SLAM2: An open-source SLAM system for monocular, stereo and RGB-D cameras,” IEEE Transactions on Robotics (TRO), pp. 1255–1262, 2017.
[7] J. Engel, V. Koltun, and D. Cremers, “Direct sparse odometry,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), pp. 617–625, 2017.
[8] M. Bloesch, S. Omari, M. Hutter, and R. Siegwart, “Robust visual inertial odometry using a direct EKF-based approach,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 298–304, 2015.
[9] W. Huang and H. Liu, “Online initialization and automatic camera-IMU extrinsic calibration for monocular visual-inertial SLAM,” International Conference on Robotics and Automation (ICRA), pp. 5182–5189, 2018.
[10] T. Qin, P. Li, and S. Shen, “VINS-Mono: A robust and versatile monocular visual-inertial state estimator,” IEEE Transactions on Robotics (TRO), pp. 1004–1020, 2018.
[11] S. Leutenegger, P. Furgale, V. Rabaud, M. Chli, K. Konolige, and R. Siegwart, “Keyframe-based visual-inertial SLAM using nonlinear optimization,” Proceedings of Robotics: Science and Systems (RSS), 2013.
[12] S. Wang, R. Clark, H. Wen, and N. Trigoni, “DeepVO: Towards end-to-end visual odometry with deep recurrent convolutional neural networks,” IEEE International Conference on Robotics and Automation (ICRA), 2017.
[13] R. Clark, S. Wang, H. Wen, A. Markham, and N. Trigoni, “VINet: Visual-inertial odometry as a sequence-to-sequence learning problem,” Proceedings of the AAAI Conference on Artificial Intelligence, 2017.
[14] C. Chen, S. Rosa, Y. Miao, C. X. Lu, W. Wu, A. Markham, and N. Trigoni, “Selective sensor fusion for neural visual-inertial odometry,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10542–10551, 2019.
[15] L. Liu, G. Li, and T. H. Li, “ATVIO: Attention guided visual-inertial odometry,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4125–4129, 2021.
[16] E. J. Shamwell, S. Leung, and W. D. Nothwang, “Vision-aided absolute trajectory estimation using an unsupervised deep network with online error correction,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2524–2531, 2018.
[17] L. Han, Y. Lin, G. Du, and S. Lian, “DeepVIO: Self-supervised deep learning of monocular visual inertial odometry using 3D geometric constraints,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6906–6913, 2019.
[18] Y. Almalioglu, M. Turan, A. E. Sarı, M. R. U. Saputra, P. P. B. de Gusmão, A. Markham, and N. Trigoni, “SelfVIO: Self-supervised deep monocular visual-inertial odometry and depth estimation,” arXiv preprint arXiv:1911.09968, 2019.
[19] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” ICLR, 2015.
[20] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016.
[21] P. Fischer, A. Dosovitskiy, E. Ilg, P. Häusser, C. Hazırbaş, V. Golkov, P. van der Smagt, D. Cremers, and T. Brox, “FlowNet: Learning optical flow with convolutional networks,” IEEE International Conference on Computer Vision (ICCV), pp. 2758–2766, 2015.
[22] E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox, “FlowNet 2.0: Evolution of optical flow estimation with deep networks,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1647–1655, 2017.
[23] A. Ranjan and M. J. Black, “Optical flow estimation using a spatial pyramid network,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2720–2729, 2017.
[24] D. Sun, X. Yang, M.-Y. Liu, and J. Kautz, “PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8934–8943, 2018.
[25] Z. Teed and J. Deng, “RAFT: Recurrent all-pairs field transforms for optical flow,” Proceedings of the European Conference on Computer Vision (ECCV), 2020.
[26] C. J. Maddison, A. Mnih, and Y. W. Teh, “The concrete distribution: A continuous relaxation of discrete random variables,” arXiv preprint arXiv:1611.00712, 2016.
[27] S. Xie, H. Zheng, C. Liu, and L. Lin, “SNAS: Stochastic neural architecture search,” arXiv preprint arXiv:1812.09926, 2018.
[28] S. Teerapittayanon, B. McDanel, and H. Kung, “BranchyNet: Fast inference via early exiting from deep neural networks,” ICPR, 2016.
[29] G. Huang, D. Chen, T. Li, F. Wu, L. van der Maaten, and K. Q. Weinberger, “Multi-scale dense convolutional networks for efficient prediction,” ICLR, 2018.
[30] A. Graves, “Adaptive computation time for recurrent neural networks,” arXiv preprint arXiv:1603.08983, 2016.
[31] Z. Wu, C. Xiong, C.-Y. Ma, R. Socher, and L. S. Davis, “AdaFrame: Adaptive frame selection for fast video recognition,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1278–1287, 2019.
[32] O. Triebe, N. Laptev, and R. Rajagopal, “AR-Net: A simple auto-regressive neural network for time-series,” European Conference on Computer Vision, pp. 86–104, 2020.
[33] A. Mnih and K. Gregor, “Neural variational inference and learning in belief networks,” arXiv preprint arXiv:1402.0030, 2014.
[34] A. Geiger, J. Ziegler, and C. Stiller, “StereoScan: Dense 3D reconstruction in real-time,” Intelligent Vehicles Symposium (IV), 2011.
[35] T. Zhou, M. Brown, N. Snavely, and D. G. Lowe, “Unsupervised learning of depth and ego-motion from video,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[36] J.-W. Bian, Z. Li, N. Wang, H. Zhan, C. Shen, M.-M. Cheng, and I. Reid, “Unsupervised scale-consistent depth and ego-motion learning from monocular video,” Advances in Neural Information Processing Systems (NeurIPS), pp. 35–45, 2019.
[37] C. Godard, O. Mac Aodha, M. Firman, and G. Brostow, “Digging into self-supervised monocular depth estimation,” Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3828–3838, 2019.
[38] Y. Zou, P. Ji, T. Quoc-Huy, J.-B. Huang, and M. Chandraker, “Learning monocular visual odometry via self-supervised long-term modeling,” European Conference on Computer Vision (ECCV), pp. 710–727, 2020.
[39] E. J. Shamwell, S. Leung, and W. D. Nothwang, “Vision-aided absolute trajectory estimation using an unsupervised deep network with online error correction,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2524–2531, 2018.
[40] F. Xue, Q. Wang, X. Wang, W. Dong, J. Wang, and H. Zha, “Guided feature selection for deep visual odometry,” Asian Conference on Computer Vision (ACCV), pp. 293–308, 2018.
[41] C. Chen, S. Rosa, C. X. Lu, B. Wang, N. Trigoni, and A. Markham, “SelectFusion: A generic framework to selectively learn multisensory fusion,” arXiv preprint arXiv:1912.13077, 2019. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92576 | - |
| dc.description.abstract | 視覺慣性里程計(VIO)旨在通過估計物體自運動來預測其軌跡。在過去的幾年裡,基於深度學習的VIO方法與傳統的幾何方法相比之下,在效能上表現出優越性。然而,當前的方法對每次預測都輸入視覺和慣性數據。在基於深度學習的方法中,同時使用兩種感測器資料導致了每次預測的計算成本非常高。由於視覺里程計/視覺慣性里程計系統經常在邊緣設備上運行,其計算能力非常有限,且即時預測至關重要,因此需要解決計算成本的問題。
這篇論文介紹了一種基於深度學習的VIO方法,旨在通過在不同情境下有選擇性地停用視覺和慣性資料來減少計算成本。更確切地說,我們引入了一個名為SelectNet的神經網路,其設計是為了學習在當前狀態下何時啟動或停用視覺和慣性特徵提取器。此外,我們採用Gumbel-Softmax resampling,這是一種輕量級的技術,用於訓練SelectNet,使得訓練過程可微分。實驗表明,我們的方法在與其他VIO方法進行比較時取得了競爭性的性能,同時還將計算成本減少了40.13 %。此外,我們還研究了在各種情境下可解釋性的可視化概率軌跡,顯示了視覺和慣性輸入數據之間的有趣聯繫。 | zh_TW |
| dc.description.abstract | Visual inertial odometry (VIO) aims to predict a trajectory by estimating ego-motion. Over the past few years, deep learning VIO methods have exhibited superior performance compared to traditional geometric methods. Nonetheless, current approaches feed both visual and inertial data into every estimation. In deep learning based methods, consistently using both data sources results in a remarkably high computational cost per estimation. Since VO/VIO systems frequently run on edge devices with limited computing power, and real-time operation is essential, this computational cost needs to be addressed.
This thesis introduces a deep learning based VIO method that reduces computation costs by selectively deactivating the visual and inertial features in different situations. More precisely, we introduce a network called SelectNet, which learns when to activate or deactivate the visual and inertial feature extractors based on the current state. Furthermore, we employ Gumbel-Softmax resampling, a lightweight technique, to train SelectNet while keeping the training process differentiable. Extensive experiments demonstrate that our approach achieves competitive performance compared to other VIO methods while reducing computation costs by 40.13%. In addition, we investigate interpretability by visualizing the probability trajectory in various scenarios, revealing interesting connections between the visual and inertial sensory inputs. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-04-22T16:14:58Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-04-22T16:14:58Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Abstract i
List of Figures iv
List of Tables v
1 Introduction 1
1.1 Monocular Visual Inertial Odometry 2
1.2 Challenges 3
1.3 Contribution 4
1.4 Thesis Organization 5
2 Related Work 6
2.1 Traditional Method 6
2.2 Neural Network Method 7
2.2.1 Neural Networks for Pose Estimation 7
2.2.2 Neural Networks for Optical Flow 9
2.3 Gumbel Softmax Resampling 13
2.4 Adaptive Computation 14
3 Proposed Method 17
3.1 Problem definition 19
3.2 Neural VIO Models with Adaptive Computation 19
3.2.1 Visual Encoder 19
3.2.2 Inertial Encoder 20
3.2.3 Temporal Decoder and Pose Regression 21
3.2.4 Selection Mechanism 21
3.2.5 Training with Gumbel-Softmax Resampling 23
3.3 Loss Function 24
4 Experimental Results 26
4.1 Description of Datasets 26
4.2 Implementation Details 27
4.3 Evaluation 28
4.3.1 Evaluation Metrics 28
4.3.2 Evaluation Results 30
4.4 Ablation Study 32
4.4.1 Sweep Lambda 32
4.4.2 Sweep Beta 33
4.4.3 Module Evaluation 34
4.4.4 SelectNet Evaluation 34
4.5 Visualization Results 37
4.5.1 Probability Trajectory 37
4.5.2 Correlation 39
5 Conclusion 41
Reference 42 | - |
| dc.language.iso | en | - |
| dc.subject | 姿態追蹤 | zh_TW |
| dc.subject | 姿態估測 | zh_TW |
| dc.subject | 相機姿態 | zh_TW |
| dc.subject | 深度學習 | zh_TW |
| dc.subject | 慣性里程計 | zh_TW |
| dc.subject | Camera Pose | en |
| dc.subject | IMU | en |
| dc.subject | Deep Learning | en |
| dc.subject | Pose Tracking | en |
| dc.subject | Pose Estimation | en |
| dc.title | 基於深度學習選擇機制之視覺慣性里程計 | zh_TW |
| dc.title | Selection Mechanism for Visual Inertial Odometry with Deep Learning Prediction | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 陳冠文;鄭文皇;施吉昇 | zh_TW |
| dc.contributor.oralexamcommittee | Kuan-Wen Chen;Wen-Huang Cheng;Chi-Sheng Shih | en |
| dc.subject.keyword | 姿態追蹤,姿態估測,相機姿態,深度學習,慣性里程計, | zh_TW |
| dc.subject.keyword | Pose Tracking,Pose Estimation,Camera Pose,Deep Learning,IMU, | en |
| dc.relation.page | 47 | - |
| dc.identifier.doi | 10.6342/NTU202304450 | - |
| dc.rights.note | 未授權 | - |
| dc.date.accepted | 2024-03-25 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 電子工程學研究所 | - |
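The abstract above describes a selection mechanism in which SelectNet decides, per time step, whether to run the visual and inertial feature extractors, and is trained with Gumbel-Softmax resampling so that these discrete on/off decisions remain differentiable. The snippet below is a minimal, hypothetical sketch of that general gating idea in PyTorch; the module name `SelectGate`, the tensor sizes, and the straight-through (`hard=True`) sampling are illustrative assumptions, not the thesis implementation.

```python
# Hypothetical sketch of Gumbel-Softmax gating for sensor selection (not the thesis code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelectGate(nn.Module):
    """Predicts an activate/deactivate decision for each sensor branch."""

    def __init__(self, state_dim: int = 128, num_branches: int = 2):
        super().__init__()
        # One binary (on/off) categorical decision per branch (visual, inertial).
        self.logits = nn.Linear(state_dim, num_branches * 2)
        self.num_branches = num_branches

    def forward(self, state: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        # state: (batch, state_dim), e.g. the recurrent hidden state of the odometry model.
        logits = self.logits(state).view(-1, self.num_branches, 2)
        # Gumbel-Softmax with hard=True yields discrete one-hot samples in the forward
        # pass while gradients flow through the soft relaxation (straight-through trick).
        sample = F.gumbel_softmax(logits, tau=tau, hard=True, dim=-1)
        return sample[..., 1]  # (batch, num_branches); 1.0 means "keep this branch active"


if __name__ == "__main__":
    gate = SelectGate()
    hidden = torch.randn(4, 128)              # assumed fused hidden state
    mask = gate(hidden, tau=1.0)              # (4, 2) differentiable binary mask
    visual_feat = torch.randn(4, 256)
    inertial_feat = torch.randn(4, 256)
    # Masked fusion: a deactivated branch contributes zeros at this time step.
    fused = torch.cat([visual_feat * mask[:, 0:1], inertial_feat * mask[:, 1:2]], dim=1)
    print(mask, fused.shape)
```

In practice such a gate is typically paired with a cost term that penalizes keeping expensive branches active; the lambda and beta sweeps listed in the table of contents suggest hyperparameters of this kind, though the exact loss function is not given in this record.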
| Appears in Collections: | 電子工程學研究所 | |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-112-2.pdf (restricted, not publicly accessible) | 4.96 MB | Adobe PDF | |
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.