Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87907

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 洪一平 | zh_TW |
| dc.contributor.advisor | Yi-Ping Hung | en |
| dc.contributor.author | 俞正楠 | zh_TW |
| dc.contributor.author | Zheng-Nan Yu | en |
| dc.date.accessioned | 2023-07-31T16:14:40Z | - |
| dc.date.available | 2023-11-09 | - |
| dc.date.copyright | 2023-07-31 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-06-26 | - |
| dc.identifier.citation | [1] M. F. Aslan, A. Durdu, A. Yusefi, and A. Yilmaz. Hvionet: A deep learning based hybrid visual–inertial odometry approach for unmanned aerial system position estimation. Neural Networks, 155:461–474, 2022. [2] S. Brahmbhatt, J. Gu, K. Kim, J. Hays, and J. Kautz. Geometry-aware learning of maps for camera localization. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2616–2625, 2018. [3] H.-J. Chang. Visual-inertial-uwb fusion for uav localization. Master’s thesis, National Taiwan University, China, May 2022. [4] C. Chen, X. Lu, A. Markham, and N. Trigoni. Ionet: Learning to cure the curse of drift in inertial odometry. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018. [5] C. Chen, S. Rosa, Y. Miao, C. X. Lu, W. Wu, A. Markham, and N. Trigoni. Selective sensor fusion for neural visual-inertial odometry. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10542–10551, 2019. [6] C. Chen, S. Rosa, C. Xiaoxuan Lu, N. Trigoni, and A. Markham. Selectfusion: A generic framework to selectively learn multisensory fusion. arXiv e-prints, pages arXiv–1912, 2019. [7] R. Clark, S. Wang, A. Markham, N. Trigoni, and H. Wen. Vidloc: A deep spatio-temporal model for 6-dof video-clip relocalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6856–6864, 2017. [8] C. Forster, L. Carlone, F. Dellaert, and D. Scaramuzza. On-manifold preintegration for real-time visual–inertial odometry. IEEE Transactions on Robotics, 33(1):1–21, 2016. [9] X. Glorot, A. Bordes, and Y. Bengio. Deep sparse rectifier neural networks. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, pages 315–323. JMLR Workshop and Conference Proceedings, 2011. [10] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016. [11] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997. [12] C. Hori, T. Hori, T.-Y. Lee, Z. Zhang, B. Harsham, J. R. Hershey, T. K. Marks, and K. Sumi. Attention-based multimodal fusion for video description. In Proceedings of the IEEE international conference on computer vision, pages 4193–4202, 2017. [13] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning, pages 448–456. pmlr, 2015. [14] A. Kendall, Y. Gal, and R. Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7482–7491, 2018. [15] A. Kendall, M. Grimes, and R. Cipolla. Posenet: A convolutional network for real-time 6-dof camera relocalization. In Proceedings of the IEEE international conference on computer vision, pages 2938–2946, 2015. [16] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014. [17] S. Leutenegger, S. Lynen, M. Bosse, R. Siegwart, and P. Furgale. Keyframe-based visual–inertial odometry using nonlinear optimization. The International Journal of Robotics Research, 34(3):314–334, 2015. [18] M. Li and A. I. Mourikis. High-precision, consistent ekf-based visual-inertial odometry. The International Journal of Robotics Research, 32(6):690–711, 2013. [19] L. Lin, W. Luo, Z. Yan, and W. Zhou. Rigid-aware self-supervised gan for camera ego-motion estimation. Digital Signal Processing, 126:103471, 2022. [20] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos. Orb-slam: a versatile and accurate monocular slam system. IEEE transactions on robotics, 31(5):1147–1163, 2015. [21] D. Nistér, O. Naroditsky, and J. Bergen. Visual odometry. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, volume 1, pages I–I, 2004. [22] G. L. Oliveira, N. Radwan, W. Burgard, and T. Brox. Topometric localization with deep learning. In Robotics Research: The 18th International Symposium ISRR, pages 505–520. Springer, 2020. [23] A. Poulose and D. S. Han. Uwb indoor localization using deep learning lstm networks. Applied Sciences, 10(18):6290, 2020. [24] T. Qin, P. Li, and S. Shen. Vins-mono: A robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics, 34(4):1004–1020, 2018. [25] A. Ranjan, V. Jampani, L. Balles, K. Kim, D. Sun, J. Wulff, and M. J. Black. Competitive collaboration: Joint unsupervised learning of depth, camera motion, optical flow and motion segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12240–12249, 2019. [26] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017. [27] S. Wang, R. Clark, H. Wen, and N. Trigoni. Deepvo: Towards end-to-end visual odometry with deep recurrent convolutional neural networks. In 2017 IEEE international conference on robotics and automation (ICRA), pages 2043–2050. IEEE, 2017. [28] S. Wang, R. Clark, H. Wen, and N. Trigoni. End-to-end, sequence-to-sequence probabilistic visual odometry through deep neural networks. The International Journal of Robotics Research, 37(4-5):513–542, 2018. [29] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon. Cbam: Convolutional block attention module. In Proceedings of the European conference on computer vision (ECCV), pages 3–19, 2018. [30] H. Xu, L. Wang, Y. Zhang, K. Qiu, and S. Shen. Decentralized visual-inertial-uwb fusion for relative state estimation of aerial swarm. In 2020 IEEE international conference on robotics and automation (ICRA), pages 8776–8782. IEEE, 2020. [31] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044, 2015. [32] F. Xue, X. Wang, S. Li, Q. Wang, J. Wang, and H. Zha. Beyond tracking: Selecting memory and refining poses for deep visual odometry. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8575–8583, 2019. [33] B. Yang, J. Li, and H. Zhang. Resilient indoor localization system based on uwb and visual–inertial sensors for complex environments. IEEE Transactions on Instrumentation and Measurement, 70:1–14, 2021. [34] N. Yang, L. v. Stumberg, R. Wang, and D. Cremers. D3vo: Deep depth, deep pose and deep uncertainty for monocular visual odometry. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1281–1292, 2020. [35] S. Zhang, D. Zheng, X. Hu, and M. Yang. Bidirectional long short-term memory networks for relation classification. In Proceedings of the 29th Pacific Asia conference on language, information and computation, pages 73–78, 2015. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87907 | - |
| dc.description.abstract | 位姿估計在導航、虛擬/增強現實、機器人、軍事等領域有著廣泛的應用,因此實現高精度的位姿估計十分重要。但由於環境複雜、噪聲干擾大,位姿的準確估計難度較大。為了提高位姿估計系統的性能,本文搭建了一套全新的端對端位姿估計系統,集成了光學傳感器、慣性傳感器和超寬帶傳感器,並提出了一種新型位姿估計框架充分利用了不同傳感器的優勢。該位姿估計系統由視覺傳感器、慣性傳感器和超寬帶傳感器組成,通過融合多模態的測量數據實現高精度的位姿估計。本文所提出的模型能夠基於原始傳感器的數據準確估計物體位置和方向。整個模型由視覺編碼器、慣性編碼器、超寬帶編碼器和特徵融合網絡組成。實驗結果表明本文的方法在實現較高精度的同時節省了訓練時間。 | zh_TW |
| dc.description.abstract | Accurate pose regression is vitally important, with a wide range of applications in navigation, virtual/augmented reality, robotics, and the military. However, complex environments and sensor noise make accurate pose regression difficult. To improve the performance of the pose regression system, we integrate optical, inertial, and ultra-wideband sensors and propose a novel end-to-end pose regression framework that exploits the strengths of each sensor while compensating for the weaknesses of the others. The system, which consists of visual, inertial, and ultra-wideband sensors, achieves high-precision pose estimation through the fusion of multi-modal sensor data. The proposed model takes raw sensor data as input and outputs 6-DoF estimates, i.e., positions and orientations. The whole model is made up of a visual encoder, an inertial encoder, a UWB encoder, and a fusion network. Experimental results demonstrate that our method achieves high precision while saving training time. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-07-31T16:14:40Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2023-07-31T16:14:40Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | 摘要 i; Abstract ii; Contents iii; List of Figures v; List of Tables vi; Chapter 1 Introduction 1; Chapter 2 Related Work 4; 2.1 Localization Methods 4; 2.1.1 Classical Methods 4; 2.1.2 Learning-based Methods 5; 2.2 Sensor Fusion 8; 2.2.1 Classical Methods 8; 2.2.2 Learning-based Methods 9; Chapter 3 Method 11; 3.1 Visual Encoder 11; 3.2 Inertial Encoder 13; 3.3 UWB Encoder 15; 3.4 The Fusion Network 16; 3.5 Loss Function 18; 3.6 Implementation 19; Chapter 4 Experiments 20; 4.1 Dataset 20; 4.2 Evaluation Metrics 23; 4.3 Comparison 24; 4.4 Ablation Study 26; Chapter 5 Conclusions 28; References 29 | - |
| dc.language.iso | en | - |
| dc.subject | 超寬帶 | zh_TW |
| dc.subject | 相機定位 | zh_TW |
| dc.subject | 數據融合 | zh_TW |
| dc.subject | 特徵選擇 | zh_TW |
| dc.subject | 絕對位姿估計 | zh_TW |
| dc.subject | Data Fusion | en |
| dc.subject | Camera Localization | en |
| dc.subject | Ultra-wideband | en |
| dc.subject | Absolute Pose Regression | en |
| dc.subject | Feature Selection | en |
| dc.title | 基於視覺-慣性-超寬頻融合技術的室內絕對位姿估計 | zh_TW |
| dc.title | Absolute Pose Regression Using Visual-IMU-UWB Fusion for Indoor Scenes | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 111-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 王碩仁;陳冠文 | zh_TW |
| dc.contributor.oralexamcommittee | Shoue-Jen Wang;Kuan-Wen Chen | en |
| dc.subject.keyword | 相機定位,數據融合,特徵選擇,絕對位姿估計,超寬帶, | zh_TW |
| dc.subject.keyword | Camera Localization,Data Fusion,Feature Selection,Absolute Pose Regression,Ultra-wideband, | en |
| dc.relation.page | 33 | - |
| dc.identifier.doi | 10.6342/NTU202300994 | - |
| dc.rights.note | 未授權 | - |
| dc.date.accepted | 2023-06-28 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊網路與多媒體研究所 | - |

Appears in Collections: 資訊網路與多媒體研究所
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-111-2.pdf (Restricted Access) | 1.97 MB | Adobe PDF |