Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90751
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 傅立成 | zh_TW |
dc.contributor.advisor | Li-Chen Fu | en |
dc.contributor.author | 連威翔 | zh_TW |
dc.contributor.author | Wei-Hsiang Lien | en |
dc.date.accessioned | 2023-10-03T17:27:35Z | - |
dc.date.available | 2023-11-09 | - |
dc.date.copyright | 2023-10-03 | - |
dc.date.issued | 2023 | - |
dc.date.submitted | 2023-08-08 | - |
dc.identifier.citation | [1] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
[3] P. An. A modification of Graham's algorithm for determining the convex hull of a finite planar set. Annales Mathematicae et Informaticae, 34, 2007.
[4] Ruofei Du, Eric Turner, Maksym Dzitsiuk, Luca Prasso, Ivo Duarte, Jason Dourgarian, Joao Afonso, Jose Pascoal, Josh Gladstone, Nuno Cruces, Shahram Izadi, Adarsh Kowdle, Konstantine Tsotsos, and David Kim. DepthLab: Real-time 3D interaction with depth maps for mobile augmented reality. In Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology, UIST '20, pages 829–843, New York, NY, USA, 2020. Association for Computing Machinery.
[5] René Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. Vision transformers for dense prediction. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 12159–12168, 2021.
[6] René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3):1623–1637, 2022.
[7] Jorjin Technologies. "Jorjin J7EF Plus AR glasses." https://www.jorjin.com/. Accessed: 2023-06-30.
[8] Lars Ståhle and Svante Wold. Analysis of variance (ANOVA). Chemometrics and Intelligent Laboratory Systems, 6(4):259–272, 1989.
[9] Raúl Mur-Artal and Juan D. Tardós. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Transactions on Robotics, 33(5):1255–1262, 2017.
[10] Raúl Mur-Artal, J. M. M. Montiel, and Juan D. Tardós. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 31(5):1147–1163, 2015.
[11] Peter Corcoran and Hossein Javidnia. Accurate depth map estimation from small motions. In 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), pages 2453–2461, 2017.
[12] D. Scharstein, R. Szeliski, and R. Zabih. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. In Proceedings IEEE Workshop on Stereo and Multi-Baseline Vision (SMBV 2001), pages 131–140, 2001.
[13] Julien Valentin, Adarsh Kowdle, Jonathan T. Barron, Neal Wadhwa, Max Dzitsiuk, Michael Schoenberg, Vivek Verma, Ambrus Csaszar, Eric Turner, Ivan Dryanovski, Joao Afonso, Jose Pascoal, Konstantine Tsotsos, Mira Leung, Mirko Schmidt, Onur Guleryuz, Sameh Khamis, Vladimir Tankovitch, Sean Fanello, Shahram Izadi, and Christoph Rhemann. Depth from motion for smartphone AR. ACM Trans. Graph., 37(6), dec 2018.
[14] Aleksander Holynski and Johannes Kopf. Fast depth densification for occlusion-aware augmented reality. ACM Trans. Graph., 37(6), dec 2018.
[15] David Eigen, Christian Puhrsch, and Rob Fergus. Depth map prediction from a single image using a multi-scale deep network. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, NIPS'14, pages 2366–2374, Cambridge, MA, USA, 2014. MIT Press.
[16] Huan Fu, Mingming Gong, Chaohui Wang, Kayhan Batmanghelich, and Dacheng Tao. Deep ordinal regression network for monocular depth estimation. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2002–2011, 2018.
[17] Iro Laina, Christian Rupprecht, Vasileios Belagiannis, Federico Tombari, and Nassir Navab. Deeper depth prediction with fully convolutional residual networks. In 2016 Fourth International Conference on 3D Vision (3DV), pages 239–248, 2016.
[18] Clément Godard, Oisin Mac Aodha, and Gabriel J. Brostow. Unsupervised monocular depth estimation with left-right consistency. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6602–6611, 2017.
[19] Clément Godard, Oisin Mac Aodha, Michael Firman, and Gabriel Brostow. Digging into self-supervised monocular depth estimation. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 3827–3837, 2019.
[20] Xuan Luo, Jia-Bin Huang, Richard Szeliski, Kevin Matzen, and Johannes Kopf. Consistent video depth estimation. ACM Trans. Graph., 39(4), aug 2020.
[21] Carlos Campos, Richard Elvira, Juan J. Gómez Rodríguez, José M. M. Montiel, and Juan D. Tardós. ORB-SLAM3: An accurate open-source library for visual, visual-inertial, and multimap SLAM. IEEE Transactions on Robotics, 37(6):1874–1890, 2021.
[22] Tong Qin, Peiliang Li, and Shaojie Shen. VINS-Mono: A robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics, 34(4):1004–1020, 2018.
[23] Hayyan Afeef Daoud, Aznul Qalid Md Sabri, Chu Kiong Loo, and Ali Mohammed Mansoor. SLAMM: Visual monocular SLAM with continuous mapping using multiple maps. PLoS ONE, 13, 2018.
[24] Marco Karrer, Patrik Schmuck, and Margarita Chli. CVI-SLAM: Collaborative visual-inertial SLAM. IEEE Robotics and Automation Letters, 3(4):2762–2769, 2018.
[25] Junyi Wang and Yue Qi. A multi-user collaborative AR system for industrial applications. Sensors, 22(4), 2022.
[26] Xukan Ran, Carter Slocum, Yi-Zhen Tsai, Kittipat Apicharttrisorn, Maria Gorlatova, and Jiasi Chen. Multi-user augmented reality with communication efficient and spatially consistent virtual objects. In Proceedings of the 16th International Conference on Emerging Networking EXperiments and Technologies, CoNEXT '20, pages 386–398, New York, NY, USA, 2020. Association for Computing Machinery.
[27] John Miller, Elahe Soltanaghai, Raewyn Duvall, Jeff Chen, Vikram Bhat, Nuno Pereira, and Anthony Rowe. Multi-user augmented reality with infrastructure-free collaborative localization, oct 2021.
[28] Aditya Dhakal, Xukan Ran, Yunshu Wang, Jiasi Chen, and K. K. Ramakrishnan. SLAM-share: Visual simultaneous localization and mapping for real-time multi-user augmented reality. In Proceedings of the 18th International Conference on Emerging Networking EXperiments and Technologies, CoNEXT '22, pages 293–306, New York, NY, USA, 2022. Association for Computing Machinery.
[29] Martin A. Fischler and Robert C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 24(6):381–395, jun 1981.
[30] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. Commun. ACM, 60(6):84–90, may 2017.
[31] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[32] Andrew J. Davison, Ian D. Reid, Nicholas D. Molton, and Olivier Stasse. MonoSLAM: Real-time single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):1052–1067, 2007.
[33] Georg Klein and David Murray. Parallel tracking and mapping for small AR workspaces. In 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, pages 225–234, 2007.
[34] Richard A. Newcombe, Shahram Izadi, Otmar Hilliges, David Molyneaux, David Kim, Andrew J. Davison, Pushmeet Kohli, Jamie Shotton, Steve Hodges, and Andrew Fitzgibbon. KinectFusion: Real-time dense surface mapping and tracking. In 2011 10th IEEE International Symposium on Mixed and Augmented Reality, pages 127–136, 2011.
[35] Jakob Engel, Jörg Stückler, and Daniel Cremers. Large-scale direct SLAM with stereo cameras. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1935–1942, 2015.
[36] Christian Forster, Zichao Zhang, Michael Gassner, Manuel Werlberger, and Davide Scaramuzza. SVO: Semidirect visual odometry for monocular and multicamera systems. IEEE Transactions on Robotics, 33(2):249–265, 2017.
[37] ARCore. https://developers.google.com/ar. Accessed: 2023-06-30.
[38] ARKit. https://developer.apple.com/augmented-reality/. Accessed: 2023-06-30.
[39] EasyAR. https://www.easyar.com/. Accessed: 2023-06-30.
[40] Vuforia. https://developer.vuforia.com/. Accessed: 2023-06-30.
[41] Jooeun Song and Joongjin Kook. Visual SLAM-based spatial recognition and visualization method for mobile AR systems. Applied System Innovation, 5(1):11, Jan 2022.
[42] Unity. https://unity.com/. Accessed: 2023-06-30.
[43] David Eigen and Rob Fergus. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), ICCV '15, pages 2650–2658, USA, 2015. IEEE Computer Society.
[44] Alexander Kolesnikov, Alexey Dosovitskiy, Dirk Weissenborn, Georg Heigold, Jakob Uszkoreit, Lucas Beyer, Matthias Minderer, Mostafa Dehghani, Neil Houlsby, Sylvain Gelly, Thomas Unterthiner, and Xiaohua Zhai. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR), 2021.
[45] Lina Yang, Yuchen Li, Xichun Li, Zuqiang Meng, and Huiwu Luo. Efficient plane extraction using normal estimation and RANSAC from 3D point cloud. Comput. Stand. Interfaces, 82(C), aug 2022.
[46] Valentin E. Brimkov, Sean Kafer, Matthew Szczepankiewicz, and Joshua Terhaar. On intersection graphs of convex polygons. In Reneta P. Barneva, Valentin E. Brimkov, and Josef Šlapal, editors, Combinatorial Image Analysis, pages 25–36, Cham, 2014. Springer International Publishing.
[47] Jürgen Sturm, Nikolas Engelhard, Felix Endres, Wolfram Burgard, and Daniel Cremers. A benchmark for the evaluation of RGB-D SLAM systems. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 573–580, 2012.
[48] David Prokhorov, Dmitry Zhukov, Olga Barinova, Konushin Anton, and Anna Vorontsova. Measuring robustness of visual SLAM. In 2019 16th International Conference on Machine Vision Applications (MVA), pages 1–6, 2019.
[49] Febrina Wijaya. Comprehensive 3D Avatar Reconstruction System with Real-Time Expression for Teleconference Application in Augmented Reality. Master's thesis, National Taiwan University, Jan 2022. | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90751 | - |
dc.description.abstract | 近年來隨著擴增實境技術的快速發展,原先以單人體驗為主的擴增實境應用開始往多人共同體驗的場景發展,並在遊戲、教育、展覽等領域得到廣泛的應用。在多人共同體驗的相關應用開發中,如何對各使用者進行空間定位,並確保多人定位和方向的同步性及一致性是一個重大的課題。目前市面上的AR開發套件皆存在一些限制,例如Vuforia使用基於標記的定位技術,該技術可以使用標記來計算及同步每位使用者的位置及方向。然而,當使用者視線範圍中沒有標記物時,定位功能將無法繼續進行。而ARCore使用無標記定位方法,利用環境中的特徵點來計算使用者的相機姿態,但尚未有一個成熟的方法來對齊身處異地的使用者座標系,並且ARCore僅能在官方認證的AR設備上運行,對一般的擴增實境應用開發者來說造成了阻礙。
在這篇論文中,我們以Unity 3D遊戲引擎作為應用程式的開發平台,提出了一個基於ORB-SLAM2的多人定位系統,使用單目RGB影像達到使用者定位以及偵測環境中可放置虛擬物件的平面,以此虛擬物件作為多使用者定位座標同步的參考點,透過中央伺服器將定位資訊在各使用者的擴增實境設備間進行傳輸,進而以虛擬分身呈現其他使用者在空間中相對於此虛擬物件的位置及移動。此外,我們使用深度學習的技術,以單張RGB影像預估出影像的深度圖,以此解決AR應用中的遮擋問題,使得虛擬物體可以更自然地顯示在AR場景中。 在實驗方面,我們對系統中的各個模組進行定量和定性的分析,以展現這些模組的效果。此外,我們設計了一份關於本系統的受試問卷,邀請數名受試者使用此AR多人定位系統,並填寫問卷以評斷系統的穩定度及體驗感受。根據受試者的問卷反饋,大多數使用者肯定此系統的穩定性及擁有不錯的體驗,並願意使用基於本系統開發的應用程式。 | zh_TW |
dc.description.abstract | In recent years, with the rapid development of augmented reality (AR) technology, AR applications that were originally focused on single-user experiences have started to shift towards multi-user collaborative experiences, and they have been widely applied in fields such as gaming, education, and exhibitions. When developing multi-user applications, localizing every user in space while keeping position and orientation synchronized and consistent across users is a significant challenge. Currently available AR development kits have certain limitations. For example, Vuforia uses marker-based tracking, which calculates and synchronizes each user's position and orientation from markers; however, when no marker is within the user's field of view, tracking cannot continue. ARCore, in contrast, uses markerless tracking that relies on feature points in the surrounding environment to determine the user's camera pose, yet there is not yet a mature method for aligning the coordinate systems of users in different locations. Moreover, ARCore runs only on officially certified AR devices, which is a barrier for general AR application developers.
In this thesis, we propose a multi-user localization system based on ORB-SLAM2 that uses monocular RGB images, with the Unity 3D game engine as the application development platform. The system not only performs user localization but also places a common virtual object on a planar surface (such as a table) in the environment so that every user holds a proper perspective view of the object. These virtual objects serve as reference points for multi-user position synchronization: positioning information is passed among the users' AR devices via a central server, and the relative positions and movements of the other users are then presented to each user as virtual avatars, all expressed with respect to these virtual objects. In addition, we use deep learning techniques to estimate a depth map from a single RGB image, solving occlusion problems in AR applications so that virtual objects appear more naturally in AR scenes. In the experiments, we conducted quantitative and qualitative analyses of each module of the system to demonstrate its effectiveness. Additionally, we designed a questionnaire to evaluate the stability and user experience of the AR multi-user positioning system; several participants were invited to use the system and provide feedback by filling out the questionnaire. The experiences of most participants were satisfactory, which suggests that our proposed system is highly promising. | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-10-03T17:27:35Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2023-10-03T17:27:35Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | 致謝 i
中文摘要 ii
ABSTRACT iv
CONTENTS vi
LIST OF FIGURES ix
LIST OF TABLES xi
Chapter 1 Introduction 1
1.1 Background 1
1.2 Motivation 3
1.3 Related Work 6
1.3.1 Simultaneous Localization and Mapping 6
1.3.2 Monocular Depth Estimation 7
1.3.3 Multi-user AR Collaboration 9
1.4 Objectives and Contributions 10
1.5 Thesis Organization 11
Chapter 2 Preliminaries 13
2.1 Convolutional Neural Network 13
2.1.1 Basic Components 14
2.1.2 ResNet 19
2.2 Simultaneous Localization and Mapping 21
2.3 Monocular Depth Estimation 22
Chapter 3 Methodology 25
3.1 System Overview 25
3.2 Localization Module 27
3.2.1 SLAM-Based Mobile Application with Unity 27
3.2.2 Scale Calibration for ORB-SLAM2 Initialization 29
3.3 Plane Estimation 31
3.3.1 Outlier Rejection 32
3.3.2 Least Square Plane Estimation 34
3.3.3 Plane Boundary Extraction 35
3.4 Coordination Server 37
3.4.1 Relative Pose Calculation 38
3.4.2 Multi-user Pose Coordination System 40
3.4.3 Virtual Plane Computation 42
3.5 Occlusion Rendering Module 45
3.5.1 Virtual Content Placement 45
3.5.2 Monocular Depth Estimation for Occlusion 46
Chapter 4 Experiments 51
4.1 Experimental Setup 51
4.2 Experimental Results 53
4.2.1 Scale Calibration 54
4.2.2 ORB-SLAM2 Localization Module 57
4.2.3 Virtual Plane Computation 60
4.2.4 Occlusion Rendering 61
4.2.5 Multi-user Collaboration 63
4.2.6 Runtime Performance Evaluation 65
4.3 User Study 66
Chapter 5 Conclusion 70
REFERENCES 72 | - |
dc.language.iso | en | - |
dc.title | 具影像遮蔽效果之單目SLAM多使用者定位系統應用於擴增實境 | zh_TW |
dc.title | A Monocular SLAM-based Multi-User Positioning System with Image Occlusion in Augmented Reality | en |
dc.type | Thesis | - |
dc.date.schoolyear | 111-2 | - |
dc.description.degree | Master | - |
dc.contributor.oralexamcommittee | 歐陽明;洪一平;莊永裕;鄭龍磻;曾士桓 | zh_TW |
dc.contributor.oralexamcommittee | Ming Ouhyoung;Yi-Ping Hung;Yung-Yu Chuang;Lung-Pan Cheng;Shih-Huan Tseng | en |
dc.subject.keyword | 擴增實境,平面預估,同時定位與地圖構建,多人定位,遮擋 | zh_TW |
dc.subject.keyword | Augmented Reality,Plane Estimation,Simultaneous Localization and Mapping,Multi-user Positioning,Occlusion | en |
dc.relation.page | 78 | - |
dc.identifier.doi | 10.6342/NTU202303290 | - |
dc.rights.note | Authorized for release (campus access only) | - |
dc.date.accepted | 2023-08-10 | - |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
dc.contributor.author-dept | Department of Computer Science and Information Engineering | - |
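The abstract above describes anchoring every user's coordinate frame to a shared virtual object and exchanging poses through a central server. As an illustration only (this code is not from the thesis; the function and variable names are hypothetical), the core frame change can be sketched with 4x4 homogeneous transforms:

```python
import numpy as np

def remote_pose_in_local_frame(T_local_anchor, T_remote_anchor, T_remote_cam):
    """Express a remote user's camera pose in the local user's world frame.

    All arguments are 4x4 homogeneous transforms:
      T_local_anchor  -- shared anchor's pose in the local user's frame
      T_remote_anchor -- the same anchor's pose in the remote user's frame
      T_remote_cam    -- remote camera's pose in the remote user's frame
    """
    # Pose of the remote camera relative to the shared anchor ...
    T_anchor_cam = np.linalg.inv(T_remote_anchor) @ T_remote_cam
    # ... re-expressed in the local frame through the local anchor pose.
    return T_local_anchor @ T_anchor_cam
```

For example, if the anchor sits at (1, 0, 0) in the local frame, at the origin of the remote frame, and the remote camera is at (0, 2, 0), the remote user's avatar lands at (1, 2, 0) in the local frame.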
Appears in Collections: | Department of Computer Science and Information Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-111-2.pdf (access currently restricted) | 25.66 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
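The occlusion handling described in the abstract amounts to a per-pixel depth test: a virtual fragment is hidden wherever the depth estimated for the real scene is closer to the camera. A minimal sketch of that idea (not the thesis's actual renderer; all names here are illustrative):

```python
import numpy as np

def composite_with_occlusion(real_rgb, virtual_rgb, real_depth, virtual_depth):
    """Overlay rendered virtual content on a camera frame, hiding virtual
    pixels wherever the estimated real-scene depth is closer to the camera.

    real_rgb, virtual_rgb : HxWx3 uint8 images
    real_depth            : HxW estimated depth of the real scene
    virtual_depth         : HxW depth of the rendered virtual content
                            (np.inf where no virtual object covers the pixel)
    """
    # Virtual fragment wins the depth test only where it is in front.
    visible = virtual_depth < real_depth
    out = real_rgb.copy()
    out[visible] = virtual_rgb[visible]
    return out
```

With a depth map from a monocular network in metric scale, this test lets real foreground objects (e.g. a hand in front of the table) correctly cover the virtual content.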