Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101567

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 洪一平 | zh_TW |
| dc.contributor.advisor | Yi-Ping Hung | en |
| dc.contributor.author | 劉郁昌 | zh_TW |
| dc.contributor.author | Yo-Chung Lau | en |
| dc.date.accessioned | 2026-02-11T16:25:04Z | - |
| dc.date.available | 2026-02-12 | - |
| dc.date.copyright | 2026-02-11 | - |
| dc.date.issued | 2026 | - |
| dc.date.submitted | 2026-01-24 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101567 | - |
| dc.description.abstract | 本論文以攝影機定位為主題,分別從應用實作與方法設計兩個面向進行探討。
在應用層面上,我們以沉浸式劇場「迴路花園」為案例,開發Circuit Garden XR (CGXR)行動延展實境應用,利用既有的標記式攝影機定位技術,將360度演出影片精準錨定於原始表演場域,使使用者得以在演出結束後,於實際空間中依自身動線與視角重溫表演歷程。使用者研究顯示,CGXR不僅提升使用者對表演內容的理解,也提供沉浸且具吸引力的互動體驗。同時,經由此研究,我們亦發現攝影機定位技術在運算效率與多使用者支援議題上仍存在實務上的挑戰與限制。 有鑑於此,在方法層面上,我們首先聚焦於定位運算效率,提出一套適用於行動平台的即時物體式攝影機定位方法。該方法採用客戶端—伺服器架構的單目慣性輔助視覺追蹤系統,以最小化客戶端的計算負載來實現物體姿態估測。為了提升準確性和可靠性,我們引入了偏差自校正機制和姿態檢測演算法。此外,本研究亦建置了一個包含同步全彩影像和慣性感測單元數據的物體姿態數據集,用於評估與驗證整體定位效能。 進一步地,我們將研究範疇擴展至多人支援,並提出CollabLoc,其為一套採用協同計算架構之基於視覺的多使用者空間定位系統。CollabLoc整合動態伺服器資源管理、基於用戶端融合姿態的影像檢索、結合光流的特徵匹配技術,以及用於降低漂移的姿態融合模組,以有效支援已知場景中的即時多人定位。實驗結果顯示,在多使用者情境下,CollabLoc相較於基準方法可將定位處理速度提升約1.91倍,同時維持良好定位精度。 | zh_TW |
| dc.description.abstract | This thesis explores camera localization from two perspectives: application-oriented implementation and methodological design.
At the application level, we take the immersive theater production Circuit Garden as a case study and develop Circuit Garden XR (CGXR), a mobile XR application. CGXR leverages existing marker-based camera localization techniques to accurately anchor 360° performance videos within the original performance venue, allowing users to revisit the theatrical experience after the show by following their own movement paths and viewing perspectives within the physical space. User studies show that CGXR not only improves users' understanding of the performance content but also provides an immersive and engaging interactive experience. Through this work, we also identified two practical challenges for camera localization: computational efficiency and multiuser support. At the methodological level, we first address computational efficiency and propose a real-time object-based camera localization method tailored for mobile platforms. The method adopts a client-server monocular inertial-aided visual tracking architecture that minimizes the computational load on the client side while estimating object pose. To improve accuracy and reliability, we introduce a bias self-correction mechanism and a pose inspection algorithm. In addition, we provide a novel object pose dataset with synchronized RGB and IMU data for comprehensive evaluation. We then extend the research scope to multiuser support and present CollabLoc, a vision-based multiuser spatial localization system built on collaborative computation. The system incorporates adaptive server-side resource management, client-pose-fused image retrieval, optical-flow-enhanced feature matching, and a pose fusion module for drift reduction, effectively supporting real-time multiuser localization in known scenes. 
Experimental results show that CollabLoc achieves a 1.91x speedup in localization processing compared to the baseline method in multi-client scenarios while maintaining good localization accuracy. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-02-11T16:25:04Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2026-02-11T16:25:04Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements i
摘要 ii
Abstract iv
Contents vi
List of Figures ix
List of Tables xv
Chapter 1 Introduction 1
Chapter 2 Related Work 5
2.1 Overview of Camera Localization 5
2.2 Single-Shot Localization 9
2.2.1 Marker-Based Localization 9
2.2.2 Object-Based Localization 11
2.2.3 Scene-Based Localization 15
Chapter 3 Marker-Based Camera Localization on Mobile Displays for Re-presenting Immersive Theater 17
3.1 Introduction 17
3.1.1 Circuit Garden 18
3.1.2 Re-experiencing of Circuit Garden with XR 20
3.2 Application Design 22
3.2.1 Performance Recording 22
3.2.2 Development Environment 23
3.2.3 Revivification of Performance 24
3.2.4 Visual Navigation Assistance System 26
3.2.5 Experience Flow of CGXR 27
3.2.6 Localization Stability Optimization in CGXR 29
3.3 User Study 29
3.3.1 Research Hypotheses 29
3.3.2 Results and Discussion 31
3.4 Summary 36
Chapter 4 Object-Based Camera Localization with Low Computational Cost 38
4.1 Introduction 38
4.2 Proposed Method 41
4.2.1 System Architecture 41
4.2.2 Pose Propagation Module 43
4.2.3 Pose Inspection Module 43
4.2.4 Pose Refinement Module 43
4.2.5 IMU-Based Pose Propagation 44
4.2.6 Bias Self-Correction Mechanism 46
4.2.7 Pose Inspection Algorithm 47
4.3 Experiments 49
4.3.1 Experiments on Simulated Data 49
4.3.2 Experiments on Real-World Data 55
4.3.3 Ablation Studies 57
4.3.4 Computational Efficiency 59
4.4 Summary 60
Chapter 5 Scene-Based Camera Localization for Multiuser via Collaborative Information Sharing 62
5.1 Introduction 62
5.2 Proposed Method 66
5.2.1 Tracking Confidence Module 67
5.2.2 Pose Fusion Module 69
5.2.3 Image Retrieval by Fused Pose 70
5.2.4 Feature Matching with Optical Flow 71
5.2.5 Database Updater 72
5.3 Experiments 73
5.3.1 Implementation Details 73
5.3.2 Datasets 75
5.3.3 Single-Client Evaluation 75
5.3.4 Multi-Client Evaluation 77
5.3.5 Ablation Studies 79
5.4 Summary 81
Chapter 6 Conclusion 83
References 87 | - |
| dc.language.iso | en | - |
| dc.subject | 攝影機定位 | - |
| dc.subject | 姿態估測 | - |
| dc.subject | 姿態追蹤 | - |
| dc.subject | 深度學習 | - |
| dc.subject | 延展實境 | - |
| dc.subject | Camera Localization | - |
| dc.subject | Pose Estimation | - |
| dc.subject | Pose Tracking | - |
| dc.subject | Deep Learning | - |
| dc.subject | Extended Reality | - |
| dc.title | 攝影機定位方法與應用 | zh_TW |
| dc.title | Methodology and Application of Camera Localization | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 114-1 | - |
| dc.description.degree | Doctoral | - |
| dc.contributor.oralexamcommittee | 逄愛君;李明穗;王照明;孫士韋 | zh_TW |
| dc.contributor.oralexamcommittee | Ai-Chun Pang;Ming-Sui Lee;Chao-Ming Wang;Shih-Wei Sun | en |
| dc.subject.keyword | 攝影機定位,姿態估測,姿態追蹤,深度學習,延展實境 | zh_TW |
| dc.subject.keyword | Camera Localization,Pose Estimation,Pose Tracking,Deep Learning,Extended Reality | en |
| dc.relation.page | 102 | - |
| dc.identifier.doi | 10.6342/NTU202600195 | - |
| dc.rights.note | Authorized (restricted to on-campus access) | - |
| dc.date.accepted | 2026-01-26 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊網路與多媒體研究所 | - |
| dc.date.embargo-lift | 2026-02-12 | - |
| Appears in Collections: | 資訊網路與多媒體研究所 |
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-114-1.pdf (access restricted to NTU campus IP addresses; off-campus users should connect via the NTU VPN service) | 23.46 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
