NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92111

Full metadata record
dc.contributor.advisor: 簡韶逸 [zh_TW]
dc.contributor.advisor: Shao-Yi Chien [en]
dc.contributor.author: 劉家甫 [zh_TW]
dc.contributor.author: Chia-Fu Liu [en]
dc.date.accessioned: 2024-03-05T16:20:24Z
dc.date.available: 2024-03-06
dc.date.copyright: 2024-03-05
dc.date.issued: 2024
dc.date.submitted: 2024-02-03
dc.identifier.citation:
[1] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in Proceedings of European Conference on Computer Vision (ECCV), 2014, pp. 740–755.
[2] P.-C. Wu, H.-Y. Tseng, M.-H. Yang, and S.-Y. Chien, “Direct pose estimation for planar objects,” Computer Vision and Image Understanding (CVIU), vol. 172, pp. 50–66, 2018.
[3] P.-C. Wu, Y.-Y. Lee, H.-Y. Tseng, H.-I. Ho, M.-H. Yang, and S.-Y. Chien, “A benchmark dataset for 6dof object pose tracking,” in Proceedings of IEEE International Symposium on Mixed and Augmented Reality (ISMAR-Adjunct), 2017, pp. 186–191.
[4] Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, “Yolox: Exceeding yolo series in 2021,” arXiv preprint arXiv:2107.08430, 2021.
[5] J. J. Rodrigues, J.-S. Kim, M. Furukawa, J. Xavier, P. Aguiar, and T. Kanade, “6d pose estimation of textureless shiny objects using random ferns for bin picking,” in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2012, pp. 3334–3341.
[6] E. Marchand, H. Uchiyama, and F. Spindler, “Pose estimation for augmented reality: A hands-on survey,” IEEE Transactions on Visualization and Computer Graphics (TVCG), vol. 22, no. 12, pp. 2633–2651, 2016.
[7] Z. Zhou, J. Karlekar, D. Hii, M. Schneider, W. Lu, and S. Wittkopf, “Robust pose estimation for outdoor mixed reality with sensor fusion,” in Universal Access in Human-Computer Interaction: Applications and Services (UAHCI). Springer Berlin Heidelberg, 2009, pp. 281–289.
[8] M. Fiala, “Artag, a fiducial marker system using digital techniques,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 2, 2005, pp. 590–596.
[9] E. Olson, “Apriltag: A robust and flexible visual fiducial system,” in Proceedings of IEEE International Conference on Robotics and Automation (ICRA), 2011, pp. 3400–3407.
[10] S. Garrido-Jurado, R. Muñoz-Salinas, F. Madrid-Cuevas, and R. Medina-Carnicer, “Generation of fiducial marker dictionaries using mixed integer linear programming,” Pattern Recognition, vol. 51, pp. 481–491, 2016.
[11] F. J. Romero-Ramirez, R. Muñoz-Salinas, and R. Medina-Carnicer, “Speeded up detection of squared fiducial markers,” Image and Vision Computing, vol. 76, pp. 38–47, 2018.
[12] B. Benligiray, C. Topal, and C. Akinlar, “Stag: A stable fiducial marker system,” Image and Vision Computing, vol. 89, pp. 158–169, 2019.
[13] H. Uchiyama and H. Saito, “Random dot markers,” in Proceedings of 2011 IEEE Virtual Reality Conference, 2011, pp. 35–38.
[14] H. Uchiyama and E. Marchand, “Deformable random dot markers,” in Proceedings of IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2011, pp. 237–238.
[15] L. Calvet, P. Gurdjos, C. Griwodz, and S. Gasparini, “Detection and accurate localization of circular fiducials under highly challenging conditions,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 562–570.
[16] D. Hu, D. DeTone, and T. Malisiewicz, “Deep charuco: Dark charuco marker pose estimation,” in Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 8428–8436.
[17] J. Peace, E. Psota, Y. Liu, and L. C. Pérez, “E2etag: An end-to-end trainable method for generating and detecting fiducial markers,” arXiv preprint arXiv:2105.14184, 2021.
[18] Y. Xiang, T. Schmidt, V. Narayanan, and D. Fox, “Posecnn: A convolutional neural network for 6d object pose estimation in cluttered scenes,” in Proceedings of Robotics: Science and Systems (RSS), 2018.
[19] Y. Hu, P. Fua, W. Wang, and M. Salzmann, “Single-stage 6d object pose estimation,” in Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
[20] D. G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer Vision (IJCV), vol. 60, no. 2, pp. 91–110, 2004.
[21] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “Speeded-up robust features (surf),” Computer Vision and Image Understanding (CVIU), vol. 110, no. 3, pp. 346–359, 2008.
[22] S. Leutenegger, M. Chli, and R. Y. Siegwart, “Brisk: Binary robust invariant scalable keypoints,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2011, pp. 2548–2555.
[23] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “Orb: An efficient alternative to sift or surf,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2011, pp. 2564–2571.
[24] A. Alahi, R. Ortiz, and P. Vandergheynst, “Freak: Fast retina keypoint,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 510–517.
[25] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM (CACM), vol. 24, no. 6, pp. 381–395, 1981.
[26] O. Chum and J. Matas, “Matching with prosac - progressive sample consensus,” in Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 1, 2005, pp. 220–226.
[27] V. Lepetit, F. Moreno-Noguer, and P. Fua, “Epnp: An accurate o(n) solution to the pnp problem,” International Journal of Computer Vision (IJCV), vol. 81, no. 2, pp. 155–166, 2009.
[28] Y. Zheng, Y. Kuang, S. Sugimoto, K. Astrom, and M. Okutomi, “Revisiting the pnp problem: A fast, general and optimal solution,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2013, pp. 2344–2351.
[29] B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” in Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), vol. 81, 1981, pp. 674–679.
[30] G. D. Hager and P. N. Belhumeur, “Efficient region tracking with parametric models of geometry and illumination,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 20, no. 10, pp. 1025–1039, 1998.
[31] H.-Y. Shum and R. Szeliski, “Construction of panoramic image mosaics with global and local alignment,” Panoramic Vision, pp. 227–268, 2001.
[32] S. Baker and I. Matthews, “Equivalence and efficiency of image alignment algorithms,” in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, 2001, pp. 1090–1097.
[33] E. Malis, “Improving vision-based control using efficient second-order minimization techniques,” in Proceedings of IEEE International Conference on Robotics and Automation (ICRA), 2004, pp. 1843–1848.
[34] A. Crivellaro and V. Lepetit, “Robust 3d tracking with descriptor fields,” in Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3414–3421.
[35] J. Engel, T. Schöps, and D. Cremers, “Lsd-slam: Large-scale direct monocular slam,” in Proceedings of European Conference on Computer Vision (ECCV), 2014, pp. 834–849.
[36] Y.-T. Chi, J. Ho, and M.-H. Yang, “A direct method for estimating planar projective transform,” in Proceedings of Asian Conference on Computer Vision (ACCV), 2011, pp. 268–281.
[37] S. Korman, D. Reichman, G. Tsur, and S. Avidan, “Fast-match: Fast affine template matching,” International Journal of Computer Vision (IJCV), vol. 121, no. 1, pp. 111–125, 2017.
[38] J. F. Henriques, P. Martins, R. F. Caseiro, and J. Batista, “Fast training of pose detectors in the fourier domain,” in Proceedings of Neural Information Processing Systems (NIPS), 2014, pp. 3050–3058.
[39] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.
[40] R. Girshick, “Fast r-cnn,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1440–1448.
[41] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” in Proceedings of Neural Information Processing Systems (NIPS), vol. 28, 2015.
[42] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779–788.
[43] J. Redmon and A. Farhadi, “Yolo9000: Better, faster, stronger,” in Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6517–6525.
[44] ——, “Yolov3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018.
[45] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “Yolov4: Optimal speed and accuracy of object detection,” arXiv preprint arXiv:2004.10934, 2020.
[46] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” in Proceedings of European Conference on Computer Vision (ECCV), 2020, pp. 213–229.
[47] X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, “Deformable detr: Deformable transformers for end-to-end object detection,” arXiv preprint arXiv:2010.04159, 2021.
[48] P.-C. Wu, R. Wang, K. Kin, C. Twigg, S. Han, M.-H. Yang, and S.-Y. Chien, “Dodecapen: Accurate 6dof tracking of a passive stylus,” in Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology, 2017, p. 365–374.
[49] Wikipedia contributors, “Delone set — Wikipedia, the free encyclopedia,” 2017, [Online; accessed 8-May-2018]. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Delone_set&oldid=795315991
[50] G. Schweighofer and A. Pinz, “Robust Pose Estimation from a Planar Target,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 28, no. 12, pp. 2024–2030, 2006.
[51] R. Muñoz-Salinas, M. J. Marín-Jiménez, E. Yeguas-Bolivar, and R. Medina-Carnicer, “Mapping and localization from planar markers,” Pattern Recognition, vol. 73, pp. 158–171, 2018.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92111
dc.description.abstract [zh_TW]:
隨著終端裝置技術的快速發展,擴增實境(AR)和混合實境(MR)正受到越來越多的關注。估測平面物體六自由度(6-DoF)姿態是這些應用的關鍵之一。多年來,人們進行了大量研究以估測單一已知平面目標的姿態。然而,在實際應用中可能會遇到多個平面目標,並且根據具體應用可能會替換為不同的目標。我們的目標是設計一個通用的平面物體姿態估測系統,能夠在最小的訓練要求下高效地估測多個任意平面物體的姿態。
在這篇論文中,我們提出了一個基於物件偵測和直接姿態估測算法的多平面物體姿態估測系統,稱為DetDPE。我們只使用少量合成影像數據對現成的物件偵測器進行微調。物件偵測器識別用於姿態估測的平面物體,並顯著降低估測算法的複雜度。我們的DetDPE系統展現了高效性,同時保持了原始直接姿態估測算法的高精準度。此外,我們將物件偵測方法與基於特徵的姿態估測算法結合。結果顯示這種方法確實可以提升基於特徵的算法的性能。因此,我們可以將此物件偵測方法視為一個可應用於各種姿態估測算法的框架。
dc.description.abstract [en]:
With the rapid growth of technologies on edge devices, Augmented Reality (AR) and Mixed Reality (MR) are gaining increasing attention. Estimating the six-degrees-of-freedom (6-DoF) pose of a planar object is one of the keys to these applications. Over the years, numerous studies have been conducted on estimating the pose of a single known planar target. In real-world applications, however, multiple planar targets may be present, and they may be replaced with different ones depending on the specific application. Our objective is to design a general planar object pose estimation system capable of estimating the poses of multiple arbitrary planar objects efficiently, with minimal training requirements.
In this thesis, we propose DetDPE, a multiple planar object pose estimation system based on object detection and a direct pose estimation algorithm. We fine-tune an off-the-shelf object detector with only a modest amount of synthetic image data. The object detector identifies the planar objects for pose estimation and significantly reduces the complexity of the estimation algorithm. Our DetDPE system is efficient while maintaining the high accuracy of the original direct algorithm. Furthermore, we integrate the object detection approach with a feature-based pose estimation algorithm, and the results show that this approach can indeed enhance the performance of feature-based algorithms. We can therefore regard the object detection approach as a framework applicable to various types of pose estimation algorithms.
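To make the detect-then-estimate flow described in the abstract concrete, the following is a minimal Python sketch under stated assumptions: `detector`, `direct_pose_estimator`, and `templates` are hypothetical stand-ins (the thesis's actual DetDPE code is not reproduced here); only the cropping and camera-intrinsics bookkeeping are spelled out. The point of the crop is that the direct pose estimator then searches a small window rather than the full frame, which is where the complexity reduction comes from.

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class Detection:
        class_id: int      # index of the planar template this box matches
        box: tuple         # (x, y, w, h) in full-image pixel coordinates

    def crop_with_margin(image: np.ndarray, box, margin: float = 0.15):
        """Crop the detected region plus a small margin, so the direct pose
        estimator only has to align the template within a small window."""
        x, y, w, h = box
        mx, my = int(w * margin), int(h * margin)
        x0, y0 = max(x - mx, 0), max(y - my, 0)
        x1 = min(x + w + mx, image.shape[1])
        y1 = min(y + h + my, image.shape[0])
        return image[y0:y1, x0:x1], (x0, y0)

    def detdpe(image: np.ndarray, detector, templates,
               direct_pose_estimator, K: np.ndarray):
        """Detect every known planar target, then run the (stand-in) direct
        pose estimator only inside each detected region."""
        poses = []
        for det in detector(image):  # fine-tuned off-the-shelf detector, assumed
                                     # to return a list of Detection objects
            crop, (x0, y0) = crop_with_margin(image, det.box)
            K_crop = K.copy()        # shift the principal point of the 3x3
            K_crop[0, 2] -= x0       # intrinsic matrix into crop coordinates
            K_crop[1, 2] -= y0
            pose = direct_pose_estimator(crop, templates[det.class_id], K_crop)
            poses.append((det.class_id, pose))
        return poses

Every name above is illustrative; swapping in a feature-based estimator for `direct_pose_estimator` gives the detection-plus-feature-based variant the abstract also evaluates.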
dc.description.provenance [en]: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-03-05T16:20:24Z. No. of bitstreams: 0
dc.description.provenance [en]: Made available in DSpace on 2024-03-05T16:20:24Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Abstract (P.i)
List of Figures (P.iv)
List of Tables (P.vi)
1 Introduction (P.1)
1.1 6-DoF Planar Object Pose Estimation (P.1)
1.2 Object Detection (P.3)
1.3 Contributions (P.4)
1.4 Thesis Organization (P.5)
2 Background Knowledge and Related Work (P.6)
2.1 Problem Formulation (P.6)
2.2 Related Work (P.9)
2.2.1 Marker-Based Pose Estimation (P.9)
2.2.2 Feature-Based Pose Estimation (P.9)
2.2.3 Direct Pose Estimation (P.10)
2.2.4 Deep Learning Object Detectors (P.11)
3 Proposed Method (P.13)
3.1 Planar Object Detector Training (P.13)
3.1.1 Synthetic Data (P.14)
3.1.2 Training and Model Selection (P.15)
3.2 Direct Pose Estimation with Object Detection (P.15)
3.3 Feature-based Pose Estimation with Object Detection (P.19)
4 Experiments (P.21)
4.1 Object Detector Evaluation (P.22)
4.2 Single Target Synthetic Image Dataset (P.23)
4.2.1 Undistorted Images (P.23)
4.2.2 Degraded Images (P.25)
4.3 Object Pose Tracking Dataset (P.25)
4.4 Multiple Target Synthetic Image Dataset (P.33)
4.5 Runtime Comparison (P.33)
5 Conclusion (P.36)
Reference (P.38)
dc.language.iso: en
dc.subject: 擴增實境 [zh_TW]
dc.subject: 平面物體姿態 [zh_TW]
dc.subject: 姿態估測 [zh_TW]
dc.subject: 六自由度 [zh_TW]
dc.subject: 混合實境 [zh_TW]
dc.subject: 物體姿態 [zh_TW]
dc.subject: Augmented Reality (AR) [en]
dc.subject: Mixed Reality (MR) [en]
dc.subject: Six Degrees of Freedom (6-DoF) [en]
dc.subject: Object Pose [en]
dc.subject: Planar Object Pose [en]
dc.subject: Pose Estimation [en]
dc.title: 基於物件偵測之多平面物體姿態估測系統 [zh_TW]
dc.title: Multiple Planar Object Pose Estimation System Based on Object Detection [en]
dc.type: Thesis
dc.date.schoolyear: 112-1
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 施吉昇; 鄭文皇; 陳冠文 [zh_TW]
dc.contributor.oralexamcommittee: Chi-Sheng Shih; Wen-Huang Cheng; Kuan-Wen Chen [en]
dc.subject.keyword: 擴增實境, 混合實境, 六自由度, 物體姿態, 平面物體姿態, 姿態估測 [zh_TW]
dc.subject.keyword: Augmented Reality (AR), Mixed Reality (MR), Six Degrees of Freedom (6-DoF), Object Pose, Planar Object Pose, Pose Estimation [en]
dc.relation.page: 44
dc.identifier.doi: 10.6342/NTU202400430
dc.rights.note: 同意授權(全球公開) (authorized; open access worldwide)
dc.date.accepted: 2024-02-05
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 電子工程學研究所 (Graduate Institute of Electronics Engineering)
dc.date.embargo-lift: 2025-02-01
Appears in Collections: 電子工程學研究所 (Graduate Institute of Electronics Engineering)

Files in This Item:
File: ntu-112-1.pdf | Size: 7.94 MB | Format: Adobe PDF
