Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76461

Full metadata record (DC field: value [language])
dc.contributor.advisor: 簡韶逸 (Shao-Yi Chien)
dc.contributor.author: Hung-Yu Tseng [en]
dc.contributor.author: 曾泓諭 [zh_TW]
dc.date.accessioned: 2021-07-09T15:52:42Z
dc.date.available: 2021-07-05
dc.date.copyright: 2016-07-05
dc.date.issued: 2016
dc.date.submitted: 2016-06-23
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76461
dc.description.abstract: In recent years, with the development of augmented reality and robotics, estimating the 3D pose of a known planar target accurately and in real time has become an important problem. Although a number of efficient systems have been proposed over the past decade, they handle only specific planar targets, such as fiducial markers or targets containing simple closed contours, so the problem still lacks a solution that applies to arbitrary planar targets. The current state-of-the-art solutions are feature-based methods, but they work only when feature points on the target can be matched to those in the camera image.
To address this problem, this master's thesis proposes a robust direct estimation algorithm for arbitrary planar targets. First, we adopt the idea of template matching to obtain an approximate 3D pose. Next, starting from this approximate pose, we propose a gradient descent search algorithm to obtain a more accurate pose. Furthermore, based on the proposed algorithm, we implement a system on a GPU for estimating and tracking the 3D pose of a planar target. The system consists of two parts, an estimation unit and a tracking unit. The estimation unit, built on the proposed algorithm, computes the initial 3D pose; the tracking unit tracks the pose with a three-scale search method that we propose. Both units fully exploit the parallel computing capability of the GPU, allowing the system to run very efficiently. Extensive experiments show that both the proposed algorithm and the system are more accurate and robust than current feature-based methods, and in a practical application our system achieves a processing speed of 11 frames per second. [zh_TW]
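The two-stage estimation described in the abstract (a coarse pose from template matching, then gradient-descent refinement) can be pictured with the following minimal sketch. It is not the thesis implementation: the function names (pose_to_homography, appearance_error, estimate_pose), the 6-vector pose parametrization (Rodrigues rotation plus translation), the mean-absolute-difference score, and the step schedule are all illustrative assumptions, and candidates stands in for the ε-covering set of poses the thesis constructs.

```python
# A minimal sketch (assumptions, not the thesis implementation) of the two-stage
# idea: score a sampled set of candidate poses by template matching, then refine
# the best one with a local descent search over the six pose parameters.
import numpy as np
import cv2


def pose_to_homography(K, rvec, tvec):
    """Homography that maps the planar target (z = 0 plane) into the image."""
    R, _ = cv2.Rodrigues(rvec)
    H = K @ np.column_stack((R[:, 0], R[:, 1], tvec))  # K [r1 r2 t]
    return H / H[2, 2]


def appearance_error(template, frame, K, pose):
    """Mean absolute intensity difference after warping the template by the pose.

    template, frame: float32 grayscale images; pose: length-6 float64 vector
    (Rodrigues rotation followed by translation).
    """
    H = pose_to_homography(K, pose[:3], pose[3:])
    h, w = frame.shape
    warped = cv2.warpPerspective(template, H, (w, h))
    mask = cv2.warpPerspective(np.ones_like(template), H, (w, h))
    if not np.any(mask > 0):  # pose projects entirely outside the image
        return np.inf
    return np.abs(frame - warped)[mask > 0].mean()


def estimate_pose(template, frame, K, candidates, iters=50, step=0.05):
    # Stage 1: coarse search over sampled candidate poses (standing in for the
    # epsilon-covering set the thesis builds in Section 3.1.1).
    best = min(candidates, key=lambda p: appearance_error(template, frame, K, p))
    err = appearance_error(template, frame, K, best)
    # Stage 2: greedy coordinate descent around the best candidate; halve the
    # step whenever no single-parameter perturbation improves the score.
    for _ in range(iters):
        improved = False
        for i in range(6):
            for delta in (step, -step):
                cand = best.copy()
                cand[i] += delta
                e = appearance_error(template, frame, K, cand)
                if e < err:
                    best, err, improved = cand, e, True
        if not improved:
            step *= 0.5
    return best, err
```

Each candidate's score is independent of the others, which is the property a GPU implementation can exploit; the serial loops here are only for clarity.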
dc.description.abstract: Real-time estimation and tracking of the accurate 3D pose of a known planar target from a calibrated camera are essential for augmented reality and robotics. Although numerous efficient systems have been proposed in the past few decades, the task remains challenging because those systems are limited to fiducial markers and targets with simple contours. Feature-based schemes are the state-of-the-art solutions for obtaining the poses of arbitrary planar targets; however, their success hinges on whether feature points can be extracted and matched correctly, which requires targets with rich texture.
In this thesis, we propose a robust direct method for highly accurate 3D pose estimation that performs well on both textured and textureless planar targets. First, the pose of a planar target with respect to a calibrated camera is approximately estimated by posing it as a template matching problem. Next, the object pose is further refined and disambiguated with a gradient descent search scheme. To make the proposed algorithm practical, we also develop D-PET, a direct 3D pose estimation and tracking system implemented on a graphics processing unit (GPU) that obtains poses in real time. The system consists of a pose estimation unit and a pose tracker. The pose estimation unit, built on the approximate pose estimation scheme of the proposed algorithm, finds the initial pose; a 3-scale search scheme is proposed for the pose tracker to track the pose precisely. Both units exploit the characteristics of the GPU to accomplish their work efficiently. Extensive experiments on both synthetic and real datasets demonstrate that the proposed algorithm and system perform favorably against state-of-the-art feature-based approaches in terms of accuracy and robustness. The proposed system achieves a processing speed of 11 fps on an embedded GPU. [en]
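The "3-scale search scheme" named in the abstract suggests a coarse-to-fine local search around the pose tracked in the previous frame. The sketch below is one hedged reading of that idea, reusing the hypothetical appearance_error from the previous sketch; the three step sizes and the hill-climbing stop rule are assumptions, not the thesis specification.

```python
# One possible reading (an assumption, not the thesis specification) of a
# "3-scale search" tracker: perturb the previous frame's pose at three
# successively finer step sizes, keeping the best-scoring pose at each scale.
# Reuses the hypothetical appearance_error() from the sketch above.

def track_pose(template, frame, K, prev_pose,
               scales=(0.08, 0.02, 0.005), max_rounds=30):
    """Coarse-to-fine local search around the pose from the previous frame."""
    pose = prev_pose.copy()
    err = appearance_error(template, frame, K, pose)
    for step in scales:                # three search scales, coarse to fine
        for _ in range(max_rounds):    # hill-climb until no neighbor improves
            moved = False
            for i in range(6):         # perturb each pose parameter in turn
                for delta in (step, -step):
                    cand = pose.copy()
                    cand[i] += delta
                    e = appearance_error(template, frame, K, cand)
                    if e < err:
                        pose, err, moved = cand, e, True
            if not moved:
                break
    return pose


# Usage idea: initialize with the estimation unit, then track frame to frame.
# pose, _ = estimate_pose(template, first_frame, K, candidates)
# for frame in frames:
#     pose = track_pose(template, frame, K, pose)
```

As with the estimation sketch, the perturbed candidates at each scale can be scored in parallel, which is how the abstract's GPU implementation reaches real-time rates.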
dc.description.provenance: Made available in DSpace on 2021-07-09T15:52:42Z (GMT). No. of bitstreams: 1. ntu-105-R03943005-1.pdf: 23533894 bytes, checksum: 3e8e3abb9b36dce3e47e342bfec7c570 (MD5). Previous issue date: 2016. [en]
dc.description.tableofcontents:
1 Introduction
  1.1 3D Pose Estimation
  1.2 Pose Ambiguity
  1.3 Contribution
  1.4 Thesis Organization
2 Background Knowledge and Related Works
  2.1 Problem Definition
  2.2 State-of-the-Art Pose Estimation
    2.2.1 Marker-based Pose Estimation
    2.2.2 Feature-based Pose Estimation
  2.3 Template Matching
  2.4 Motion Estimation
3 Direct Pose Estimation and Tracking
  3.1 Approximated Pose Estimation
    3.1.1 ε-Covering Set Construction
    3.1.2 Coarse-to-Fine Estimation
  3.2 Pose Refinement
    3.2.1 Candidate Poses Exploration
    3.2.2 Refining Pose Estimation
4 Experiments
  4.1 Synthetic Dataset
    4.1.1 Normal Condition
    4.1.2 Varying Conditions
    4.1.3 Refinement Analysis
  4.2 Real Dataset
  4.3 Runtime Comparison
  4.4 Textureless Images
5 Direct Pose Estimation and Tracking System
  5.1 Proposed System
    5.1.1 Initialization
    5.1.2 Coarse-to-Fine Estimation
    5.1.3 Pose Tracker
  5.2 Evaluation
    5.2.1 Pose Estimation Unit
    5.2.2 Pose Tracker
  5.3 Practical Tests
6 Conclusion
References
dc.language.iso: en
dc.subject: 平行運算 (Parallel Computing) [zh_TW]
dc.subject: 三維姿態分析 (3D Pose Estimation) [zh_TW]
dc.subject: 三維姿態追蹤 (3D Pose Tracking) [zh_TW]
dc.subject: 直接分析 (Direct Estimation) [zh_TW]
dc.subject: 3D Pose Tracking [en]
dc.subject: 3D Pose Estimation [en]
dc.subject: GPU [en]
dc.subject: Parallel Computing [en]
dc.subject: Direct-Based [en]
dc.title: 針對單一平面目標物之三維姿態的直接分析:演算法和系統實作 [zh_TW]
dc.title: Direct 3D Pose Estimation of a Planar Target: Algorithm and Implementation [en]
dc.type: Thesis
dc.date.schoolyear: 104-2
dc.description.degree: 碩士 (Master's)
dc.contributor.coadvisor: 楊明玄 (Ming-Hsuan Yang)
dc.contributor.oralexamcommittee: 莊永裕 (Yung-Yu Chuang), 傅立成 (Li-Chen Fu), 黃朝宗 (Chao-Tsung Huang)
dc.subject.keyword: 三維姿態分析 (3D Pose Estimation), 三維姿態追蹤 (3D Pose Tracking), 直接分析 (Direct Estimation), 平行運算 (Parallel Computing) [zh_TW]
dc.subject.keyword: 3D Pose Estimation, 3D Pose Tracking, Direct-Based, Parallel Computing, GPU [en]
dc.relation.page: 67
dc.identifier.doi: 10.6342/NTU201600474
dc.rights.note: Authorization granted (open access worldwide)
dc.date.accepted: 2016-06-24
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 電子工程學研究所 (Graduate Institute of Electronics Engineering) [zh_TW]
dc.date.embargo-lift: 2021-07-05
Appears in Collections: 電子工程學研究所 (Graduate Institute of Electronics Engineering)

Files in This Item:
File: ntu-105-R03943005-1.pdf | Size: 22.98 MB | Format: Adobe PDF

