Please cite this item using this Handle URI:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50164
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 傅立成 | |
dc.contributor.author | Tzu-Yang Chen | en |
dc.contributor.author | 陳子揚 | zh_TW |
dc.date.accessioned | 2021-06-15T12:31:20Z | - |
dc.date.available | 2019-08-23 | |
dc.date.copyright | 2016-08-23 | |
dc.date.issued | 2016 | |
dc.date.submitted | 2016-08-04 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50164 | - |
dc.description.abstract | 3D hand pose estimation has become a very popular research topic in recent years. Because it provides a natural interface for communication between a person and cyberspace, creating an excellent user experience, it has been widely applied in virtual reality and human-computer interaction.
Despite the rapid development of this field, however, 3D hand pose estimation remains a difficult problem. First, the user's hand must be correctly detected in a complex environment, yet the hand's appearance changes with viewpoint, distance, and user, which makes detection difficult. Second, the human hand has very high degrees of freedom and tends to occlude itself, so pose estimation is also quite challenging. Nevertheless, the recent rise of depth cameras and deep learning has made this difficult problem tractable. In this thesis, we aim to build a 3D hand pose estimation system that uses depth images to accurately detect the human hand and precisely estimate its pose. To guarantee the robustness of our system, we design a hand model called the spherical part model and use it to train a deep convolutional neural network; in addition, to reduce human error, we integrate them with a data-driven approach, so that our network can estimate hand pose precisely based on prior knowledge of the hand model. To demonstrate the advantages of our method, we conduct a complete experiment on two public datasets and one self-collected dataset. The results show that our system detects hands with roughly 90% average precision and achieves an average error of only about 10 millimeters in pose estimation, a significant improvement over other state-of-the-art methods. | zh_TW |
dc.description.abstract | 3D hand pose estimation, the task of locating the joints of a hand, has been a popular research topic in recent years. It is widely used in advanced virtual-reality and human-computer-interaction applications, since it provides a natural interface for communication between humans and cyberspace and thus enables an excellent user experience.
Despite the rapid development of this field, 3D hand pose estimation remains difficult for two main reasons. First, the human hand, whose appearance varies with viewpoint, distance, and subject, must be detected in complex environments. Second, the hand's high degrees of freedom and severe self-occlusions make pose estimation challenging. However, the rise of depth sensors and deep learning makes such a challenging task feasible. In this thesis, we aim to build a 3D hand pose estimation system that correctly detects the human hand and accurately estimates its pose from depth images. To guarantee the robustness of our system, we design a hand model called the spherical part model (SPM) and train a deep convolutional neural network with it. Moreover, to reduce the influence of human omissions, we use a data-driven approach to integrate them together. As a result, our network can estimate hand pose more accurately based on prior knowledge of the human hand. To demonstrate the superiority of our method, a complete experiment is conducted on two public datasets and one self-built dataset. The results show that our system detects human hands with nearly 90% average precision and achieves an average error distance of about 10 millimeters in pose estimation, which significantly outperforms other state-of-the-art works. | en |
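The abstract above involves two concrete ideas: expressing hand-joint positions in a spherical coordinate system centered on the hand (as the spherical part model's name suggests), and measuring pose accuracy as an average error distance in millimeters. A minimal sketch of both, assuming Cartesian joint positions in millimeters and an illustrative choice of hand-center reference point; the function names are hypothetical, not the thesis's actual implementation:

```python
import math

def to_spherical(joints, center):
    """Convert 3D joint positions (mm) to (radius, inclination, azimuth)
    relative to a hand-center point -- a self-centric spherical encoding."""
    out = []
    for x, y, z in joints:
        dx, dy, dz = x - center[0], y - center[1], z - center[2]
        r = math.sqrt(dx * dx + dy * dy + dz * dz)
        theta = math.acos(dz / r) if r > 0 else 0.0  # inclination from +z axis
        phi = math.atan2(dy, dx)                     # azimuth in the x-y plane
        out.append((r, theta, phi))
    return out

def mean_joint_error(pred, gt):
    """Average per-joint Euclidean distance (mm) between predicted and
    ground-truth joints -- the kind of error metric quoted in the abstract."""
    total = 0.0
    for (px, py, pz), (gx, gy, gz) in zip(pred, gt):
        total += math.sqrt((px - gx) ** 2 + (py - gy) ** 2 + (pz - gz) ** 2)
    return total / len(pred)
```

For example, a single predicted joint at (0, 0, 10) against ground truth at the origin yields a mean error of 10 mm, matching the scale of the accuracy figure reported above.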
dc.description.provenance | Made available in DSpace on 2021-06-15T12:31:20Z (GMT). No. of bitstreams: 1 ntu-105-R03922100-1.pdf: 3238503 bytes, checksum: bd226439e1ee55f7adebeb1616264da1 (MD5) Previous issue date: 2016 | en |
dc.description.tableofcontents | Thesis Committee Certification i
Chinese Abstract ii
ABSTRACT iii
Contents iv
LIST OF FIGURES vi
LIST OF TABLES x
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Challenges 4
1.3 Related Work 6
1.3.1 Object Detection using Deep Learning 7
1.3.2 Optimization-Based Hand Pose Estimation 7
1.3.3 Deep Learning-Based Hand Pose Estimation 8
1.4 System Overview 9
1.5 Contributions 11
1.6 Thesis Organization 12
Chapter 2 Preliminaries 14
2.1 Convolutional Neural Network 14
2.2 Gaussian Mixture Model 17
2.2.1 Model Formulation 17
2.2.2 Constructing a Gaussian Mixture Model with the Expectation-Maximization Algorithm 19
Chapter 3 Hand Pose Estimation using Spherical Part Model 23
3.1 Spherical Part Model 23
3.1.1 Self-Centric Spherical Coordinate System 24
3.1.2 Representation of Spherical Part Model 27
3.2 Training a Deep SPM Network 28
3.2.1 Overview 29
3.2.2 Data Preprocessing and Expansion 30
3.2.3 Network Structure 32
3.2.4 Updating the Neural Network Based on the SPM Layer 33
3.3 3D Hand Pose Estimation using Deep SPM Network 34
3.3.1 Hand Detection 35
3.3.2 Generating 3D Pose from the Deep SPM Network 39
3.4 Implementation Details 42
3.4.1 Alternating Training of the Detection Network 42
3.4.2 Regressive Training of the Deep SPM Network 43
Chapter 4 Experiments 44
4.1 Experimental Setting 44
4.2 Datasets 45
4.2.1 NYU Hand Pose Dataset 46
4.2.2 ICVL Hand Posture Dataset 46
4.2.3 NTU Egocentric Hand Pose Dataset 47
4.3 Hand Detection Evaluation 48
4.4 Hand Pose Estimation Results 49
Chapter 5 Conclusion and Future Work 60
REFERENCES 62 | |
dc.language.iso | en | |
dc.title | 利用球狀空間組件模型應用於3D手部姿勢估計之深層神經網路學習 | zh_TW |
dc.title | Learning a Deep Network with Spherical Part Model for 3D Hand Pose Estimation | en |
dc.type | Thesis | |
dc.date.schoolyear | 104-2 | |
dc.description.degree | Master | |
dc.contributor.oralexamcommittee | 蕭培墉, 黃正民, 黃世勳, 莊永裕 | |
dc.subject.keyword | 手部姿勢估計,深度學習,卷積神經網路,球狀空間組件模型, | zh_TW |
dc.subject.keyword | Hand pose estimation, Deep learning, Convolutional neural network, Spherical part model | en |
dc.relation.page | 67 | |
dc.identifier.doi | 10.6342/NTU201601892 | |
dc.rights.note | Paid authorization | |
dc.date.accepted | 2016-08-04 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |
Appears in Collections: | Department of Computer Science and Information Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-105-1.pdf (currently not authorized for public access) | 3.16 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.