NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74141
Full metadata record (DC field: value [language]):
dc.contributor.advisor: 傅立成
dc.contributor.author: En-Te Chou (en)
dc.contributor.author: 周恩德 (zh_TW)
dc.date.accessioned: 2021-06-17T08:21:36Z
dc.date.available: 2022-08-20
dc.date.copyright: 2019-08-20
dc.date.issued: 2019
dc.date.submitted: 2019-08-13
dc.identifier.citation[1] K. He, G. Gkioxari, P. Dollár, and R. Girshick, 'Mask r-cnn,' in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2961-2969.
[2] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, 'Feature pyramid networks for object detection,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117-2125.
[3] J. Redmon and A. Farhadi, 'YOLO9000: better, faster, stronger,' in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 7263-7271.
[4] W. Liu et al., 'Ssd: Single shot multibox detector,' in European conference on computer vision, 2016, pp. 21-37: Springer.
[5] Y. Wu, J. Lin, and T. S. Huang, 'Analyzing and capturing articulated hand motion in image sequences,' IEEE transactions on pattern analysis and machine intelligence, vol. 27, no. 12, pp. 1910-1922, 2005.
[6] M. de La Gorce, D. J. Fleet, and N. Paragios, 'Model-based 3d hand pose estimation from monocular video,' IEEE transactions on pattern analysis and machine intelligence, vol. 33, no. 9, pp. 1793-1805, 2011.
[7] J. Wöhlke, S. Li, and D. Lee, 'Model-based hand pose estimation for generalized hand shape with appearance normalization,' arXiv preprint arXiv:1807.00898, 2018.
[8] C. Xu and L. Cheng, 'Efficient hand pose estimation from a single depth image,' in Proceedings of the IEEE international conference on computer vision, 2013, pp. 3456-3462.
[9] S. Sridhar, A. Oulasvirta, and C. Theobalt, 'Interactive markerless articulated hand motion tracking using RGB and depth data,' in Proceedings of the IEEE international conference on computer vision, 2013, pp. 2456-2463.
[10] C. Zimmermann and T. Brox, 'Learning to estimate 3d hand pose from single rgb images,' in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4903-4911.
[11] F. Mueller et al., 'Ganerated hands for real-time 3d hand tracking from monocular rgb,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 49-59.
[12] Y. Cai, L. Ge, J. Cai, and J. Yuan, 'Weakly-supervised 3d hand pose estimation from monocular rgb images,' in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 666-682.
[13] U. Iqbal, P. Molchanov, T. Breuel Juergen Gall, and J. Kautz, 'Hand pose estimation via latent 2.5 d heatmap regression,' in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 118-134.
[14] M. Oberweger, P. Wohlhart, and V. Lepetit, 'Hands deep in deep learning for hand pose estimation,' arXiv preprint arXiv:1502.06807, 2015.
[15] H. Guo, G. Wang, X. Chen, C. Zhang, F. Qiao, and H. Yang, 'Region ensemble network: Improving convolutional network for hand pose estimation,' in 2017 IEEE International Conference on Image Processing (ICIP), 2017, pp. 4512-4516: IEEE.
[16] H. Guo, G. Wang, X. Chen, and C. Zhang, 'Towards good practices for deep 3d hand pose estimation,' arXiv preprint arXiv:1707.07248, 2017.
[17] Y. Zhou, J. Lu, K. Du, X. Lin, Y. Sun, and X. Ma, 'Hbe: Hand branch ensemble network for real-time 3d hand pose estimation,' in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 501-516.
[18] M.-Y. Wu, Y. H. Tang, P.-W. Ting, and L.-C. Fu, 'Hand pose learning: combining deep learning and hierarchical refinement for 3D hand pose estimation,' in BMVC, 2017, vol. 1, p. 3.
[19] P.-W. Ting, E.-T. Chou, Y.-H. Tang, and L.-C. Fu, 'Hand Pose Estimation Based on 3D Residual Network with Data Padding and Skeleton Steadying,' in Asian Conference on Computer Vision, 2018, pp. 293-307: Springer.
[20] P. Panteleris, I. Oikonomidis, and A. Argyros, 'Using a single rgb frame for real time 3d hand pose estimation in the wild,' in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 2018, pp. 436-445: IEEE.
[21] A. Spurr, J. Song, S. Park, and O. Hilliges, 'Cross-modal deep variational hand pose estimation,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 89-98.
[22] D. P. Kingma and M. Welling, 'Auto-encoding variational bayes,' arXiv preprint arXiv:1312.6114, 2013.
[23] F. Huang, A. Zeng, M. Liu, J. Qin, and Q. Xu, 'Structure-Aware 3D Hourglass Network for Hand Pose Estimation from Single Depth Image,' arXiv preprint arXiv:1812.10320, 2018.
[24] K. Simonyan and A. Zisserman, 'Very deep convolutional networks for large-scale image recognition,' arXiv preprint arXiv:1409.1556, 2014.
[25] K. He, X. Zhang, S. Ren, and J. Sun, 'Deep residual learning for image recognition,' in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.
[26] C. Szegedy et al., 'Going deeper with convolutions,' in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1-9.
[27] A. G. Howard et al., 'Mobilenets: Efficient convolutional neural networks for mobile vision applications,' arXiv preprint arXiv:1704.04861, 2017.
[28] X. Zhang, X. Zhou, M. Lin, and J. Sun, 'Shufflenet: An extremely efficient convolutional neural network for mobile devices,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6848-6856.
[29] J. Long, E. Shelhamer, and T. Darrell, 'Fully convolutional networks for semantic segmentation,' in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3431-3440.
[30] O. Ronneberger, P. Fischer, and T. Brox, 'U-net: Convolutional networks for biomedical image segmentation,' in International Conference on Medical image computing and computer-assisted intervention, 2015, pp. 234-241: Springer.
[31] A. Newell, K. Yang, and J. Deng, 'Stacked hourglass networks for human pose estimation,' in European Conference on Computer Vision, 2016, pp. 483-499: Springer.
[32] D. P. Kingma and J. Ba, 'Adam: A method for stochastic optimization,' arXiv preprint arXiv:1412.6980, 2014.
[33] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, 'You only look once: Unified, real-time object detection,' in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779-788.
[34] G. Moon, J. Yong Chang, and K. Mu Lee, 'V2v-posenet: Voxel-to-voxel prediction network for accurate 3d hand and human pose estimation from a single depth map,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 5079-5088.
[35] S. Yuan et al., 'Depth-based 3d hand pose estimation: From current achievements to future goals,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2636-2645.
[36] A. Paszke et al., 'Automatic differentiation in pytorch,' 2017.
[37] aleju. Image augmentation for machine learning experiments. Available: https://github.com/aleju/imgaug
[38] F. Mueller, D. Mehta, O. Sotnychenko, S. Sridhar, D. Casas, and C. Theobalt, 'Real-time hand tracking under occlusion from an egocentric rgb-d sensor,' in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1284-1293.
[39] Maya | Computer Animation & Modeling Software | Autodesk. Available: https://www.autodesk.com.tw/products/maya/overview
[40] T.-Y. Lin et al., 'Microsoft coco: Common objects in context,' in European conference on computer vision, 2014, pp. 740-755: Springer.
[41] Unity Real-Time Development Platform | 3D, 2D VR & AR Visualizations. Available: https://unity.com/
[42] Mixamo. Available: https://www.mixamo.com/#/
[43] Blender: a 3D modelling and rendering package. Available: https://www.blender.org/
[44] Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei, and Y. Sheikh, 'OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields,' arXiv preprint arXiv:1812.08008, 2018.
[45] ZeroMQ. Available: http://zeromq.org/
[46] Leap Motion. Available: https://www.leapmotion.com/
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74141
dc.description.abstract: Hand pose estimation has been a popular research topic in computer vision. An accurate and robust hand pose estimation system, one that can estimate the coordinates of the hand joints from images, is essential for human-computer interaction applications such as virtual reality (VR), augmented reality (AR), and mixed reality (MR). However, hand pose estimation still faces several constraints. First, almost all existing datasets are captured from a third-person view, which makes them hard to apply in a first-person-view VR system, where the camera is mounted on the VR headset. Second, most existing methods rely on a preprocessing step that crops a bounding box around the hand, which is not ideal in realistic applications, especially when the method must be implemented in systems that use different programming languages; a single end-to-end deep learning model, by contrast, stays portable.
The purpose of this thesis is to develop a system that can estimate the locations and pose parameters of hands from a single RGB frame, providing a natural interface for users to manipulate objects in a virtual world. We propose a deep-learning-based network and train it end-to-end on a large dataset generated by ourselves: using the 3D engine Unity, we render hand models with varied skin colors, light sources, poses, and locations over background images randomly sampled from the COCO dataset. We then train a convolutional neural network (CNN) to estimate not only the locations of the hands in an image but also their corresponding 3D joint coordinates and whether each hand is left or right (illustrative sketches follow the abstract).
In the experiments, we first compare models trained under different configurations to show that the proposed method improves performance, and then compare the proposed method with other state-of-the-art approaches to demonstrate that it outperforms them. We expect the proposed hand parameter estimation system to give users a comfortable experience when interacting with the virtual world. (en)
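The abstract describes compositing rendered hands over COCO backgrounds inside Unity. As a rough, language-uniform illustration only, an offline compositing step with the same effect might look like the following Python sketch; all file paths, the 256x256 size, and the placement range are assumptions, not the thesis's actual Unity pipeline.

# Hypothetical compositing step: paste a rendered RGBA hand image onto a
# randomly chosen COCO background. Paths and sizes are illustrative.
from PIL import Image
import glob
import random

backgrounds = glob.glob("coco/train2017/*.jpg")          # assumed COCO images
bg = Image.open(random.choice(backgrounds)).convert("RGB").resize((256, 256))
hand = Image.open("renders/hand_0001.png").convert("RGBA")  # assumed Unity render
# Use the hand's alpha channel as the paste mask so only the hand is drawn.
bg.paste(hand, (random.randint(0, 128), random.randint(0, 128)), mask=hand)
bg.save("dataset/sample_0001.jpg")

The abstract also states that a single CNN is trained end-to-end to predict hand location, 3D joint coordinates, and left/right handedness from one RGB frame. Below is a minimal PyTorch-style sketch of that multi-task idea, assuming a 21-joint hand model and arbitrary layer sizes; HandParameterNet and every shape here are illustrative assumptions, not the thesis's actual two-stream, encoder-decoder architecture described in the table of contents.

# A minimal multi-task sketch: one shared encoder, three task heads.
import torch
import torch.nn as nn

class HandParameterNet(nn.Module):
    def __init__(self, num_joints: int = 21):
        super().__init__()
        self.num_joints = num_joints
        # Shared convolutional encoder over the full RGB frame.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Hand location as a normalized bounding box (x, y, w, h).
        self.location_head = nn.Linear(128, 4)
        # 3D joint coordinates, flattened to num_joints * 3 values.
        self.pose_head = nn.Linear(128, num_joints * 3)
        # Left/right handedness logits.
        self.handedness_head = nn.Linear(128, 2)

    def forward(self, rgb: torch.Tensor):
        features = self.encoder(rgb)
        location = self.location_head(features)
        joints = self.pose_head(features).view(-1, self.num_joints, 3)
        handedness = self.handedness_head(features)
        return location, joints, handedness

model = HandParameterNet()
location, joints, handedness = model(torch.randn(1, 3, 256, 256))
print(location.shape, joints.shape, handedness.shape)  # (1,4) (1,21,3) (1,2)

The point of the sketch is only that one shared representation can feed separate detection, regression, and classification heads, which is what lets a single model replace the usual detect-then-estimate preprocessing chain.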
dc.description.provenance: Made available in DSpace on 2021-06-17T08:21:36Z (GMT). No. of bitstreams: 1. ntu-108-R06944033-1.pdf: 7833892 bytes, checksum: b6418a4d360d1b75b6ecd5066923b4bf (MD5). Previous issue date: 2019. (en)
dc.description.tableofcontents:
Oral Examination Committee Certification
Acknowledgements
Chinese Abstract
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 Introduction
1.1 Motivation
1.2 Related Work
1.2.1 Hand Pose Estimation
1.3 Contribution
1.4 Thesis Organization
Chapter 2 Preliminaries
2.1 Convolutional Neural Network
2.1.1 Basic Concept
2.1.2 Encoder-Decoder Network
2.1.3 Adam Optimizer
2.2 Detection Framework
2.2.1 You Only Look Once (YOLO)
2.2.2 Single Shot Multibox Detector (SSD)
Chapter 3 Hand Parameter Estimation
3.1 Problem Formulation
3.2 System Overview
3.3 Hand Parameter Estimation via Two-Stream Convolution
3.3.1 Encoder-Decoder
3.3.2 Posture Encoder and Detection
3.3.3 Two-Stream Convolutional Layers for Hand Pose Estimation
3.3.4 Architecture of the Convolutional Neural Network
3.3.5 Loss Functions
3.3.6 Implementation Parameters and Details
3.4 NTU Synthetic Dataset
Chapter 4 Experimental Results
4.1 Settings of Environment
4.2 Datasets
4.2.1 Rendered Handpose Dataset
4.2.2 NTU Synthetic Dataset
4.3 Experimental Results
4.3.1 Rendered Handpose Dataset
4.3.2 NTU Synthetic Dataset
4.4 Hand Parameter Estimation System for Virtual Object Manipulation
Chapter 5 Conclusion
REFERENCES
dc.language.iso: en
dc.title: First-Person View Hand Parameter Estimation based on Convolutional Neural Network for Virtual Reality Applications (en)
dc.type: Thesis
dc.date.schoolyear: 107-2
dc.description.degree: Master
dc.contributor.oralexamcommittee: 莊永裕, 陳祝嵩, 李明穗, 鄭龍磻
dc.subject.keyword: Hand pose estimation, Virtual reality, Convolutional neural network, Unity, Synthetic data (en)
dc.relation.page: 64
dc.identifier.doi: 10.6342/NTU201902908
dc.rights.note: Paid authorization
dc.date.accepted: 2019-08-14
dc.contributor.author-college: College of Electrical Engineering and Computer Science
dc.contributor.author-dept: Graduate Institute of Networking and Multimedia
Appears in Collections: Graduate Institute of Networking and Multimedia

Files in This Item:
File: ntu-108-1.pdf (currently not authorized for public access)
Size: 7.65 MB
Format: Adobe PDF