Hand Pose Estimation based on 3D Residual Network with Data Padding and Skeletal Loss for Virtual Reality Applications

Pai-Wen Ting; 丁柏文

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76403

Title:	Hand Pose Estimation based on 3D Residual Network with Data Padding and Skeletal Loss for Virtual Reality Applications
Authors:	Pai-Wen Ting 丁柏文
Advisor:	傅立成
Keywords:	hand pose estimation, virtual reality, convolutional neural network
Publication year:	2018
Degree:	Master's
Abstract:	Hand pose estimation is an active research topic in computer vision. Its goal is to estimate the 3D coordinates of the hand joints from an image containing a hand and thereby recover the hand pose. As virtual reality (VR), augmented reality (AR), and mixed reality (MR) technologies have matured, techniques that make interaction in the virtual world feel more realistic have been actively developed. Hand pose estimation, however, still faces many difficulties: for example, self-occlusion between the fingers and the palm and the wide variation of hand poses both make estimation hard. The purpose of this thesis is to develop a system that extracts information from depth images, accurately estimates the hand joint coordinates and hand pose in 3D space, and thereby provides a natural interface between the user and the virtual world. We propose a deep network that learns a hand pose estimation model from a large amount of data. We first transform the hand depth image into a voxelized grid and apply data padding to fill the data and enlarge the training set. We then train a convolutional neural network on the preprocessed data and add a skeletal loss layer that constrains the predicted hand shape to respect the physical limits of the hand. Together, the preprocessing and the skeletal loss layer significantly improve the performance of the trained model. Moreover, the system runs in real time on a computer with a single GPU.
In the experiments, we compare the performance of models trained under different conditions to show that the proposed improvements yield better results when training the convolutional neural network. We also test the system in the real world to show that it performs well even in complex environments. We expect the proposed system to give users a more natural way to interact with virtual and augmented reality.
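The abstract names two concrete steps: converting a depth image into a voxelized grid, and a skeletal loss that constrains hand shape. The sketch below is a minimal, hypothetical illustration of both ideas under stated assumptions, not the thesis's implementation. It assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, a depth image in millimetres, a hypothetical 200 mm cube around the hand split into a 32×32×32 binary occupancy grid, and it interprets the skeletal loss as the mean squared deviation of predicted bone lengths from reference bone lengths. The function names `depth_to_points`, `voxelize` and `skeletal_loss` are illustrative only.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (mm) to 3D camera-space points.
    Assumption: pinhole model; pixels with depth 0 carry no data."""
    v, u = np.nonzero(depth > 0)          # row (v) and column (u) indices
    z = depth[v, u].astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)    # shape (N, 3)

def voxelize(points, center, cube_size=200.0, grid=32):
    """Binary occupancy grid of shape (grid, grid, grid) for the points that
    fall inside a cube of edge cube_size (mm) centred on `center`.
    cube_size and grid are hypothetical defaults."""
    vox = np.zeros((grid, grid, grid), dtype=np.float32)
    idx = np.floor((points - center + cube_size / 2) / cube_size * grid).astype(int)
    inside = np.all((idx >= 0) & (idx < grid), axis=1)  # drop points outside the cube
    idx = idx[inside]
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return vox

def skeletal_loss(pred, ref_lengths, parents):
    """Hypothetical skeletal loss: mean squared difference between predicted
    and reference bone lengths. pred is (J, 3); parents[j] is the parent of
    joint j (-1 for the root); ref_lengths follows the order of non-root joints."""
    child = np.array([j for j, p in enumerate(parents) if p >= 0])
    parent = np.array([p for p in parents if p >= 0])
    lengths = np.linalg.norm(pred[child] - pred[parent], axis=1)
    return float(np.mean((lengths - ref_lengths) ** 2))

if __name__ == "__main__":
    # Synthetic flat "hand" patch 400 mm from the camera (hypothetical data).
    depth = np.zeros((240, 320))
    depth[100:140, 150:190] = 400.0
    pts = depth_to_points(depth, fx=475.0, fy=475.0, cx=160.0, cy=120.0)
    vox = voxelize(pts, center=pts.mean(axis=0))
    print(vox.shape, int(vox.sum()))
```

In a training pipeline, a term like `skeletal_loss` would typically be added to a joint-position loss with some weight, so the network is penalized for predicting anatomically implausible bone lengths even when individual joints are close to their targets.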
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76403
DOI:	10.6342/NTU201802685
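Full-text authorization:	Authorized (open access worldwide)
Full-text release date:	2023-08-21
Appears in collections:	Department of Computer Science and Information Engineering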

Files in this item:

File	Size	Format
ntu-107-R05922008-1.pdf	3.09 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.
