Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83636
Title: | 以平行與簡化運算和手勢輔助之Unity立體影像重建演算法 Parallel computation and hand gesture recognition for unity 3D image reconstruction algorithm |
Author: | Chang-Ting Tsai 蔡昌廷 |
Advisor: | 丁建均 (Jian-Jiun Ding) |
Keywords: | Reconstruction of 3D Image, Parallel Computing, OpenCL, Hand Gesture Recognition, Attention Mechanism |
Publication Year: | 2022 |
Degree: | Master's |
Abstract: | As demand for human-computer interaction grows, research on non-contact communication has become increasingly popular. Taking driving as an example, an accurate gesture recognition system can improve driver safety and convenience. In addition, the COVID-19 pandemic has accelerated the development of computer vision: non-contact devices can use it to measure body temperature or to warn when a person is not wearing a mask. Designing an accurate recognition system is therefore an important problem. This thesis has two parts. The first part accelerates a 3D image reconstruction algorithm. We modify the architecture of the original algorithm by exchanging the De-crosstalk and De-Aliasing steps and by keeping only the views that are ultimately used in pixel synthesis, which removes the computation spent on unused views. The entire pipeline is implemented with the Open Computing Language (OpenCL) framework for parallel computation. Compared with the original method, the optimized method is about 81% faster, and the pixel error relative to the originally reconstructed image is only 13. The second part is hand gesture recognition. As preprocessing, we divide a video into fixed-length clips, and we design a CNN model to classify the gesture in each clip. Within the CNN, an attention mechanism lets the model focus on regions where the feature map has large values, which improves the model's ability to learn from the feature map. In addition, we apply a decoder to recover the depth information of the input video clip. We evaluate the results on the nvGesture and SKIG datasets and compare them with those of other related models. This research combines 3D image reconstruction and hand gesture recognition. We accelerate the algorithm through parallel computation and architectural adjustments, and we also achieve good results on hand gesture recognition. We believe that integrating these two techniques in the future would yield an innovative system. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83636 |
DOI: | 10.6342/NTU202202409 |
Full-Text Authorization: | Not authorized |
Appears in Collections: | Graduate Institute of Communication Engineering |
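
The preprocessing step mentioned in the abstract, dividing a video into fixed-length clips before classification, can be illustrated with a minimal Python sketch. The function name `split_into_clips`, the `clip_len` and `stride` parameters, and the choice to drop trailing frames that do not fill a whole clip are all assumptions made for illustration; the abstract does not state the clip length, whether clips overlap, or how leftover frames are handled.

```python
from typing import List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def split_into_clips(
    frames: Sequence[Frame],
    clip_len: int,
    stride: Optional[int] = None,
) -> List[List[Frame]]:
    """Split a frame sequence into clips of exactly `clip_len` frames.

    Hypothetical helper: `stride` defaults to `clip_len`, giving
    non-overlapping clips. Trailing frames that cannot fill a whole
    clip are dropped (an assumption, not taken from the thesis).
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    step = clip_len if stride is None else stride
    if step <= 0:
        raise ValueError("stride must be positive")

    clips: List[List[Frame]] = []
    # The last valid start index is len(frames) - clip_len.
    for start in range(0, len(frames) - clip_len + 1, step):
        clips.append(list(frames[start:start + clip_len]))
    return clips


if __name__ == "__main__":
    video = list(range(10))  # stand-in for 10 decoded frames
    print(split_into_clips(video, clip_len=4))
    print(split_into_clips(video, clip_len=4, stride=2))
```

Each resulting clip would then be passed to the classifier as one sample. A stride smaller than the clip length produces overlapping clips, which is a common way to obtain more training samples from a short video.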
Files in This Item:
File | Size | Format | |
---|---|---|---|
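U0001-1508202215152800.pdf (not currently authorized for public access) | 4.6 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.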