Parallel Computation and Hand Gesture Recognition for Unity 3D Image Reconstruction Algorithm

Chang-Ting Tsai; 蔡昌廷

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83636

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	丁建均(Jian-Jiun Ding)
dc.contributor.author	Chang-Ting Tsai	en
dc.contributor.author	蔡昌廷	zh_TW
dc.date.accessioned	2023-03-19T21:12:35Z	-
dc.date.copyright	2022-08-31
dc.date.issued	2022
dc.date.submitted	2022-08-18
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83636	-
dc.description.abstract	由於人機互動的需求提升，非接觸式溝通的研究逐漸變得熱門，以行車駕駛為例，精準的手勢辨識系統可以提升駕駛人的行車安全以及方便性。在近期受到疫情影響的環境下，非接觸式裝置可以透過電腦視覺來偵測體溫或是發出未戴口罩的警告，因此設計精準的辨識系統是一個很重要的課題。本研究主要分為兩個部分，第一部分是對立體影像演算法做加速，我們針對原本演算法的架構做一些修改，將串擾(De-crosstalk)與混疊(De-Aliasing)的步驟交換，並只保留最後會使用到的視角資訊，藉此減少其他未使用到的視角運算量，整個流程使用開放計算語言(OpenCL)為框架，來實作平行化運算，跟原本方法比起來，速度加速約81%，與原始結果影像的像素誤差僅有13。第二部分是手勢辨識，我們將一段影片切成固定長度的片段做為資料前處理，設計CNN模型來辨識該片段的手勢種類，在CNN中，我們使用注意力機制，讓模型著重在特徵圖數值較大的區域，藉此增加模型對於特徵圖的學習力。此外，我們應用解碼器去還原原始片段的深度資訊，我們使用nvGesture以及SKIG資料集來驗證結果，並與其他相關模型的結果做比較。本研究結合立體影像重建與手勢辨識，透過平行化與架構調整，達到演算法加速效果。在手勢辨識上，我們也達到不錯的結果，未來將立體影像重建與手勢辨識結合成為一套新的系統，相信會是創新的事物。	zh_TW
dc.description.abstract	Owing to the growing demand for human-computer interaction, research on non-contact communication has become increasingly popular. Taking driving as an example, an accurate gesture recognition system can improve drivers' safety and convenience. In addition, COVID-19 has accelerated the development of computer vision: non-contact devices can use computer vision to measure body temperature or to detect whether a person is wearing a mask. Designing an accurate recognition system is therefore an important problem. This thesis has two parts. The first part accelerates a 3D image algorithm. We modify the architecture of the algorithm by exchanging the De-crosstalk and De-Aliasing steps, and we preserve only the views that are used in the final pixel synthesis, which removes the computation spent on unused views. The entire pipeline is implemented with the Open Computing Language (OpenCL) framework for parallel computation. Compared with the original method, the optimized method is about 81% faster, and the pixel error relative to the originally reconstructed image is only 13. The second part is hand gesture recognition. As data preprocessing, we divide a video into fixed-length clips, and we design a CNN model to classify the gesture in each clip. In the CNN, we use an attention mechanism that lets the model focus on regions of the feature map with large values, which improves the model's ability to learn from the feature map. In addition, we apply a decoder to recover the depth information of the input video clip. We evaluate the results on the nvGesture and SKIG datasets and compare them with other related models. This research combines 3D image reconstruction and hand gesture recognition. We accelerate the algorithm through parallel computation and architectural adjustment, and we also achieve good results on hand gesture recognition. We believe that integrating these two techniques would yield an innovative system.	en
dc.description.provenance	Made available in DSpace on 2023-03-19T21:12:35Z (GMT). No. of bitstreams: 1 U0001-1508202215152800.pdf: 4706361 bytes, checksum: 35ff00aa53254cbfc87b77f373578824 (MD5) Previous issue date: 2022	en
dc.description.tableofcontents	Verification Letter from the Oral Examination Committee; Acknowledgements; 摘要 (Abstract in Chinese); Abstract; Contents; List of Figures; List of Tables; Chapter 1 Introduction; 1.1 3D image; 1.2 Hand gesture recognition; Chapter 2 Related Work; 2.1 View synthesis; 2.2 OpenCL; 2.3 Hand gesture recognition; Chapter 3 Proposed Method; 3.1 3D image synthesis; 3.1.1 Architecture; 3.1.2 Optimization; 3.1.3 Implementation; 3.2 Hand gesture recognition; 3.2.1 Network Architecture; 3.2.2 Patch-Embedding Block; 3.2.3 Attention Block; 3.2.4 Local Decoder; 3.2.5 Loss function; Chapter 4 Experiment Result; 4.1 3D image synthesis; 4.2 Hand gesture recognition; 4.2.1 Dataset; 4.2.2 Implement details; 4.2.3 Evaluation on nvGesture; 4.2.4 Evaluation on SKIG; 4.2.5 Ablation study; Chapter 5 Conclusion; References
dc.language.iso	en
dc.subject	注意力機制	zh_TW
dc.subject	立體影像重建	zh_TW
dc.subject	平行化計算	zh_TW
dc.subject	開放計算語言	zh_TW
dc.subject	手勢辨識	zh_TW
dc.subject	Hand Gesture Recognition	en
dc.subject	Attention Mechanism	en
dc.subject	Reconstruction of 3D Image	en
dc.subject	Parallel Computing	en
dc.subject	OpenCL	en
dc.title	以平行與簡化運算和手勢輔助之Unity立體影像重建演算法	zh_TW
dc.title	Parallel computation and hand gesture recognition for unity 3D image reconstruction algorithm	en
dc.type	Thesis
dc.date.schoolyear	110-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	盧奕璋(Yi-Chang Lu),歐陽良昱(Liang-Yu Ou Yang),余執彰(Chih-Chang Yu)
dc.subject.keyword	立體影像重建,平行化計算,開放計算語言,手勢辨識,注意力機制	zh_TW
dc.subject.keyword	Reconstruction of 3D Image,Parallel Computing,OpenCL,Hand Gesture Recognition,Attention Mechanism	en
dc.relation.page	41
dc.identifier.doi	10.6342/NTU202202409
dc.rights.note	Not authorized
dc.date.accepted	2022-08-19
dc.contributor.author-college	College of Electrical Engineering and Computer Science	en
dc.contributor.author-dept	Graduate Institute of Communication Engineering	en
Appears in Collections:	Graduate Institute of Communication Engineering

Files in This Item:

File	Size	Format
U0001-1508202215152800.pdf (not authorized for public access)	4.6 MB	Adobe PDF

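Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.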