Application of Image Preprocessing to Improve Human Action Recognition

林佩宜; Pei-Yi Lin

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/79221

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	丁肇隆	zh_TW
dc.contributor.advisor	Chao-Lung Ting	en
dc.contributor.author	林佩宜	zh_TW
dc.contributor.author	Pei-Yi Lin	en
dc.date.accessioned	2022-11-14T03:46:35Z	-
dc.date.available	2023-11-09	-
dc.date.copyright	2022-10-28	-
dc.date.issued	2022	-
dc.date.submitted	2022-10-27	-
dc.identifier.citation	[1] C.-Y. Wu, M. Zaheer, H. Hu, R. Manmatha, A. J. Smola, and P. Krähenbühl, "Compressed video action recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 6026-6035. [2] A. F. Bobick and J. W. Davis, "The recognition of human movement using temporal templates," IEEE Transactions on pattern analysis and machine intelligence, vol. 23, no. 3, pp. 257-267, 2001. [3] K. Simonyan and A. Zisserman, "Two-stream convolutional networks for action recognition in videos," Advances in neural information processing systems, vol. 27, 2014. [4] J.-H. Kim and C. S. Won, "Action recognition in videos using pre-trained 2D convolutional neural networks," IEEE Access, vol. 8, pp. 60179-60188, 2020. [5] M. V. V. Model, "Coding of Moving Pictures and Associated Audio Information," ISO/IEC JTC1/SC29/WG11, 1997. [6] I. M. Committee, "Coding of moving pictures and associated audio for storage at up to about 1.5 mbit/s, part 3: Audio," ISO/IEC 11172, vol. 3, 1993. [7] I. IEC, "Information Technology Generic Coding of Moving Pictures and Associated Audio Information Part 2: Video," 1994. [8] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998. [9] K. Chellapilla, S. Puri, and P. Simard, "High performance convolutional neural networks for document processing," in Tenth international workshop on frontiers in handwriting recognition, 2006: Suvisoft. [10] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Communications of the ACM, vol. 60, no. 6, pp. 84-90, 2017. [11] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014. [12] C. Szegedy et al., "Going deeper with convolutions," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1-9. [13] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778. [14] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, "Aggregated residual transformations for deep neural networks," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1492-1500. [15] A. G. Howard et al., "MobileNets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017. [16] X. Zhang, X. Zhou, M. Lin, and J. Sun, "ShuffleNet: An extremely efficient convolutional neural network for mobile devices," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 6848-6856. [17] J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek, "Image classification with the Fisher vector: Theory and practice," International journal of computer vision, vol. 105, no. 3, pp. 222-245, 2013. [18] S. Ji, W. Xu, M. Yang, and K. Yu, "3D convolutional neural networks for human action recognition," IEEE transactions on pattern analysis and machine intelligence, vol. 35, no. 1, pp. 221-231, 2012. [19] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, "Learning spatiotemporal features with 3D convolutional networks," in Proceedings of the IEEE international conference on computer vision, 2015, pp. 4489-4497. [20] K. Soomro, A. R. Zamir, and M. Shah, "UCF101: A dataset of 101 human actions classes from videos in the wild," arXiv preprint arXiv:1212.0402, 2012. [21] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, "HMDB: a large video database for human motion recognition," in 2011 International conference on computer vision, 2011: IEEE, pp. 2556-2563. [22] L. Wang et al., "Temporal segment networks: Towards good practices for deep action recognition," in European conference on computer vision, 2016: Springer, pp. 20-36. [23] H.-S. Fang, S. Xie, Y.-W. Tai, and C. Lu, "RMPE: Regional multi-person pose estimation," in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2334-2343. [24] J. Canny, "A computational approach to edge detection," IEEE Transactions on pattern analysis and machine intelligence, no. 6, pp. 679-698, 1986. [25] 楊雅筑, "運用影像前處理提升卷積神經網路於人物動作辨識之準確率," Master's thesis, Graduate Institute of Engineering Science and Ocean Engineering, National Taiwan University, Taipei, 2021. [Online]. Available: https://hdl.handle.net/11296/v6uyx8 [26] S. M. Pizer et al., "Adaptive histogram equalization and its variations," Computer vision, graphics, and image processing, vol. 39, no. 3, pp. 355-368, 1987.	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/79221	-
dc.description.abstract	近年來，深度學習技術高度發展，以及計算能力的進步，圖像中的物件辨識及分類的功能進步顯著，隨著影片資訊量的成長，對影片中人物進行動作辨識也逐漸受到重視，但是影片中的動作事件必須透過連續的畫面才能正確判斷，而其中重複的畫面包含許多冗餘資訊，因此[1]提出使用壓縮資訊的方式，從壓縮影片中直接提取運動向量及殘差資訊，作為訓練所需之輸入資訊，以減少訓練成本，而本研究基於CoViAR的方法，並加以改進。實驗結果證實，若以更多的動作資訊取代色彩的變化，並將運動歷史圖應用於殘差影像中，能夠提升辨識準確率。另外，將單幀影像及運動向量資訊融合，作為一個模型的輸入資訊，並且將殘差影像的三通道模型縮減為單通道，可以減少模型訓練參數，以更低的計算成本達到相當程度的辨識效果。	zh_TW
dc.description.abstract	In recent years, advances in deep learning and computing power have significantly improved object recognition and classification in images. As the volume of video data grows, recognizing human actions in video has also received increasing attention. However, an action in a video can only be judged correctly from consecutive frames, and the repeated frames contain a large amount of redundant information. To address this, [1] proposed extracting motion vectors and residual information directly from compressed videos and using them as training inputs to reduce training cost. This study builds on the CoViAR method and improves on it. The experimental results show that recognition accuracy improves when more motion information is used in place of color variation and when the motion history image (MHI) technique is applied to the residual images. In addition, fusing the I-frame image and the motion vector information into the input of a single model, and reducing the residual-image model from three channels to a single channel, reduces the number of model parameters and achieves a considerable level of recognition performance at lower computational cost.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2022-11-14T03:46:35Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2022-11-14T03:46:35Z (GMT). No. of bitstreams: 0	en
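dc.description.tableofcontents	Acknowledgements i; Abstract (Chinese) ii; ABSTRACT iii; Table of Contents iv; List of Tables vi; List of Figures vii; Chapter 1 Introduction 1; 1.1 Research Background and Motivation 1; 1.2 Thesis Organization 2; Chapter 2 Literature Review 3; 2.1 Video Frame Sampling Methods 3; 2.1.1 Motion History Image (MHI) 3; 2.1.2 Stacked Grayscale 3-Channel Image 4; 2.2 Video Coding and Motion Vector Estimation 6; 2.3 Deep Learning Neural Networks 8; 2.4 Human Action Recognition 12; 2.5 UCF Action Dataset 16; Chapter 3 Methodology 18; 3.1 Single-Frame Image Processing 18; 3.2 Motion Vector Estimation 21; 3.2.1 Edge Detection 21; 3.2.2 Fusion with Single-Frame Images 24; 3.3 Residual Image Processing 26; 3.3.1 Frame Sampling Method 26; 3.3.2 Application of Motion History Images 27; 3.4 Experimental Network Architecture and Training Procedure 30; 3.4.1 Convolutional Neural Network Training 31; 3.4.2 Score Fusion Method 34; 3.4.3 Loss Function 34; Chapter 4 Experimental Results and Discussion 35; 4.1 Experimental Dataset 35; 4.2 Single-Frame Image Experiments 37; 4.3 Motion Vector Experiments 38; 4.3.1 Edge-Detection Input 39; 4.3.2 Fusion with Single-Frame Images 40; 4.4 Residual Image Experiments 41; 4.4.1 Different Frame Sampling Methods 42; 4.4.2 Motion History Image 43; 4.5 Score Fusion 44; Chapter 5 Conclusion 51; References 52	-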
dc.language.iso	zh_TW	-
dc.title	應用影像前處理以提升人物動作辨識網路之效能	zh_TW
dc.title	Application of Image Preprocessing to Improve Human Action Recognition	en
dc.title.alternative	Application of Image Preprocessing to Improve Human Action Recognition	-
dc.type	Thesis	-
dc.date.schoolyear	111-1	-
dc.description.degree	Master's	-
dc.contributor.oralexamcommittee	張瑞益;張恆華;黃乾綱	zh_TW
dc.contributor.oralexamcommittee	Ray-I Chang;Herng-Hua Chang;Chien-Kang Huang	en
dc.subject.keyword	人物動作辨識,卷積神經網路,深度學習,影像處理,壓縮影片格式	zh_TW
dc.subject.keyword	human action recognition,convolutional neural network,deep learning,image processing,compressed video	en
dc.relation.page	53	-
dc.identifier.doi	10.6342/NTU202210002	-
dc.rights.note	Authorization granted (open access worldwide)	-
dc.date.accepted	2022-10-27	-
dc.contributor.author-college	College of Engineering	-
dc.contributor.author-dept	Department of Engineering Science and Ocean Engineering	-
dc.date.embargo-lift	2023-11-09	-
Appears in Collections:	Department of Engineering Science and Ocean Engineering
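
The abstract in this record mentions applying a motion history image (MHI) to residual images. As a rough illustration only, the sketch below shows the basic MHI update in the style of reference [2]: a pixel that changed between consecutive frames is set to a maximum value `tau`, and every other pixel decays by 1 toward 0. It is not the thesis's implementation. It assumes plain frame differencing on small grayscale frames rather than residuals taken from compressed video, and the names `update_mhi`, `motion_history_image`, `tau`, and `threshold` are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # grayscale image as rows of 0-255 integers


def update_mhi(mhi: Frame, prev: Frame, curr: Frame,
               tau: int, threshold: int) -> Frame:
    """One MHI step: moving pixels become tau, the rest decay by 1 (floor 0)."""
    out = []
    for mhi_row, prev_row, curr_row in zip(mhi, prev, curr):
        out_row = []
        for h, p, c in zip(mhi_row, prev_row, curr_row):
            # Assumption: "motion" means the absolute frame difference
            # reaches the threshold; the thesis works on residual images.
            moving = abs(c - p) >= threshold
            out_row.append(tau if moving else max(0, h - 1))
        out.append(out_row)
    return out


def motion_history_image(frames: List[Frame], tau: int = 3,
                         threshold: int = 30) -> Frame:
    """Fold a frame sequence into a single MHI, starting from all zeros."""
    height, width = len(frames[0]), len(frames[0][0])
    mhi = [[0] * width for _ in range(height)]
    for prev, curr in zip(frames, frames[1:]):
        mhi = update_mhi(mhi, prev, curr, tau, threshold)
    return mhi


if __name__ == "__main__":
    # A bright pixel moving left to right across a 1x4 strip.
    frames = [
        [[255, 0, 0, 0]],
        [[0, 255, 0, 0]],
        [[0, 0, 255, 0]],
        [[0, 0, 0, 255]],
    ]
    print(motion_history_image(frames))  # [[1, 2, 3, 3]]
```

In the toy run, more recently changed pixels hold larger values, so a single image encodes both where motion occurred and roughly when.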

Files in This Item:

File	Size	Format
U0001-1028221026495102.pdf	2.8 MB	Adobe PDF

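Unless their copyright terms are specifically stated otherwise, all items in the system are protected by copyright, with all rights reserved.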