Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/18947
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 丁建均(Jian-Jiun Ding) | |
dc.contributor.author | Yu-Cheng Liu | en |
dc.contributor.author | 劉又誠 | zh_TW |
dc.date.accessioned | 2021-06-08T01:40:28Z | - |
dc.date.copyright | 2016-08-26 | |
dc.date.issued | 2016 | |
dc.date.submitted | 2016-08-19 | |
dc.identifier.citation | A. Feature point based materials
[1] N. Dalal and B. Triggs, Histograms of oriented gradients for human detection, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2005, pp. 886-893.
[2] D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60.2 (2004), pp. 91-110.
[3] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1-8.
[4] C. Harris and M. Stephens, A combined corner and edge detector, In Proc. of the 4th Alvey Vision Conference, 1988, pp. 147-151.
[5] B. Lucas and T. Kanade, An iterative image registration technique with an application to stereo vision, In Proc. Seventh International Joint Conference on Artificial Intelligence (IJCAI), 1981, pp. 674-679.
[6] J. Weickert, A. Bruhn, and C. Schnörr, Lucas/Kanade meets Horn/Schunck: Combining local and global optic flow methods, International Journal of Computer Vision 61.3 (2005), pp. 211-231.
[7] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, High accuracy optical flow estimation based on a theory for warping, In Proc. European Conference on Computer Vision (ECCV), 2004, pp. 25-36.
[8] N. Dalal, B. Triggs, and C. Schmid, Human detection using oriented histograms of flow and appearance, In Proc. European Conference on Computer Vision (ECCV), 2006, pp. 428-441.
[9] L. Fei-Fei and P. Perona, A Bayesian hierarchical model for learning natural scene categories, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2005, pp. 524-531.
[10] E. Nowak, F. Jurie, and B. Triggs, Sampling strategies for bag-of-features image classification, In Proc. European Conference on Computer Vision (ECCV), 2006, pp. 409-503.
[11] H. Wang, A. Kläser, C. Schmid, and C.-L. Liu, Dense trajectories and motion boundary descriptors for action recognition, International Journal of Computer Vision 103.1 (2013), pp. 60-79.
[12] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.
[13] X. Peng, L. Wang, X. Wang, and Y. Qiao, Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice, CoRR, abs/1405.4506, 2014.
[14] T. Jaakkola and D. Haussler, Exploiting generative models in discriminative classifiers, In Proc. of Neural Information Processing Systems (NIPS), 1998, pp. 487-493.
[15] F. Perronnin, J. Sánchez, and T. Mensink, Improving the Fisher kernel for large-scale image classification, In Proc. European Conference on Computer Vision (ECCV), 2010, pp. 143-156.
[16] A tutorial on Principal Components Analysis, http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
[17] C. Cortes and V. Vapnik, Support vector networks, Machine Learning 20.3 (1995), pp. 273-297.
B. Neural network based materials
[18] Neural Network (Basic Ideas), http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/DNN%20(v4).pdf
[19] Backpropagation, http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/DNN%20backprop.pdf
[20] Tips for Training Deep Neural Network, http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/Deep%20More%20(v2).pdf
[21] Neural Network with Memory, http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/RNN%20(v4).pdf
[22] Training Recurrent Neural Network, http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/RNN%20training%20(v6).pdf
[23] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, Long-term recurrent convolutional networks for visual recognition and description, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 2625-2634.
[24] Performing Convolution Operations, https://developer.apple.com/library/ios/documentation/Performance/Conceptual/vImage/ConvolutionOperations/ConvolutionOperations.html
[25] C.-D. Huang, C.-Y. Wang, and J.-C. Wang, Human action recognition system for elderly and children care using three stream ConvNet, IEEE International Conference on Orange Technologies, 2015, pp. 5-9.
[26] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, In Proc. of Neural Information Processing Systems (NIPS), 2012, pp. 1106-1114.
[27] M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, CoRR, abs/1311.2901, 2013.
[28] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., Englewood Cliffs, NJ: Prentice-Hall, 1999.
[29] S. Ji, W. Xu, M. Yang, and K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2013), pp. 221-231.
[30] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, Learning spatiotemporal features with 3D convolutional networks, IEEE International Conference on Computer Vision (ICCV), 2015, pp. 4489-4497.
[31] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, Large-scale video classification with convolutional neural networks, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1725-1732.
[32] K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, In Proc. of Neural Information Processing Systems (NIPS), 2014, pp. 568-576.
[33] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, HMDB: A large video database for human motion recognition, IEEE International Conference on Computer Vision (ICCV), 2011, pp. 2556-2563. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/18947 | - |
dc.description.abstract | 多媒體在人類的生活中扮演重要的角色。有數以萬計的影片被上傳至網路。一些熱門的主題,像是籃球和棒球運動都有著極高的點閱率,因此資料擷取的技術逐漸變得重要。
人類的動作辨識可以被進一步應用於異常事件偵測以及分析人類活動。在我們實驗中所使用的資料庫裡,包含像是人類身體的動作以及人類與物品之間的互動,像是跳躍、拍手和飲食。在這篇論文中,我們先利用捲積神經網路訓練一個模型,然後擷取訓練及測試用影片的特徵。在取得這些特徵後,我們利用同一個影片中特徵之間的時間關係,訓練一個三層的長短時間記憶模型。最後,我們選擇長短時間記憶模型最後一層的最後一個時間步的特徵,作為整個測試影片的特徵來分類。我們的模型在測試之後的準確率高於一些近幾年來提出的方法。 | zh_TW |
dc.description.abstract | Multimedia plays an important role in human daily life. Hundreds of thousands of videos are uploaded to the Internet, and popular topics such as basketball and baseball games attract high click-through rates, so information retrieval techniques have become important.
Human action recognition can be further applied to detect abnormal events and analyze human activities. The dataset we use in our experiments contains human body actions and interactions with objects, such as jumping, clapping, and drinking. In this thesis, we first use a convolutional neural network (CNN) to train a model, and then extract the features of the training and testing data from that model. After obtaining the features, we use the temporal relations between features in the same video clip to train a three-layer long short-term memory (LSTM) model. Finally, we take the feature at the last time step of the last LSTM layer, which summarizes the characteristics of the whole testing video, as the feature for classification. The results show that the accuracy of our structure is higher than that of several methods proposed in recent years. | en |
dc.description.provenance | Made available in DSpace on 2021-06-08T01:40:28Z (GMT). No. of bitstreams: 1 ntu-105-R03942035-1.pdf: 3973792 bytes, checksum: b954cb0e1209e35bd5ffdcf0211ec36f (MD5) Previous issue date: 2016 | en |
dc.description.tableofcontents | Oral Examination Committee Certification #
Chinese Abstract i
ABSTRACT ii
CONTENTS iii
LIST OF FIGURES vii
LIST OF TABLES xii
Chapter 1 Introduction 1
1.1 Background 1
1.2 Organization 1
Chapter 2 Conventional Feature Based Methods 3
2.1 Spatio-Temporal Interest Points (STIPs) 3
2.1.1 Histogram of Gradient (HoG) 3
2.1.2 Optical Flow (OF) 6
2.1.3 Histogram of Flow (HoF) 9
2.1.4 Motion Boundary Histogram (MBH) 10
2.2 Dense Trajectories 11
2.2.1 Dense Sampling 13
2.2.2 Trajectories 14
2.3 Clustering 16
2.3.1 K-means 16
2.4 Feature Encoding 19
2.4.1 Vector Quantization 19
2.4.2 Sparse Coding 21
2.4.3 Fisher Vector 22
2.5 Principal Component Analysis (PCA) 23
2.6 Normalization and Pooling 25
2.6.1 Normalization 25
2.6.2 Pooling 26
2.7 Support Vector Machine (SVM) 27
2.8 Overall Structure 28
2.9 Feature Fusion 29
Chapter 3 Neural Network (NN) 31
3.1 Deep Neural Network (DNN) 31
3.1.1 Basic Structure of Neural Network 32
3.1.2 Back Propagation 37
3.1.3 Advanced Structures for Neural Network 42
3.1.4 Recurrent Neural Network (RNN) 47
3.1.5 Long Short-Term Memory (LSTM) 50
3.2 Convolutional Neural Network (CNN) 52
3.2.1 Convolutional Kernel 52
3.2.2 Pooling 54
3.2.3 Local Response Normalization 55
3.2.4 Fully-Connected Layer 55
3.2.5 Other Functions 55
3.2.6 Data Augmentation 55
3.2.7 CNN Structure 56
3.2.8 Visualization of CNN Kernels 57
Chapter 4 Existing CNN Methods 61
4.1 3-D Kernel Based Models 61
4.2 LSTM Based Model 63
4.3 Optical Flow Based Model 64
Chapter 5 Proposed 3-D Convolutional Model Based Feature Training Using LSTM 65
5.1 Structure of Proposed Method 65
5.2 Simulation Results and Discussion 66
5.2.1 Dataset 67
5.2.2 Experiment on Whole Dataset 68
5.2.3 Experiment on 15 Classes 70
Chapter 6 Conclusion and Future Work 71
6.1 Conclusion 71
6.2 Future Work 71
REFERENCE 72 | |
dc.language.iso | en | |
dc.title | 利用捲積神經網路進行動作辨識 | zh_TW |
dc.title | Action Recognition Using Convolutional Neural Network | en |
dc.type | Thesis | |
dc.date.schoolyear | 104-2 | |
dc.description.degree | 碩士 | |
dc.contributor.oralexamcommittee | 王家慶(Jia-Ching Wang),王鵬華(Peng-Hua Wang),許文良(Wen-Liang Hsue) | |
dc.subject.keyword | 動作辨識,深度學習,捲積神經網路,長短時間記憶,三維捲積核心, | zh_TW |
dc.subject.keyword | action recognition, deep learning, convolutional neural network, long short-term memory, 3-D convolutional kernel | en |
dc.relation.page | 77 | |
dc.identifier.doi | 10.6342/NTU201602543 | |
dc.rights.note | 未授權 | |
dc.date.accepted | 2016-08-21 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 電信工程學研究所 | zh_TW |
Appears in Collections: | Graduate Institute of Communication Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-105-1.pdf (currently not authorized for public access) | 3.88 MB | Adobe PDF |
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.