Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/65912
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 鄭士康(Shyh-Kang Jeng) | |
dc.contributor.author | Chin-An Lin | en |
dc.contributor.author | 林晉安 | zh_TW |
dc.date.accessioned | 2021-06-17T00:15:09Z | - |
dc.date.available | 2012-08-01 | |
dc.date.copyright | 2012-08-01 | |
dc.date.issued | 2012 | |
dc.date.submitted | 2012-07-04 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/65912 | - |
dc.description.abstract | 對於辨識長時間的動作,時間之間的高階依賴關係非常有用,但是利用高階依賴關係,將造成常用的圖形時間模型像是HMM或者是CRF的模型計算複雜度很大程度的提升。本論文提出使用多變數線性預測,來利用時間上高階依賴關係。我們時間模型的計算複雜度比較低。除此之外,相較於圖形時間模型,我們的演算法不用定義與人工標記狀態,並且在有一定程度雜訊的Bag-of-Word 表示法上可以改善辨識率,而這個表示法在前人的研究中得到顯著的成果。我們的方法也擁有很好的應用能力,為了顯示這個能力,我們不僅在視訊資料上像是KTH和UCF資料庫實驗,同時也在骨架資料上像是MSR、Kinect資料庫實驗。在大多數情況,我們得到相較於世界上最先進的系統更好的結果。 | zh_TW |
dc.description.abstract | To recognize temporally extended actions, it is useful to introduce high-order temporal dependence into the recognition task. However, doing so greatly increases computational complexity when commonly used graphical models such as HMMs and CRFs are employed. In this thesis, multivariate linear prediction is proposed to exploit high-order temporal dependence at lower computational complexity. In addition, our method requires no effort to define and manually label states, and it can improve recognition on bag-of-words representations, which may contain considerable noise but have shown excellent performance in previous work. To show the applicability of the proposed method, we experiment not only on video datasets, including KTH and UCF, but also on skeleton datasets such as MSR 3D Action and UCF Kinect. On most of them, our method achieves performance superior to that of state-of-the-art methods. | en |
dc.description.provenance | Made available in DSpace on 2021-06-17T00:15:09Z (GMT). No. of bitstreams: 1 ntu-101-R98942116-1.pdf: 1890371 bytes, checksum: 701477512c780c3c054dc24dd81d111d (MD5) Previous issue date: 2012 | en |
dc.description.tableofcontents | Thesis Committee Certification #
Acknowledgements i
Chinese Abstract ii
ABSTRACT iii
CONTENTS iv
LIST OF FIGURES vi
LIST OF TABLES vii
Chapter 1 Introduction 1
1.1 Our Approach 3
1.2 Research Contributions 3
1.3 Thesis Organization 4
Chapter 2 Related Work 5
2.1 Global Representation 5
2.2 Local Representation 5
2.2.1 Spatio-Temporal Interest Point Detectors 6
2.2.2 Local Descriptors 7
2.2.3 Correlations between Local Descriptors 8
2.3 Parametric Model 9
Chapter 3 Preliminary 11
3.1 Wide-Sense Stationary Process 11
3.2 Linear Prediction 11
3.3 Multiple Kernel Learning 12
Chapter 4 The Proposed Framework 14
4.1 A Generative Formulation 14
4.2 Modeling the Frequency Component 15
4.2.1 Multivariate Linear Prediction 15
4.2.2 Computing Coefficient Matrix 16
4.2.3 Lambda Selection 19
4.2.4 Probability Model 19
4.3 Modeling the Static Component 20
4.4 Fusion 21
Chapter 5 Representations 23
5.1 Bag-of-Word Representation 23
5.1.1 STIP Detection and Local Description 23
5.1.2 Descriptor Quantization and Time-Series Representation 25
5.2 Skeleton Representation 25
Chapter 6 Experiments 27
6.1 Datasets 27
6.2 Kernels for MKL 29
6.3 Channel Combination 30
Chapter 7 Experimental Results 32
7.1 Evaluation of the Proposed Method 32
7.2 Evaluation of High-Order Temporal Dependence 34
Chapter 8 Conclusions 37
REFERENCES 38 | |
dc.language.iso | en | |
dc.title | 使用多變數線性預測之人類動作辨識的時間模型 | zh_TW |
dc.title | An Efficient Temporal Model for Action Recognition Using Multivariate Linear Prediction | en |
dc.type | Thesis | |
dc.date.schoolyear | 100-2 | |
dc.description.degree | Master | |
dc.contributor.coadvisor | 林彥宇(Yen-Yu Lin) | |
dc.contributor.oralexamcommittee | 廖弘源(Hong-Yuan Mark Liao) | |
dc.subject.keyword | 人類動作辨識,多變數線性預測,時間模型,骨架,時間序列,影片描述 | zh_TW |
dc.subject.keyword | action recognition,multivariate linear prediction,temporal model,skeleton,time series,bag-of-word,video description | en |
dc.relation.page | 42 | |
dc.rights.note | Authorized for a fee (有償授權) | |
dc.date.accepted | 2012-07-05 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 電信工程學研究所 | zh_TW |
Appears in Collections: | Graduate Institute of Communication Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-101-1.pdf (currently not authorized for public access) | 1.85 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
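The abstract describes modeling per-frame feature time series with multivariate linear prediction: coefficient matrices are fit so that each frame is predicted from the p preceding frames, capturing high-order temporal dependence without the state machinery of an HMM or CRF. As a rough, hypothetical sketch of that idea (not the thesis's actual implementation; the function names `fit_mvlp` and `predict_next` are illustrative, and this plain least-squares fit omits the regularization and lambda selection listed in the table of contents):

```python
import numpy as np

def fit_mvlp(X, p):
    """Fit an order-p multivariate linear predictor to a time series.

    X: (T, d) array, one d-dimensional feature vector per frame.
    Returns coefficient matrices A of shape (p, d, d) minimizing the
    squared one-step prediction error ||x_t - sum_k A_k x_{t-k}||^2.
    """
    T, d = X.shape
    # Targets: x_p, ..., x_{T-1}; regressors: the p preceding frames.
    Y = X[p:]                                                  # (T-p, d)
    Z = np.hstack([X[p - k:T - k] for k in range(1, p + 1)])   # (T-p, p*d)
    # Ordinary least squares; lstsq also handles rank deficiency.
    W, *_ = np.linalg.lstsq(Z, Y, rcond=None)                  # (p*d, d)
    # Block k of W maps x_{t-k} components to x_t; transpose so that
    # A[k-1] acts as a matrix on column vectors: x_t ~ sum_k A[k-1] @ x_{t-k}.
    return W.reshape(p, d, d).transpose(0, 2, 1)

def predict_next(X, A):
    """One-step-ahead prediction from the last p frames of X."""
    p = A.shape[0]
    return sum(A[k - 1] @ X[-k] for k in range(1, p + 1))
```

The fit reduces to one linear least-squares solve over stacked lagged frames, which is why this family of temporal models is cheap compared with inference in a high-order graphical model.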