Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61298

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 徐宏民(Winston H. Hsu) | |
| dc.contributor.author | Hao-Jen Wang | en |
| dc.contributor.author | 王浩任 | zh_TW |
| dc.date.accessioned | 2021-06-16T13:00:46Z | - |
| dc.date.available | 2015-08-23 | |
| dc.date.copyright | 2013-08-23 | |
| dc.date.issued | 2013 | |
| dc.date.submitted | 2013-08-08 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61298 | - |
| dc.description.abstract | The first half of this thesis proposes a framework for detecting human attributes and interaction events in indoor surveillance environments. Attribute recognition has become a popular research topic in computer vision and multimedia in recent years, but conventional approaches to object detection and recognition still face many open problems: they are almost inevitably affected by factors such as lighting changes and the viewpoints and poses of objects or people. With the continuing progress of depth sensors built on infrared technology, and in contrast to traditional purely visual approaches, we propose a multi-view, part-based human attribute recognition system that uses both color and depth information. We target several attributes that matter in surveillance settings and learn our recognition models from more accurate features obtained from, or with the help of, 3D information. To validate our approach and evaluate the system, we compare against several state-of-the-art recognition methods; the results show that under low resolution and large variations in viewpoint and pose, our method outperforms these baselines. On top of this system, applications such as filtering surveillance footage and searching for suspects can be built. In the second half, in contrast to the low-level features traditionally extracted from images, mid-level (more semantic) feature representations have achieved breakthrough results in much prior work; features in a semantic space not only provide a compact representation of the recognition target but also avoid some of the noise present in low-level feature spaces. We propose to represent a human action as a combination of multiple body parts and poses. Previous research on human actions has focused mainly on single-person action recognition; here we study human-human interaction events, in which classic difficulties of action recognition, such as occlusion, become even harder. Our experiments show that the proposed method outperforms the compared baselines, especially on several action classes that are difficult for traditional approaches. | zh_TW |
| dc.description.abstract | Attributes have gained much attention in computer vision and multimedia research in recent years. With the advent of depth-enabled sensors and the growing needs of surveillance systems, this thesis proposes a novel framework to detect fine-grained human attributes (e.g., having a backpack, talking on a cell phone, wearing glasses) in surveillance environments. Traditional detection and recognition methods suffer from variations in lighting conditions, poses, and viewpoints of object instances. To tackle these problems, we propose a multi-view, part-based attribute detection system that operates on color-depth inputs rather than color images alone. We address several important attributes in surveillance environments and train multiple attribute classifiers on features inferred from 3D information to construct our discriminative model. To justify our approach and evaluate the performance of our system, we compare against several state-of-the-art methods; the experimental results show that our method is more robust under large variations in surveillance conditions and human-related factors such as pose, orientation, and deformation of body parts. With the capabilities of our system, many applications can be built, such as pre-filtering surveillance video frames by specific attributes and finding suspects or missing people. Mid-level feature representations, or semantic features, have shown discriminative power beyond low-level features in many recent works. The semantic feature space not only gives a compact representation but is also invariant to certain low-level feature noise. In this thesis, we propose to represent actions in video clips by sets of combinations of body parts and human poses, which is quite different from traditional image-based feature representation. While previous works mainly focus on single-person actions, we investigate human interaction events. Compared with single-person actions, interaction events are more complex because they are performed by more than one person, and traditional problems such as occlusion become much more challenging. Our experiments show that representing actions by parts and poses outperforms our baseline methods, especially in cases that are difficult for traditional methods. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-16T13:00:46Z (GMT). No. of bitstreams: 1 ntu-102-R00944004-1.pdf: 4264697 bytes, checksum: 2efbd3eedb132fe9f22cdfa0820b40bf (MD5) Previous issue date: 2013 | en |
| dc.description.tableofcontents | Thesis committee certification i; Acknowledgements (Chinese) ii; Acknowledgements iv; Abstract (Chinese) v; Abstract vii; 1 Introduction 1; 2 Related Works 5; 2.1 Related works for attributes 5; 2.2 Related works for actions 6; 3 Dataset 8; 3.1 The RGB-D surveillance-human-attribute dataset 8; 3.2 Interaction Event Dataset 9; 4 Human Attribute Detection by using Color-Depth Information 11; 4.1 Parts information extraction 11; 4.2 Part-based attributes proposed in our work 14; 4.3 Features 15; 4.4 Model Learning 16; 5 Action as the composition of parts and poses 19; 5.1 The Poselet 19; 5.2 Temporal pyramid instance feature 19; 6 Experimental results 21; 6.1 Experiments on attributes 21; 6.1.1 Dataset and baseline methods 21; 6.1.2 Comparisons and discussions 22; 6.2 Experiments on interaction events 24; 6.2.1 Baselines 24; 6.2.2 Comparisons to proposed method and discussions 25; 7 Conclusions 29; Bibliography 30 | |
| dc.language.iso | en | |
| dc.subject | 動作 | zh_TW |
| dc.subject | 監視 | zh_TW |
| dc.subject | 姿態 | zh_TW |
| dc.subject | 深度 | zh_TW |
| dc.subject | 部位 | zh_TW |
| dc.subject | 特徵 | zh_TW |
| dc.subject | 語意 | zh_TW |
| dc.subject | part | en |
| dc.subject | depth | en |
| dc.subject | surveillance | en |
| dc.subject | action | en |
| dc.subject | semantic | en |
| dc.subject | pose | en |
| dc.subject | attributes | en |
| dc.title | 於監視系統使用色彩及深度資訊進行特徵與事件之偵測 | zh_TW |
| dc.title | Full Body Human Attribute Detection and Interaction Event Recognition in Indoor Surveillance Environment Using Color-Depth Information | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 101-2 | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 賴尚宏(Shang-Hong Lai),林彥宇(Yen-Yu Lin),梅濤(Tao Mei) | |
| dc.subject.keyword | 特徵,深度,監視,動作,語意,姿態,部位, | zh_TW |
| dc.subject.keyword | attributes,depth,surveillance,action,semantic,pose,part, | en |
| dc.relation.page | 33 | |
| dc.rights.note | Paid authorization | |
| dc.date.accepted | 2013-08-08 | |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Networking and Multimedia | zh_TW |
| Appears in Collections: | Graduate Institute of Networking and Multimedia | |
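The attribute pipeline summarized in the abstract (per-part descriptors from color-depth input, one binary classifier per attribute) can be sketched as below. This is a minimal illustration with synthetic data: the attribute names, part count, feature dimensions, and the least-squares linear classifier (standing in for the per-attribute SVMs the thesis trains with LIBSVM) are all assumptions, not the thesis's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each person is described by descriptors computed per
# body part (e.g., head, torso, legs) from color and depth; random vectors
# stand in for the real features here.
N_PARTS, DIM_PER_PART = 3, 32
ATTRIBUTES = ["has_backpack", "wearing_glasses", "talking_on_phone"]

def person_descriptor(part_features):
    # A part-based model concatenates per-part descriptors into one vector.
    return np.concatenate(part_features)

# Synthetic training set: 200 people, one +/-1 label per attribute.
X = rng.normal(size=(200, N_PARTS * DIM_PER_PART))
labels = {a: rng.integers(0, 2, size=200) * 2 - 1 for a in ATTRIBUTES}

# One independent linear classifier per attribute, fit by least squares
# (a stand-in for the per-attribute SVMs trained with LIBSVM in the thesis).
weights = {a: np.linalg.lstsq(X, y.astype(float), rcond=None)[0]
           for a, y in labels.items()}

def detect(descriptor):
    # Report every attribute whose linear score is positive.
    return [a for a, w in weights.items() if descriptor @ w > 0]

query = person_descriptor([rng.normal(size=DIM_PER_PART) for _ in range(N_PARTS)])
print(detect(query))
```

Training one independent binary classifier per attribute lets each attribute be detected or absent on its own, which matches the multi-label nature of the task (a person can both carry a backpack and wear glasses).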
Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-102-1.pdf (restricted access) | 4.16 MB | Adobe PDF | |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
