NTU Theses and Dissertations Repository

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61298

Full metadata record (DC field: value, language)
dc.contributor.advisor: 徐宏民 (Winston H. Hsu)
dc.contributor.author: Hao-Jen Wang (en)
dc.contributor.author: 王浩任 (zh_TW)
dc.date.accessioned: 2021-06-16T13:00:46Z
dc.date.available: 2015-08-23
dc.date.copyright: 2013-08-23
dc.date.issued: 2013
dc.date.submitted: 2013-08-08
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61298
dc.description.abstract: The first half of this thesis proposes a framework for detecting human attributes and interaction events in indoor surveillance systems. Attribute recognition has become a popular research topic in computer vision and multimedia in recent years, but traditional approaches to object detection and recognition still face many challenges and are almost inevitably affected by factors such as lighting changes and the viewpoints and poses of objects or people. With the steady progress of depth sensors built on infrared and related hardware, and departing from traditional purely visual approaches, we propose a multi-view, part-based human attribute recognition system that uses both color and depth information. In this work we target several attributes that are important in surveillance environments, and learn our recognition models from more precise feature values obtained from 3D information or with its assistance. To validate the proposed approach and evaluate the system, we compare against several state-of-the-art recognition methods; the results show that under such low resolution and large variations in viewpoint and pose, our method outperforms these baselines. On top of this system's concepts, many applications can be built, such as filtering surveillance footage and locating suspects.
In the second half, in contrast to the low-level features traditionally extracted from images, mid-level feature representations, that is, more semantically meaningful representations, have achieved breakthrough results in much prior work; features in a semantic space not only give a compact representation of the recognition target but also avoid the noise that affects low-level feature spaces. In this work we propose representing a human action as a combination of multiple body parts and poses. Prior work on human actions has focused mainly on single-person action recognition; here we study person-to-person interaction events, a setting in which traditional action-recognition difficulties such as occlusion become even harder. Our experiments show that the proposed method outperforms the compared baselines, especially on several action categories that are difficult for traditional approaches. (zh_TW)
dc.description.abstract: Attributes have gained much attention in computer vision and multimedia research in recent years. With the advent of depth-enabled sensors and increasing needs in surveillance systems, this thesis proposes a novel framework to detect fine-grained human attributes (e.g., having a backpack, talking on a cell phone, wearing glasses) in surveillance environments. Traditional detection and recognition methods suffer from problems such as variations in lighting conditions, poses, and viewpoints of object instances. To tackle these problems, we propose a multi-view part-based attribute detection system based on color-depth inputs instead of color images alone. We address several important attributes in surveillance environments and train multiple attribute classifiers on features inferred from 3D information to construct our discriminative model. To justify our approach and evaluate the system's performance, we compare against several state-of-the-art methods; the experimental results show that our method is more robust under large variations in surveillance conditions and in human-related factors such as pose, orientation, and deformation of body parts. With this capability, many applications can be built, such as pre-filtering surveillance video frames by attribute for browsing, or finding suspects and missing people.
Mid-level feature representations, or semantic features, have shown discriminative power beyond low-level features in many recent works. A semantic feature space not only gives a compact representation but is also invariant to certain low-level feature noise. In this thesis, we propose to represent the actions in a video clip by sets of combinations of body parts and human poses, which is quite different from traditional image-based feature representations. While previous works mainly study single-person actions, we investigate human interaction events. Compared with single-person actions, interaction events are more complex: they are performed by more than one person, and traditional problems such as occlusion become much more challenging. Our experiments show that representing actions by parts and poses outperforms our baseline methods, especially on cases that are difficult for traditional methods. (en)
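As a rough illustration of the pipeline both abstracts describe, the sketch below concatenates per-body-part color-depth descriptors into one vector per person and trains one binary linear SVM per attribute. This is a minimal sketch under stated assumptions, not the thesis's actual code: the attribute names, the feature extraction, and the choice of scikit-learn's LinearSVC are all illustrative.

```python
# Hypothetical sketch: per-part color-depth features are concatenated into
# one vector per person, and one binary linear SVM is trained per attribute.
# Attribute names, features, and LinearSVC are illustrative assumptions.
import numpy as np
from sklearn.svm import LinearSVC

ATTRIBUTES = ["backpack", "cell_phone", "glasses"]  # examples from the abstract

def person_feature(part_descriptors):
    """Concatenate per-body-part descriptors (e.g., HOG on the color crop
    plus depth statistics for each part located via 3D information)."""
    return np.concatenate(list(part_descriptors))

def train_attribute_classifiers(X, labels):
    """Train one one-vs-rest linear SVM per attribute.

    X: (n_people, d) feature matrix; labels[a]: binary vector of length n_people.
    """
    return {a: LinearSVC(C=1.0).fit(X, labels[a]) for a in ATTRIBUTES}

def detect_attributes(classifiers, x):
    """Return the attributes whose classifier scores positive on vector x."""
    return [a for a, clf in classifiers.items()
            if clf.decision_function(x.reshape(1, -1))[0] > 0]
```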
dc.description.provenance: Made available in DSpace on 2021-06-16T13:00:46Z (GMT). No. of bitstreams: 1. ntu-102-R00944004-1.pdf: 4264697 bytes, checksum: 2efbd3eedb132fe9f22cdfa0820b40bf (MD5). Previous issue date: 2013 (en)
dc.description.tableofcontents:
Committee Approval Certificate i
Acknowledgements (Chinese) ii
Acknowledgements iv
Abstract (Chinese) v
Abstract vii
1 Introduction 1
2 Related Works 5
2.1 Related works for attributes 5
2.2 Related works for actions 6
3 Dataset 8
3.1 The RGB-D surveillance-human-attribute dataset 8
3.2 Interaction Event Dataset 9
4 Human Attribute Detection by using Color-Depth Information 11
4.1 Parts information extraction 11
4.2 Part-based attributes proposed in our work 14
4.3 Features 15
4.4 Model Learning 16
5 Action as the composition of parts and poses 19
5.1 The Poselet 19
5.2 Temporal pyramid instance feature 19
6 Experimental results 21
6.1 Experiments on attributes 21
6.1.1 Dataset and baseline methods 21
6.1.2 Comparisons and discussions 22
6.2 Experiments on interaction events 24
6.2.1 Baselines 24
6.2.2 Comparisons to the proposed method and discussions 25
7 Conclusions 29
Bibliography 30
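The abstract represents actions by combinations of parts and poses, and the table of contents names a "temporal pyramid instance feature" (Sec. 5.2) for this. The sketch below shows one plausible reading: per-frame poselet activations max-pooled over a coarse-to-fine temporal pyramid. The pyramid depth and the pooling operator are assumptions for illustration, not the thesis's definition.

```python
# Hypothetical sketch of a temporal pyramid over per-frame part/pose
# activations: level 0 pools the whole clip, level 1 pools each half, etc.
# Pyramid depth and max-pooling are illustrative assumptions.
import numpy as np

def temporal_pyramid_feature(frame_activations, levels=2):
    """frame_activations: (T, D) array, one poselet-activation vector per
    frame. Returns a vector of length D * (2**levels - 1)."""
    T, _ = frame_activations.shape
    assert T >= 2 ** (levels - 1), "clip too short for this pyramid depth"
    pooled = []
    for level in range(levels):
        # Split the clip into 2**level equal-length temporal segments.
        bounds = np.linspace(0, T, 2 ** level + 1, dtype=int)
        for s, e in zip(bounds[:-1], bounds[1:]):
            pooled.append(frame_activations[s:e].max(axis=0))  # max over segment
    return np.concatenate(pooled)
```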
dc.language.iso: en
dc.subject: 動作 (action) (zh_TW)
dc.subject: 監視 (surveillance) (zh_TW)
dc.subject: 姿態 (pose) (zh_TW)
dc.subject: 深度 (depth) (zh_TW)
dc.subject: 部位 (part) (zh_TW)
dc.subject: 特徵 (attribute) (zh_TW)
dc.subject: 語意 (semantic) (zh_TW)
dc.subject: part (en)
dc.subject: depth (en)
dc.subject: surveillance (en)
dc.subject: action (en)
dc.subject: semantic (en)
dc.subject: pose (en)
dc.subject: attributes (en)
dc.title: 於監視系統使用色彩及深度資訊進行特徵與事件之偵測 (zh_TW)
dc.title: Full Body Human Attribute Detection and Interaction Event Recognition in Indoor Surveillance Environment Using Color-Depth Information (en)
dc.type: Thesis
dc.date.schoolyear: 101-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 賴尚宏 (Shang-Hong Lai), 林彥宇 (Yen-Yu Lin), 梅濤 (Tao Mei)
dc.subject.keyword: 特徵, 深度, 監視, 動作, 語意, 姿態, 部位 (zh_TW)
dc.subject.keyword: attributes, depth, surveillance, action, semantic, pose, part (en)
dc.relation.page: 33
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2013-08-08
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) (zh_TW)
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia) (zh_TW)
Appears in Collections: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)

Files in This Item:
File: ntu-102-1.pdf (restricted; not authorized for public access)
Size: 4.16 MB
Format: Adobe PDF


Except where otherwise noted in their license terms, all items in this system are protected by copyright, with all rights reserved.
