NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/56663
Full metadata record (DC field: value [language])
dc.contributor.advisor: 傅立成
dc.contributor.author: Yen-Pin Hsu (en)
dc.contributor.author: 許彥彬 (zh_TW)
dc.date.accessioned: 2021-06-16T05:40:44Z
dc.date.available: 2017-08-17
dc.date.copyright: 2014-08-17
dc.date.issued: 2014
dc.date.submitted: 2014-08-12
dc.identifier.citation:
[1] P. Viola and M. J. Jones, “Robust real-time face detection,” International journal of
computer vision, vol. 57, no. 2, pp. 137–154, 2004.
[2] C. Huang, H. Ai, Y. Li, and S. Lao, “High-performance rotation invariant multiview
face detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence
(PAMI), vol. 29, no. 4, pp. 671–686, 2007.
[3] W. Zhang, S. Shan, W. Gao, X. Chen, and H. Zhang, “Local Gabor binary pattern
histogram sequence (LGBPHS): A novel non-statistical model for face representation
and recognition,” in IEEE International Conference on Computer Vision (ICCV),
vol. 1, 2005, pp. 786–791.
[4] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,”
in IEEE International Conference on Computer Vision and Pattern Recognition
(CVPR), vol. 1, 2005, pp. 886–893.
[5] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part-based models,” IEEE Transactions on Pattern
Analysis and Machine Intelligence (PAMI), vol. 32, no. 9, pp. 1627–1645, 2010.
[6] P. F. Felzenszwalb and D. P. Huttenlocher, “Pictorial structures for object recognition,” International Journal of Computer Vision, vol. 61, no. 1, pp. 55–79, 2005.
[7] Y. Yang and D. Ramanan, “Articulated pose estimation with flexible mixtures-of-parts,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 1385–1392.
[8] J. Shotton, T. Sharp, A. Kipman, A. Fitzgibbon, M. Finocchio, A. Blake, M. Cook,
and R. Moore, “Real-time human pose recognition in parts from single depth images,” Communications of the ACM, vol. 56, no. 1, pp. 116–124, 2013.
[9] OpenNI, “http://www.openni.org/.”
[10] J.-S. Tsai, Y.-P. Hsu, C. Liu, and L.-C. Fu, “An efficient part-based approach to action
recognition from RGB-D video with BoW-Pyramid representation,” in IEEE International Conference on Intelligent Robots and Systems (IROS), 2013, pp. 2234–2239.
[11] A. F. Bobick and J. W. Davis, “The recognition of human movement using temporal templates,” IEEE Transactions on Pattern Analysis and Machine Intelligence
(PAMI), vol. 23, no. 3, pp. 257–267, 2001.
[12] D. Weinland, R. Ronfard, and E. Boyer, “Free viewpoint action recognition using
motion history volumes,” Computer Vision and Image Understanding, vol. 104,
no. 2, pp. 249–257, 2006.
[13] C. Ellis, S. Z. Masood, M. F. Tappen, J. J. Laviola Jr, and R. Sukthankar, “Exploring the trade-off between accuracy and observational latency in action recognition,”
International Journal of Computer Vision, vol. 101, no. 3, pp. 420–436, 2013.
[14] J. Aggarwal and M. S. Ryoo, “Human activity analysis: A review,” ACM Computing
Surveys (CSUR), vol. 43, no. 3, p. 16, 2011.
[15] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, “Actions as space-time
shapes,” in IEEE International Conference on Computer Vision, vol. 2, 2005, pp.
1395–1402.
[16] A. Yilmaz and M. Shah, “Actions sketch: A novel action representation,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1,
2005, pp. 984–989.
[17] H. Wang, A. Klaser, C. Schmid, and C.-L. Liu, “Action recognition by dense trajectories,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 3169–3176.
[18] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, “Learning realistic human
actions from movies,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8.
[19] P. Dollar, V. Rabaud, G. Cottrell, and S. Belongie, “Behavior recognition via sparse
spatio-temporal features,” in IEEE International Workshop on Visual Surveillance
and Performance Evaluation of Tracking and Surveillance, 2005, pp. 65–72.
[20] T. Darrell and A. Pentland, “Space-time gestures,” in IEEE International Conference
on Computer Vision and Pattern Recognition (CVPR), 1993, pp. 335–340.
[21] A. A. Efros, A. C. Berg, G. Mori, and J. Malik, “Recognizing action at a distance,”
in IEEE International Conference on Computer Vision (ICCV), 2003, pp. 726–733.
[22] L. Xia, C.-C. Chen, and J. Aggarwal, “View invariant human action recognition using
histograms of 3D joints,” in IEEE International Conference on Computer Vision and
Pattern Recognition Workshops (CVPRW), 2012, pp. 20–27.
[23] J. Lafferty, A. McCallum, and F. C. Pereira, “Conditional random fields: Probabilistic models for segmenting and labeling sequence data,” in International Conference on Machine Learning (ICML), 2001.
[24] X. Sun, M. Chen, and A. Hauptmann, “Action recognition via local descriptors and
holistic features,” in IEEE International Conference on Computer Vision and Pattern
Recognition Workshops (CVPRW), 2009, pp. 58–65.
[25] ——, “Action recognition via local descriptors and holistic features,” in IEEE International Conference on Computer Vision and Pattern Recognition Workshops
(CVPRW), 2009, pp. 58–65.
[26] Y. Shen and H. Foroosh, “View-invariant action recognition from point triplets,”
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 31,
no. 10, pp. 1898–1905, 2009.
[27] C. Rao, A. Yilmaz, and M. Shah, “View-invariant representation and recognition
of actions,” International Journal of Computer Vision, vol. 50, no. 2, pp. 203–226,
2002.
[28] V. Parameswaran and R. Chellappa, “View invariance for human action recognition,”
International Journal of Computer Vision, vol. 66, no. 1, pp. 83–101, 2006.
[29] Y. Zhang, K. Huang, Y. Huang, and T. Tan, “View-invariant action recognition using
cross ratios across frames,” in IEEE International Conference on Image Processing
(ICIP), 2009, pp. 3549–3552.
[30] M.-C. Roh, H.-K. Shin, and S.-W. Lee, “View-independent human action recognition
with volume motion template on single stereo camera,” Pattern Recognition Letters,
vol. 31, no. 7, pp. 639–647, 2010.
[31] D. Weinland, E. Boyer, and R. Ronfard, “Action recognition from arbitrary views
using 3D exemplars,” in IEEE International Conference on Computer Vision (ICCV),
2007, pp. 1–7.
[32] L. Xia, C.-C. Chen, and J. Aggarwal, “View invariant human action recognition using
histograms of 3D joints,” in IEEE International Conference on Computer Vision and
Pattern Recognition Workshops (CVPRW), 2012, pp. 20–27.
[33] J. Shotton, T. Sharp, A. Kipman, A. Fitzgibbon, M. Finocchio, A. Blake, M. Cook,
and R. Moore, “Real-time human pose recognition in parts from single depth images,” Communications of the ACM, vol. 56, no. 1, pp. 116–124, 2013.
[34] I. N. Junejo, E. Dexter, I. Laptev, and P. Perez, “Cross-view action recognition from
temporal self-similarities,” in European Conference on Computer Vision (ECCV).
Springer, 2008.
[35] I. N. Junejo, E. Dexter, I. Laptev, and P. Perez, “View-independent action recognition from temporal self-similarities,” IEEE Transactions on Pattern Analysis and
Machine Intelligence (PAMI), vol. 33, no. 1, pp. 172–185, 2011.
[36] J. Wang, C. Chen, and X. Zhu, “Free viewpoint action recognition based on self-similarities,” in IEEE International Conference on Signal Processing (ICSP), vol. 2,
2012, pp. 1131–1134.
[37] J. Wang and H. Zheng, “View-robust action recognition based on temporal self-similarities and dynamic time warping,” in IEEE International Conference on Computer Science and Automation Engineering (CSAE), vol. 2, 2012, pp. 498–502.
[38] B. D. Lucas, T. Kanade et al., “An iterative image registration technique with an
application to stereo vision.” in IJCAI, vol. 81, 1981, pp. 674–679.
[39] B. K. Horn and B. G. Schunck, “Determining optical flow,” in 1981 Technical Symposium East. International Society for Optics and Photonics, 1981, pp. 319–331.
[40] Z. S. Harris, “Distributional structure,” Word, vol. 10, no. 2-3, pp. 146–162, 1954.
[41] G. Qiu, “Indexing chromatic and achromatic patterns for content-based colour image
retrieval,” Pattern Recognition, vol. 35, no. 8, pp. 1675–1686, 2002.
[42] J. MacQueen et al., “Some methods for classification and analysis of multivariate observations,” in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, 1967, pp. 281–297.
[43] B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A training algorithm for optimal margin
classifiers,” in Proceedings of the fifth annual workshop on Computational learning
theory. ACM, 1992, pp. 144–152.
[44] C. Cortes and V. Vapnik, “Support-vector networks,” Machine learning, vol. 20,
no. 3, pp. 273–297, 1995.
[45] V. Parameswaran and R. Chellappa, “View invariance for human action recognition,”
International Journal of Computer Vision, vol. 66, no. 1, pp. 83–101, 2006.
[46] Y. Shen and H. Foroosh, “View-invariant action recognition using fundamental ratios,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–6.
[47] A. S. Ogale, A. Karapurkar, and Y. Aloimonos, “View-invariant modeling and recognition of human actions using grammars,” in Dynamical vision. Springer, 2007,
pp. 115–126.
[48] A. Farhadi and M. K. Tabrizi, “Learning to recognize activities from the wrong view
point,” in European Conference on Computer Vision (ECCV). Springer, 2008, pp.
154–166.
[49] H. Zhang and L. E. Parker, “4-dimensional local spatio-temporal features for human
activity recognition,” in IEEE International Conference on Intelligent Robots and
Systems (IROS), 2011, pp. 2044–2049.
[50] I. Laptev, “On space-time interest points,” International Journal of Computer Vision,
vol. 64, no. 2-3, pp. 107–123, 2005.
[51] N. Dalal, B. Triggs, and C. Schmid, “Human detection using oriented histograms
of flow and appearance,” in European Conference on Computer Vision (ECCV).
Springer, 2006, pp. 428–441.
[52] M. J. Black, Y. Yacoob, and S. X. Ju, “Recognizing human motion using parameterized models of optical flow,” in Motion-Based Recognition. Springer, 1997, pp.
245–269.
[53] J. C. Niebles, H. Wang, and L. Fei-Fei, “Unsupervised learning of human action
categories using spatial-temporal words,” International journal of computer vision,
vol. 79, no. 3, pp. 299–318, 2008.
[54] B. Ni, G. Wang, and P. Moulin, “RGBD-HuDaAct: A color-depth video database for
human daily activity recognition,” in Consumer Depth Cameras for Computer Vision. Springer, 2013, pp. 193–208.
[55] H. Pirsiavash and D. Ramanan, “Detecting activities of daily living in first-person
camera views,” in IEEE International Conference on Computer Vision and Pattern
Recognition (CVPR), 2012, pp. 2847–2854.
[56] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple
features,” in IEEE Computer Society Conference on Computer Vision
and Pattern Recognition (CVPR), vol. 1, 2001, pp. I–511.
[57] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM
Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 27, 2011.
[58] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, “Actions as space-time
shapes,” in IEEE International Conference on Computer Vision (ICCV), vol. 2, 2005,
pp. 1395–1402.
[59] D. Weinland, E. Boyer, and R. Ronfard, “Action recognition from arbitrary views
using 3D exemplars,” in International Conference on Computer Vision (ICCV), 2007,
pp. 1–7.
[60] H. Wang, C. Yuan, W. Hu, and C. Sun, “Supervised class-specific dictionary learning
for sparse modeling in action recognition,” Pattern Recognition, vol. 45, no. 11, pp.
3902–3911, 2012.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/56663
dc.description.abstract (zh_TW): In recent years, action recognition has been a popular research topic in computer vision. To let the system interpret fine-grained and complex actions in the most natural way, closest to how humans do, we design the system on a vision basis. When people recognize the body movements of others, they do not have to stand directly in front of the performer; as long as sufficient visual information is available, the action can be recognized from any viewpoint. In this thesis, our goal is therefore to build a vision-based action recognition system that is unaffected by viewpoint and can reliably distinguish human actions whenever enough information about the body is available.
To achieve this goal, we adopt the concept of self-similarity. Even for the same action, different viewpoints produce different images and therefore different extracted features. Instead of building a model directly from the extracted features as in previous work, we compute the feature distances between every pair of frames and store them in a matrix, called the Self-Similarity Matrix, which we further partition into multiple sub-matrices. Each sub-matrix is then represented by our proposed Temporal-Pyramid Bag-of-Words, and an action is represented by the pyramid bag-of-words of all its sub-matrices. Finally, we train a Support Vector Machine with the temporal-pyramid bag-of-words as the input vector, thereby achieving view-invariant action recognition.
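As an illustration of the self-similarity idea described above, the following short sketch (not taken from the thesis) computes a Self-Similarity Matrix from per-frame feature vectors; the array frame_features is a hypothetical stand-in for descriptors such as HOG/HOF extracted from each frame.

import numpy as np

def self_similarity_matrix(frame_features):
    # frame_features: hypothetical (T, D) array, one D-dimensional descriptor per frame
    feats = np.asarray(frame_features, dtype=float)
    diff = feats[:, None, :] - feats[None, :, :]   # pairwise differences between frames
    return np.sqrt((diff ** 2).sum(axis=-1))       # T x T matrix of Euclidean distances

# For a 100-frame clip with 64-dimensional stand-in features:
ssm = self_similarity_matrix(np.random.rand(100, 64))
# ssm[i, j] is small when frames i and j have similar descriptors

Because the matrix stores only distances between frames of the same sequence, its overall structure tends to be more stable across camera views than the raw features themselves, which is the property the thesis exploits.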
dc.description.abstract (en): Understanding human action has drawn much attention in the field of computer vision. We choose a vision-based system so that a computer can understand human actions naturally. When people recognize the actions of others, the actor does not have to stand right in front of the observer. Therefore, in this thesis, we aim to build a vision-based action recognition system that is invariant to viewpoint.
To achieve this goal, we adopt the idea of self-similarity. When two video sequences record a specific action from different camera views, the resulting appearances of the action are entirely different. Consequently, if we simply apply feature extraction to the raw video, we end up with totally different features. Instead of extracting a spatio-temporal feature for every frame and using these feature vectors directly, our method uses the Euclidean distances between feature vectors, arranged in a Self-Similarity Matrix (SSM). To recognize the action, we describe the local tendency of the SSM using a pyramid-structured bag-of-words and train a Support Vector Machine as our classifier. Extensive experiments have been conducted to validate the proposed action recognition system.
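A minimal sketch, under assumed inputs, of how the SSM-based representation and classifier described above could fit together: local descriptors extracted from an SSM (here a hypothetical array with known frame indices) are quantized against a k-means codebook, pooled into histograms over a temporal pyramid, and the concatenated histograms are fed to an SVM. Function and variable names are illustrative, not the thesis code.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def temporal_pyramid_bow(descriptors, frame_idx, n_frames, codebook, levels=3):
    # Concatenate codeword histograms over 1, 2 and 4 temporal segments (levels=3).
    words = codebook.predict(descriptors)              # quantize each local descriptor
    hists = []
    for level in range(levels):
        edges = np.linspace(0, n_frames, 2 ** level + 1)
        for s in range(2 ** level):
            in_seg = (frame_idx >= edges[s]) & (frame_idx < edges[s + 1])
            h = np.bincount(words[in_seg], minlength=codebook.n_clusters)
            hists.append(h / max(h.sum(), 1))          # L1-normalized segment histogram
    return np.concatenate(hists)

# Hypothetical usage: clips is a list of (descriptors, frame_idx, n_frames, label) tuples.
# codebook = KMeans(n_clusters=200).fit(np.vstack([d for d, *_ in clips]))
# X = [temporal_pyramid_bow(d, f, n, codebook) for d, f, n, _ in clips]
# y = [label for *_, label in clips]
# clf = SVC(kernel="rbf").fit(X, y)   # the SVM classifier mentioned in the abstract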
dc.description.provenance (en): Made available in DSpace on 2021-06-16T05:40:44Z (GMT). No. of bitstreams: 1
ntu-103-R01922124-1.pdf: 24928943 bytes, checksum: 8a3be5af22dbd10c507ad311e4f46fb0 (MD5)
Previous issue date: 2014
dc.description.tableofcontents:
致謝 (Acknowledgements)
Acknowledgements
摘要 (Abstract)
Abstract
1 Introduction
  1.1 Motivation
  1.2 Challenges
  1.3 Related Work
    1.3.1 Action Recognition
    1.3.2 Dealing with Perspective of Camera View
  1.4 Contribution
  1.5 Thesis Organization
2 Preliminaries
  2.1 Features
    2.1.1 Histogram of Oriented Gradient
    2.1.2 Histogram of Optical Flow
  2.2 Bag-of-Words Model
    2.2.1 Codebook Generation
    2.2.2 Histogram of Codewords
  2.3 Support Vector Machine
    2.3.1 Linear SVM
    2.3.2 Soft Margin
    2.3.3 Nonlinear SVM
  2.4 System Overview
3 Feature Extraction and Self Similarity
  3.1 Preprocessing
  3.2 Spatio-Temporal Feature Extraction
  3.3 Spatio-Temporal Self-Similarity Matrix
    3.3.1 Self-Similarity
    3.3.2 Spatio-Temporal Self-Similarity Matrix
  3.4 Structural Stability of SSM across Views
4 SSM-Based Action Description and Action Recognition
  4.1 Local Feature Descriptor
  4.2 Temporal Pyramid Bag-of-Word Representation
  4.3 Action Recognition
    4.3.1 Off-line Training
    4.3.2 On-line Testing
5 Experiments
  5.1 Experimental Setting
  5.2 Datasets
    5.2.1 Weizmann Dataset
    5.2.2 IXMAS Dataset
    5.2.3 ViData Dataset
  5.3 Experimental Results
    5.3.1 Temporal-Pyramid Bag-of-Words Evaluation
    5.3.2 View-Invariant Action Recognition Performance
    5.3.3 Action Spotting
    5.3.4 Computational Cost Evaluation
6 Conclusion and Future Work
Reference
dc.language.iso: en
dc.subject: 無關視點 (view-invariant) (zh_TW)
dc.subject: 自身相似 (self-similarity) (zh_TW)
dc.subject: 動作辨識 (action recognition) (zh_TW)
dc.subject: Self-Similarity (en)
dc.subject: View-Invariant (en)
dc.subject: Action Recognition (en)
dc.title: 利用深度資訊與空間時間矩陣線上無關視點動作辨識 (zh_TW)
dc.title: Online View-invariant Human Action Recognition Using RGB-D Spatio-temporal Matrix (en)
dc.type: Thesis
dc.date.schoolyear: 102-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 李蔡彥, 陳祝嵩, 黃正民, 洪一平
dc.subject.keyword: 動作辨識, 無關視點, 自身相似 (zh_TW)
dc.subject.keyword: Action Recognition, View-Invariant, Self-Similarity (en)
dc.relation.page: 70
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2014-08-12
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) (zh_TW)
dc.contributor.author-dept: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) (zh_TW)
Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
File: ntu-103-1.pdf (not authorized for public access)
Size: 24.34 MB
Format: Adobe PDF


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
