Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74146
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 徐宏民 (Winston H. Hsu) |
dc.contributor.author | Hu-Cheng Lee | en
dc.contributor.author | 李胡丞 | zh_TW
dc.date.accessioned | 2021-06-17T08:21:48Z |
dc.date.available | 2022-08-21 |
dc.date.copyright | 2019-08-21 |
dc.date.issued | 2019 |
dc.date.submitted | 2019-08-13 |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74146 |
dc.description.abstract | Recent studies have demonstrated the effectiveness of convolutional neural networks for action recognition. However, previous works focus only on coarse-grained appearance and ignore the similarity between action classes and the diversity within the same action class. Our motivation stems from the observation that some failure cases of existing approaches are easily confused with each other. Towards this end, we target a promising direction: fine-grained action recognition. Its challenges are high intra-class variation and low inter-class variation. In this paper, we propose the Multi-Stream Bilinear Model (MSBM) to address the fine-grained action recognition problem. Our model leverages both coarse-grained context information and fine-grained cross-modality (CM) interaction to summarize the whole video sequence and to capture the relationships between different modalities simultaneously. We show that modeling cross-modality interaction with informative CM channel selection significantly improves accuracy and reduces the confusion rate between easily confused classes. Evaluating our approach on HMDB51 and a subset of Kinetics, we show that MSBM performs favorably against state-of-the-art architectures, enabling a richer understanding of fine-grained action recognition in video. | en
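The abstract's central operation, bilinear interaction between two modality streams, can be illustrated with a minimal NumPy sketch. This is a generic bilinear-pooling example under illustrative assumptions (toy RGB and optical-flow feature vectors, plus the common signed-square-root and L2 normalization steps); it is not the thesis's exact MSBM formulation.

```python
import numpy as np

def bilinear_pool(x, y):
    """Generic bilinear pooling: outer product of two modality
    feature vectors, then signed-sqrt scaling and L2 normalization."""
    b = np.outer(x, y).ravel()           # every pairwise cross-modality interaction
    b = np.sign(b) * np.sqrt(np.abs(b))  # signed square-root scaling
    norm = np.linalg.norm(b)
    return b / norm if norm > 0 else b

# Toy per-frame features from two streams (e.g., RGB and optical flow).
rgb = np.array([0.2, 0.5, 0.3])
flow = np.array([0.7, 0.1])
feat = bilinear_pool(rgb, flow)          # shape (3 * 2,) = (6,)
```

In a multi-stream setting, pooled vectors like `feat` from each modality pair would feed a classifier; a channel-selection step (in the spirit of the abstract's "informative CM channel selection") would correspond to masking entries of the outer product before pooling.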
dc.description.provenance | Made available in DSpace on 2021-06-17T08:21:48Z (GMT). No. of bitstreams: 1. ntu-108-R05922174-1.pdf: 1946468 bytes, checksum: d030d58ae5f9e63941dad9700c026ee0 (MD5). Previous issue date: 2019 | en
dc.description.tableofcontents | 口試委員會審定書 (Thesis Committee Certification) iii
誌謝 (Acknowledgements) v
摘要 (Chinese Abstract) vii
Abstract ix
1 Introduction 1
2 Related Work 5
2.1 Action Recognition 5
2.2 Fine-Grained Image Classification 6
3 Network Architecture 9
3.1 Coarse-Grained Context Information 9
3.2 Fine-Grained Cross-Modality Interaction 11
3.3 Consensus 14
4 Experiments 15
4.1 Datasets 15
4.2 Implementation Details 16
4.3 Main Results 17
5 Conclusion 23
Bibliography 25
dc.language.iso | en |
dc.subject | action recognition | en
dc.subject | convolutional neural network | en
dc.subject | multi-stream bilinear convolution | en
dc.subject | cross-modality interaction | en
dc.title | MSBM: Multi-Stream Bilinear Model for Fine-grained Action Recognition | en
dc.type | Thesis |
dc.date.schoolyear | 107-2 |
dc.description.degree | 碩士 (Master's) |
dc.contributor.oralexamcommittee | 陳文進, 葉梅珍, 余能豪 |
dc.subject.keyword | action recognition, convolutional neural network, multi-stream bilinear convolution, cross-modality interaction | en
dc.relation.page | 27 |
dc.identifier.doi | 10.6342/NTU201902156 |
dc.rights.note | 有償授權 (paid authorization) |
dc.date.accepted | 2019-08-14 |
dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | zh_TW
dc.contributor.author-dept | 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) | zh_TW
Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
File | Size | Format
ntu-108-1.pdf (restricted access) | 1.9 MB | Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
