MSBM: Multi-Stream Bilinear Model for Fine-grained Action Recognition

Hu-Cheng Lee; 李胡丞

Please cite this item using this Handle URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74146

Title:	MSBM: Multi-Stream Bilinear Model for Fine-grained Action Recognition (Chinese title: 利用多串流雙線性卷積模型應用在細精度動作識別)
Author:	Hu-Cheng Lee (李胡丞)
Advisor:	Winston H. Hsu (徐宏民)
Keywords:	action recognition, convolutional neural network, multi-stream bilinear convolution, cross-modality interaction
Year of publication:	2019
Degree:	Master's
Abstract:	Recent studies have demonstrated the effectiveness of convolutional neural networks (CNNs) for action recognition in video. However, previous works focus only on coarse-grained appearance and ignore both the similarity between different action classes and the diversity within the same action class. Our motivation stems from the observation that many failure cases of existing approaches are classes that look alike and are therefore easily confused with one another. Toward this end, we target a promising direction: fine-grained action recognition, whose two main challenges are high intra-class variation and low inter-class variation. In this thesis, we propose the Multi-Stream Bilinear Model (MSBM) to address the fine-grained action recognition problem. Our model simultaneously leverages coarse-grained context information, to summarize the whole video sequence, and fine-grained cross-modality (CM) interaction, to capture the relationship between different modalities. We show that modeling cross-modality interaction with informative CM channel selection significantly improves accuracy and reduces the confusion rate between easily confused classes. Evaluating our approach on HMDB51 and a subset of Kinetics, we show that MSBM performs favorably against state-of-the-art architectures, enabling a richer understanding of fine-grained action recognition in video.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74146
DOI:	10.6342/NTU201902156
Full-text license:	Paid license
Department:	Department of Computer Science and Information Engineering
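The abstract describes MSBM only at a high level. As a rough illustration of what a bilinear cross-modality interaction can look like, here is a minimal NumPy sketch of generic bilinear (outer-product) pooling between two modality feature maps, in the style of bilinear CNNs. It is not the MSBM implementation: the RGB/optical-flow pairing, the feature shapes, and the hypothetical `select_channels` helper (a top-k mean-activation rule standing in for "informative CM channel selection") are all assumptions made for illustration.

```python
import numpy as np


def select_channels(feat, k):
    """Keep the k channels with the largest mean absolute activation.

    Hypothetical stand-in for "informative CM channel selection";
    the thesis's actual selection criterion is not given in this record.
    feat: array of shape (C, N) = channels x spatio-temporal locations.
    """
    scores = np.abs(feat).mean(axis=1)           # one score per channel
    keep = np.sort(np.argsort(scores)[-k:])      # top-k channel indices, original order
    return feat[keep]


def cross_modal_bilinear(rgb, flow, eps=1e-12):
    """Generic bilinear pooling between two modality feature maps.

    rgb:  (C1, N) features from an appearance stream (assumed RGB)
    flow: (C2, N) features from a motion stream (assumed optical flow)
    Both streams must share the same N locations.
    Returns an L2-normalized descriptor of shape (C1 * C2,).
    """
    if rgb.shape[1] != flow.shape[1]:
        raise ValueError("both streams must have the same number of locations")
    n = rgb.shape[1]
    # Outer product of the two channel vectors at each location, averaged over locations.
    bilinear = rgb @ flow.T / n                  # shape (C1, C2)
    z = bilinear.ravel()
    z = np.sign(z) * np.sqrt(np.abs(z))          # signed square-root normalization
    return z / (np.linalg.norm(z) + eps)         # L2 normalization


# Illustrative (hypothetical) shapes: 7x7 spatial grid x 4 frames = 196 locations.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((64, 7 * 7 * 4))
flow = rng.standard_normal((32, 7 * 7 * 4))
desc = cross_modal_bilinear(select_channels(rgb, 16), select_channels(flow, 8))
print(desc.shape)  # (128,)
```

Selecting channels before pooling keeps the C1 × C2 bilinear descriptor manageable, since it grows quadratically with channel count; the signed square root and L2 normalization are common post-processing steps for bilinear features.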

Files in this item:

File	Size	Format
ntu-108-1.pdf (not currently authorized for public access)	1.9 MB	Adobe PDF


Unless otherwise indicated, items in this system are protected by copyright, with all rights reserved.
