To cite this item, please use this Handle URI:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90602
Title: 以運動與記憶增強之網路用於線上即時之時空間動作偵測 (Motion & Memory-Augmented Network for Online Real-Time Spatio-Temporal Action Detection)
Author: Ti-Huai Song (宋體淮)
Advisor: Li-Chen Fu (傅立成)
Keywords: deep learning, video content analysis, spatio-temporal action detection, knowledge distillation, memory
Publication Year: 2023
Degree: Master's
Abstract: With the rapid growth of deep learning and video content analysis, online real-time spatio-temporal action detection has attracted wide attention from industry and academia due to its suitability for real-world applications. Two key aspects of this task are crucial for a better understanding of human actions: motion modeling and long-term dependency modeling. We therefore propose a motion-augmented strategy and a memory-augmented strategy to improve the network's representation ability in terms of motion and long-term reasoning for online real-time spatio-temporal action detection. Several recent works have explored feature distillation, which lets the network acquire motion knowledge without requiring optical flow computation or a two-stream design during inference. However, because feature distillation directly overwrites the feature maps, it may cause loss of the network's spatial appearance information and suppress the learning of crucial motion knowledge. Our motion-augmented strategy, called Attention-Guided Motion Distillation, therefore allows the network to retain its appearance knowledge while selectively learning crucial motion information under the guidance of an attention mechanism.
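The idea of attention-guided distillation can be illustrated with a minimal sketch: instead of forcing the student's features to match the teacher's motion features everywhere (which overwrites appearance information), a motion-derived attention map weights the distillation loss so imitation is concentrated where motion is salient. The function name, the energy-based attention map, and the squared-error loss below are all illustrative assumptions, not the thesis's actual formulation:

```python
import numpy as np

def attention_guided_distill_loss(student_feat, teacher_motion_feat):
    """Hypothetical sketch of attention-guided feature distillation.

    student_feat, teacher_motion_feat: arrays of shape (C, H, W).
    The attention map is derived from the teacher's motion-feature energy,
    so the student imitates the teacher only where motion is salient and
    keeps its own appearance features elsewhere.
    """
    # Attention map: spatially normalized channel-wise energy (H, W)
    energy = (teacher_motion_feat ** 2).mean(axis=0)
    attn = energy / (energy.sum() + 1e-8)  # non-negative, sums to 1
    # Per-location squared distillation error, averaged over channels (H, W)
    sq_err = ((student_feat - teacher_motion_feat) ** 2).mean(axis=0)
    # Attention-weighted distillation loss (scalar)
    return float((attn * sq_err).sum())
```

A plain (unguided) distillation loss would correspond to a uniform `attn`; the attention weighting is what lets low-motion regions keep the student's own appearance features.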
The long-term dependency modeling approaches used in recent works are not suitable for online, real-time applications. Our memory-augmented strategy, called Online Feature Memory, therefore provides long-term information to the network in an online manner, enhancing detection while also allowing efficient inference. In addition, to integrate the long-term and current information effectively, we propose the Temporal-Enhanced Memory Operator, which addresses the limitation of the conventional cross-attention mechanism in temporal modeling when computing relations between long-term and current information. In the experiments, we demonstrate the effectiveness of our method on two benchmark datasets, J-HMDB-51 and UCF101-24, where it achieves superior performance compared with other related works.
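An online feature memory of the kind described above can be sketched as a fixed-size FIFO buffer of past frame features, read out via an attention step whose keys carry a temporal (age) bias, so the relation between long-term and current information is not purely content-based. The class name, the additive age bias, and the residual fusion are illustrative assumptions standing in for the thesis's Temporal-Enhanced Memory Operator:

```python
import numpy as np
from collections import deque

class OnlineFeatureMemory:
    """Hypothetical sketch: a bounded FIFO memory of per-frame features
    with a temporally biased cross-attention read-out."""

    def __init__(self, capacity, dim):
        self.buf = deque(maxlen=capacity)  # oldest entries are evicted
        self.dim = dim

    def write(self, feat):
        """Append the current frame's feature vector, shape (dim,)."""
        self.buf.append(feat)

    def read(self, query):
        """Fuse the current-frame feature (query) with long-term memory."""
        if not self.buf:
            return query  # no long-term context yet
        mem = np.stack(self.buf)                  # (T, dim), oldest first
        # Temporal enhancement: bias keys by relative age (0 = newest),
        # a crude stand-in for a learned temporal encoding.
        ages = np.arange(len(mem))[::-1, None].astype(float)
        keys = mem + 0.01 * ages
        scores = keys @ query / np.sqrt(self.dim)
        w = np.exp(scores - scores.max())
        w /= w.sum()                              # softmax attention weights
        return query + w @ mem                    # residual fusion
```

Writing one feature per frame keeps inference cost bounded by the memory capacity, which is what makes the scheme compatible with online, real-time operation.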
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90602
DOI: 10.6342/NTU202300791
Full-Text Access: Not authorized
Appears in Collections: Department of Electrical Engineering
Files in This Item:

File | Size | Format
---|---|---
ntu-111-2.pdf (not currently authorized for public access) | 3.24 MB | Adobe PDF
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.