SMART: Skeleton-based Multi-scale Feature Alignment for Robust Temporal Action Localization (基於骨架的多尺度特徵對齊用於穩健時間動作定位)

Chin-Yi Liao (廖金億)

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96750

Title:	SMART: Skeleton-based Multi-scale Feature Alignment for Robust Temporal Action Localization (基於骨架的多尺度特徵對齊用於穩健時間動作定位)
Author:	Chin-Yi Liao (廖金億)
Advisor:	Jane Yung-jen Hsu (許永真)
Co-advisor:	Li-Chen Fu (傅立成)
Keywords:	Temporal action localization, Weakly supervised learning, Feature alignment, Skeleton data, Video understanding
Publication Year:	2024
Degree:	Master's
Abstract:	Skeleton-based weakly supervised temporal action localization faces challenges due to limited annotations and the lack of information beyond skeletal structures. We introduce SMART, a novel approach that addresses these limitations through several key contributions. First, by incorporating a multi-scale feature pyramid concept, SMART captures richer feature information, enhancing the overall understanding of actions in video sequences. In addition, our work presents two innovative modules to improve action localization accuracy and robustness. The Class-Weighted Feature Alignment module enhances action identification and localization precision by effectively aligning features across different scales. The Dynamic Gaussian-Based Instance Fusion module generates higher-quality action boundaries with improved adaptability to various action types and durations. Evaluated on the Babel dataset and our lab's proprietary AIMS dataset, SMART achieves state-of-the-art performance in skeleton-based weakly supervised temporal action localization. This work represents a significant advancement in addressing the challenges of action localization with limited annotations in diverse video understanding scenarios.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96750
DOI:	10.6342/NTU202404729
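Full-text authorization:	Not authorized
Electronic full-text release date:	N/A
Appears in Collections:	Graduate Institute of Networking and Multimedia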

Files in This Item:

File	Size	Format
ntu-113-1.pdf (not authorized for public access)	3.42 MB	Adobe PDF

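Unless their copyright terms are specifically indicated otherwise, all items in this system are protected by copyright, with all rights reserved.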
