SMART: Skeleton-based Multi-scale Feature Alignment for Robust Temporal Action Localization

廖金億; Chin-Yi Liao

Title:	SMART: Skeleton-based Multi-scale Feature Alignment for Robust Temporal Action Localization
Authors:	廖金億 Chin-Yi Liao
Advisor:	許永真 Jane Yung-jen Hsu
Co-Advisor:	傅立成 Li-Chen Fu
Keywords:	Temporal action localization, Weakly supervised learning, Feature alignment, Skeleton data, Video understanding
Publication Year:	2024
Degree:	Master's
Abstract:	Skeleton-based weakly supervised temporal action localization is difficult because annotations are limited and the input carries no information beyond the skeletal structure. We introduce SMART, a method that addresses these limitations through several contributions. First, by incorporating a multi-scale feature pyramid, SMART captures richer feature information and improves the overall understanding of actions in video sequences. Second, we propose two modules that improve the accuracy and robustness of action localization. The Class-Weighted Feature Alignment module aligns features across different scales, which improves the precision of action identification and localization. The Dynamic Gaussian-Based Instance Fusion module generates higher-quality action boundaries and adapts better to different action types and durations. Evaluated on the Babel dataset and on AIMS, a proprietary dataset from our lab, SMART achieves state-of-the-art performance in skeleton-based weakly supervised temporal action localization. This work advances action localization under limited annotations across diverse video understanding scenarios.
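The abstract names a multi-scale feature pyramid but does not describe how SMART builds it; the full text is restricted. As a generic illustration only, and not the thesis's implementation, the minimal Python sketch below builds a temporal pyramid over per-frame skeleton features by non-overlapping average pooling, then upsamples a coarse level back to the original frame count so that levels can be compared frame by frame. The function names (`temporal_avg_pool`, `build_temporal_pyramid`, `upsample_to`), the choice of average pooling, nearest-neighbour upsampling, and the default stride of 2 are all hypothetical assumptions made for this example.

```python
from typing import List

# Hypothetical layout: T frames, each a D-dimensional feature vector.
Sequence = List[List[float]]


def temporal_avg_pool(seq: Sequence, stride: int) -> Sequence:
    """Average non-overlapping windows of `stride` frames.

    A trailing partial window is averaged over the frames it actually has.
    """
    pooled = []
    for start in range(0, len(seq), stride):
        window = seq[start:start + stride]
        dims = len(window[0])
        pooled.append([sum(frame[d] for frame in window) / len(window)
                       for d in range(dims)])
    return pooled


def build_temporal_pyramid(seq: Sequence, num_levels: int = 3,
                           stride: int = 2) -> List[Sequence]:
    """Level 0 is the input; each further level is `stride` times coarser in time."""
    if stride < 1 or num_levels < 1:
        raise ValueError("stride and num_levels must be >= 1")
    levels = [seq]
    for _ in range(1, num_levels):
        levels.append(temporal_avg_pool(levels[-1], stride))
    return levels


def upsample_to(seq: Sequence, length: int) -> Sequence:
    """Nearest-neighbour upsampling of a coarse level back to `length` frames."""
    return [seq[min(i * len(seq) // length, len(seq) - 1)] for i in range(length)]


if __name__ == "__main__":
    # Toy input: 8 frames of 2-D features, frame t = [t, 2t].
    frames = [[float(t), 2.0 * t] for t in range(8)]
    pyramid = build_temporal_pyramid(frames, num_levels=3, stride=2)
    print([len(level) for level in pyramid])  # prints [8, 4, 2]
```

Coarser levels summarize longer temporal spans, which is the general motivation for multi-scale features in localization; how SMART aligns and weights these scales (the Class-Weighted Feature Alignment module) is not described in this record and is not modeled here.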
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96750
DOI:	10.6342/NTU202404729
Fulltext Rights:	Not authorized
Embargo Lift Date:	N/A
Appears in Collections:	Graduate Institute of Networking and Multimedia

Files in This Item:

File	Access	Size	Format
ntu-113-1.pdf	Restricted	3.42 MB	Adobe PDF
