Semantic Alignment and Feature Disentanglement for Generalized Zero-Shot Action Recognition

李勝維; Sheng-Wei Li

Please cite this item using this Handle URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95986

Title:	SMARTEN: Semantic Alignment Through Feature Disentanglement For Generalized Zero-Shot Skeleton-Based Action Recognition (Chinese title: 語義對齊與特徵解離於廣義零樣本動作識別)
Author:	Sheng-Wei Li (李勝維)
Advisor:	Jane Yung-jen Hsu (許永真)
Keywords:	Zero-Shot Learning, Semantic Alignment, Feature Disentanglement, Skeleton-based Action Recognition
Year of publication:	2024
Degree:	Master's
Abstract:	In generalized zero-shot skeleton-based action recognition, existing approaches learn a shared latent space for skeleton features and semantic embeddings through modality-specific projection networks. However, action recognition datasets are asymmetric: skeleton sequences vary widely across samples, whereas each class is described by a single, constant label. This asymmetry makes the shared latent space difficult to learn. To address it, we introduce SMARTEN, an adversarial feature disentanglement method that separates semantic-related and semantic-unrelated latent variables in skeleton features, so that the semantic-related part can be better aligned with the semantic embeddings. Using modality-specific variational autoencoders (VAEs) with a cross-reconstruction loss, SMARTEN aligns the semantic-related skeleton features with the semantic embeddings. Our approach sets a new state of the art in both zero-shot and generalized zero-shot action recognition, with significant improvements over prior methods on the NTU RGB+D 60, NTU RGB+D 120, and FineGym 99 benchmarks.
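The alignment mechanism named in the abstract, modality-specific VAEs coupled through a cross-reconstruction loss, can be illustrated with a minimal sketch. It assumes toy linear encoders and decoders, an L1 reconstruction distance, and a standard-normal KL prior, as in common cross-aligned VAE setups for zero-shot learning. None of these choices, nor the hypothetical names LinearGaussianVAE and cross_reconstruction_loss, come from the thesis, and the sketch deliberately omits SMARTEN's adversarial disentanglement of semantic-related and unrelated latents.

```python
import math
import random
from typing import List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class LinearGaussianVAE:
    """Hypothetical toy VAE for one modality: linear encoder heads and decoder."""

    def __init__(self, w_mu: Matrix, w_logvar: Matrix, w_dec: Matrix) -> None:
        self.w_mu, self.w_logvar, self.w_dec = w_mu, w_logvar, w_dec

    def encode(self, x: Vector) -> Tuple[Vector, Vector]:
        # Returns the mean and log-variance of the Gaussian posterior q(z | x).
        return matvec(self.w_mu, x), matvec(self.w_logvar, x)

    def decode(self, z: Vector) -> Vector:
        return matvec(self.w_dec, z)


def reparameterize(mu: Vector, logvar: Vector,
                   rng: Optional[random.Random] = None) -> Vector:
    # Without an RNG, use the posterior mean (deterministic, handy for tests).
    if rng is None:
        return list(mu)
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Vector, logvar: Vector) -> float:
    # Closed-form KL(N(mu, exp(logvar)) || N(0, I)).
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def l1(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def cross_reconstruction_loss(skel_vae: LinearGaussianVAE, sem_vae: LinearGaussianVAE,
                              x_skel: Vector, x_sem: Vector,
                              rng: Optional[random.Random] = None
                              ) -> Tuple[float, float, float]:
    """Return (within-modality recon, cross recon, KL) for one skeleton/label pair."""
    mu_s, lv_s = skel_vae.encode(x_skel)
    mu_t, lv_t = sem_vae.encode(x_sem)
    z_s = reparameterize(mu_s, lv_s, rng)
    z_t = reparameterize(mu_t, lv_t, rng)
    # Each modality reconstructs itself from its own latent.
    recon = l1(skel_vae.decode(z_s), x_skel) + l1(sem_vae.decode(z_t), x_sem)
    # Cross terms: decode each latent with the OTHER modality's decoder, which
    # only succeeds if both encoders map a pair to compatible latent codes.
    cross = l1(sem_vae.decode(z_s), x_sem) + l1(skel_vae.decode(z_t), x_skel)
    kl = kl_to_standard_normal(mu_s, lv_s) + kl_to_standard_normal(mu_t, lv_t)
    return recon, cross, kl
```

For example, with identity encoders and decoders and zero log-variance, a skeleton feature and semantic embedding that are already equal give zero reconstruction and cross-reconstruction terms, leaving only the KL term; mismatched inputs raise only the cross term. In a real model these terms would be weighted, summed into one objective, and minimized over neural encoder and decoder parameters.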
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95986
DOI:	10.6342/NTU202401280
Full-text license:	Authorized (open access worldwide)
Department:	Graduate Institute of Networking and Multimedia

Files in this item:

File	Size	Format
ntu-113-1.pdf	3.08 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
