AMPose: Alternately Mixed Global-Local Attention Model for 3D Human Pose Estimation

林宏信; Hong-Xin Lin

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88557

Title:	AMPose: Alternately Mixed Global-Local Attention Model for 3D Human Pose Estimation (應用於3D人體姿態估計的全局與局部交替混合注意力模型)
Authors:	林宏信 Hong-Xin Lin
Advisor:	吳沛遠 Pei-Yuan Wu
Keyword:	Graph convolution, self-attention mechanism, 3D human pose, graph convolutional neural network, Transformer
Publication Year:	2023
Degree:	Master's
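Abstract:	3D human pose estimation is widely applied in fields such as rehabilitation, golf, and baseball. Prior work reconstructs the 3D human body either from multiple consecutive frames of a video or from a single image. Because graph convolution can encode the skeletal relationships of the human body and thereby strengthen the associations within the data, it is widely used in 3D human pose estimation, and earlier studies and experimental results confirm that graph convolution reconstructs 3D human poses more accurately. In recent years, the self-attention mechanism has shown its advantages in several subfields of computer vision and has achieved excellent results on many datasets. In the 3D setting, however, the relationships among human joints cannot always be expressed by pure self-attention, and graph-convolution research has already proposed many methods for modeling those relationships. This study mainly addresses the inability of self-attention to fully exploit the human skeleton and aims to improve 3D skeleton reconstruction. By alternately mixing self-attention and graph convolution in one model, we capture both local and global dependencies to obtain more comprehensive feature vectors, from which the 3D joint positions are derived. We extensively test the model's possible design variations to demonstrate its effectiveness, and it achieves strong results on the public datasets Human3.6M and MPI-INF-3DHP, surpassing existing models. Single-image 3D human pose estimation (HPE) has many applications in fields such as rehabilitation, golf, and baseball. Over the past few years, much research has focused on reconstructing the human skeleton from either a series of video frames or a single image. Previous studies have commonly used graph convolutional networks (GCNs) to address 3D HPE, and substantial experiments have verified the efficacy of GCNs for this purpose. Recently, Transformer-based models have attracted considerable interest because of their excellent capacity for relating multiple frames. Nevertheless, a pure Transformer in the single-frame setting cannot exploit the characteristics of the human joints. To address this, we introduce AMPose, an approach that combines Transformer and GCN blocks to capture global and local dependencies among human joints. By leveraging the strengths of both modules, AMPose models human joint interactions more comprehensively. To assess the effectiveness of AMPose, we conduct experiments on well-known public datasets, including MPI-INF-3DHP and Human3.6M. AMPose outperforms state-of-the-art models on both datasets, and cross-dataset comparisons demonstrate superior generalization ability.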
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88557
DOI:	10.6342/NTU202302164
Fulltext Rights:	Authorization granted (open access worldwide)
Appears in Collections:	Graduate Institute of Communication Engineering
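
The abstract describes alternating Transformer (global) and GCN (local) blocks over the human joints. The following is only a minimal sketch of that alternation, not the thesis implementation: it assumes a hypothetical 5-joint chain skeleton rather than the Human3.6M layout, uses identity projections in place of learned query/key/value weights, and applies a weight-free graph convolution. All names (`EDGES`, `global_attention`, `local_graph_conv`, `alternating_blocks`) are illustrative.

```python
import math

# Hypothetical toy skeleton: a 5-joint chain (NOT the Human3.6M joint layout).
NUM_JOINTS = 5
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(row) for row in zip(*a)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_adjacency(num_joints, edges):
    """Row-normalized adjacency with self-loops: each joint averages itself and its neighbors."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in a]


def global_attention(x):
    """Scaled dot-product self-attention with identity projections (assumption):
    every joint attends to every other joint."""
    d = len(x[0])
    scores = matmul(x, transpose(x))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, x)


def local_graph_conv(x, adj):
    """Graph convolution without learned weights (assumption):
    features are mixed only along skeleton edges."""
    return matmul(adj, x)


def add(a, b):
    """Element-wise sum, used for the residual connections."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def alternating_blocks(x, adj, depth=2):
    """Alternate a global (attention) step and a local (graph) step, each with a residual."""
    for _ in range(depth):
        x = add(x, global_attention(x))
        x = add(x, local_graph_conv(x, adj))
    return x
```

Calling `alternating_blocks(features, normalized_adjacency(NUM_JOINTS, EDGES))` on a `NUM_JOINTS x d` list of per-joint features returns features of the same shape. In this sketch, the graph step for a joint depends only on that joint and its skeleton neighbors, while the attention step depends on all joints, which is the local/global distinction the abstract draws.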

Files in This Item:

File	Size	Format
ntu-111-2.pdf	1.27 MB	Adobe PDF
