Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88899
Title: | 以變換器方法來修補缺失關節點用於骨架為基礎的動作識別 A Transformer Approach to Recovering Missing Joints in Skeleton-Based Human Activity Recognition |
Author: | 魏資碩 Tzu-Shuo Wei |
Advisor: | 許永真 Yung-Jen Hsu |
Keywords: | Missing Joints Recovery, Transformer Approach, Human Skeleton Representation, Deep Learning, Human Activity Recognition |
Publication year: | 2023 |
Degree: | Master's |
Abstract: | Human Activity Recognition (HAR) often uses skeleton coordinates to represent dynamic relationships. However, when a pose estimation model is applied to RGB video to generate skeleton data, joints can go missing because the subject is cut off by the edge of the frame. This loss typically occurs in the four limbs and starts from the part farthest from the torso, and it has a negative impact on human activity recognition.
To avoid recognition errors, many studies have addressed the recovery of missing skeleton joints. However, previous research has primarily targeted a small number of missing joints scattered across the skeleton sequence. When joints in a limb are missing over a long period of time, there is currently no method that recovers them accurately. This study therefore targets prolonged data loss in the distal joints of a single limb within the skeleton sequence and uses a deep learning model to recover the missing joints.
In this thesis, we propose a group-based sampling method to increase data diversity and quantity. For training, we design a two-stage training strategy together with a masking strategy, based on masked language modeling, that progressively masks the skeleton. This allows the model to gradually learn the motion trajectories of the missing areas across different actions. We also implement a hybrid transformer-based model that extracts both structural and motion features from the skeleton and combines them effectively, so that the model's subsequent prediction modules can predict the missing areas accurately.
We first conducted our experiments on the Human3.6M dataset. We found that extracting structural and motion features simultaneously achieves the highest accuracy in both recovering missing areas and reconstructing the skeleton sequence. Although our method currently handles only prolonged missing data in a single limb of the human skeleton, it surpasses state-of-the-art methods in both recovering missing areas and reconstructing the skeleton sequence. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88899 |
DOI: | 10.6342/NTU202302931 |
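Full-text authorization: | Authorized (open access on campus only) |
Appears in collections: | Graduate Institute of Networking and Multimedia |
Files in this item:
File | Size | Format | |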
---|---|---|---|
ntu-111-2.pdf Access restricted to NTU campus IP addresses (off campus, please use the VPN remote access service) | 10.34 MB | Adobe PDF | View/Open |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
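
The progressive masking of distal limb joints described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the joint names, the `LIMB_CHAINS` table, the helper functions, and the linear growth schedule are all hypothetical choices made for illustration, and frames are plain dictionaries rather than the tensors a real model would use. It only shows the idea of masking a contiguous run of frames for the joints of one limb that lie farthest from the torso, and widening that mask as training proceeds.

```python
# Hypothetical limb chains, ordered from the joint nearest the torso to the
# most distal joint. Names are illustrative, not the thesis's joint indexing.
LIMB_CHAINS = {
    "left_arm": ["left_shoulder", "left_elbow", "left_wrist"],
    "right_arm": ["right_shoulder", "right_elbow", "right_wrist"],
    "left_leg": ["left_hip", "left_knee", "left_ankle"],
    "right_leg": ["right_hip", "right_knee", "right_ankle"],
}


def distal_joints(limb, depth):
    """Return the `depth` joints of `limb` that are farthest from the torso."""
    chain = LIMB_CHAINS[limb]
    if not 1 <= depth <= len(chain):
        raise ValueError("depth must be between 1 and the chain length")
    return chain[-depth:]


def mask_sequence(sequence, limb, depth, start, length, mask_value=None):
    """Mask the distal joints of one limb over a contiguous span of frames.

    `sequence` is a list of frames; each frame maps joint name -> coordinates.
    Returns a masked copy plus the list of (frame_index, joint) pairs that
    were masked, which a model would be trained to recover.
    """
    masked = [dict(frame) for frame in sequence]  # leave the input untouched
    targets = []
    end = min(start + length, len(sequence))
    for t in range(start, end):
        for joint in distal_joints(limb, depth):
            masked[t][joint] = mask_value
            targets.append((t, joint))
    return masked, targets


def progressive_schedule(num_frames, num_epochs, max_depth=3):
    """Assumed linear schedule: mask more joints and more frames each epoch.

    Integer ceiling division keeps the schedule free of float rounding.
    """
    schedule = []
    for epoch in range(1, num_epochs + 1):
        depth = (epoch * max_depth + num_epochs - 1) // num_epochs
        length = (epoch * num_frames + num_epochs - 1) // num_epochs
        schedule.append((depth, length))
    return schedule
```

Under this assumed schedule, early epochs hide only the outermost joint for a few frames, while the final epoch hides the whole limb chain for the full sequence, which mirrors the edge-of-frame truncation pattern the abstract describes.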