Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94372

Full metadata record (DC field: value [language])
dc.contributor.advisor: 張智星 [zh_TW]
dc.contributor.advisor: Jyh-Shing Roger Jang [en]
dc.contributor.author: 許智翔 [zh_TW]
dc.contributor.author: Chih-Hsiang Hsu [en]
dc.date.accessioned: 2024-08-15T17:07:45Z
dc.date.available: 2024-08-16
dc.date.copyright: 2024-08-15
dc.date.issued: 2024
dc.date.submitted: 2024-08-02
dc.identifier.citation:
C. Ionescu, F. Li, and C. Sminchisescu. Latent structured models for human pose estimation. In International Conference on Computer Vision, 2011.
T. Chen, C. Fang, X. Shen, Y. Zhu, Z. Chen, and J. Luo. Anatomy-aware 3d human pose estimation with bone-based pose decomposition. IEEE Transactions on Circuits and Systems for Video Technology, 32(1):198–209, 2021.
Y. Chen, Z. Wang, Y. Peng, Z. Zhang, G. Yu, and J. Sun. Cascaded pyramid network for multi-person pose estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7103–7112, 2018.
H. Choi, G. Moon, and K. M. Lee. Pose2mesh: Graph convolutional network for 3d human pose and mesh recovery from a 2d human pose. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VII 16, pages 769–787. Springer, 2020.
J. Chung, C. Gulcehre, K. Cho, and Y. Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
Z. Geng, K. Sun, B. Xiao, Z. Zhang, and J. Wang. Bottom-up human pose estimation via disentangled keypoint regression. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14676–14686, 2021.
J. Gong, L. G. Foo, Z. Fan, Q. Ke, H. Rahmani, and J. Liu. Diffpose: Toward more reliable 3d pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13041–13051, 2023.
M. R. I. Hossain and J. J. Little. Exploiting temporal information for 3d human pose estimation. In Proceedings of the European conference on computer vision (ECCV), pages 68–84, 2018.
C. Ionescu, D. Papava, V. Olaru, and C. Sminchisescu. Human3.6m: Large scale datasets and predictive methods for 3d human sensing in natural environments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(7):1325–1339, JUL 2014.
A. Kanazawa, M. J. Black, D. W. Jacobs, and J. Malik. End-to-end recovery of human shape and pose. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7122–7131, 2018.
M. Kocabas, N. Athanasiou, and M. J. Black. Vibe: Video inference for human body pose and shape estimation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5253–5263, 2020.
K. Lee, I. Lee, and S. Lee. Propagating lstm: 3d pose estimation based on joint interdependency. In Proceedings of the European conference on computer vision (ECCV), pages 119–135, 2018.
W. Li, H. Liu, H. Tang, P. Wang, and L. Van Gool. Mhformer: Multi-hypothesis transformer for 3d human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13147–13156, 2022.
M. Loper, N. Mahmood, J. Romero, G. Pons-Moll, and M. J. Black. SMPL: A skinned multi-person linear model. ACM Trans. Graphics (Proc. SIGGRAPH Asia), 34(6):248:1–248:16, Oct. 2015.
D. Pavllo, C. Feichtenhofer, D. Grangier, and M. Auli. 3d human pose estimation in video with temporal convolutions and semi-supervised training. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7753–7762, 2019.
J. Peng, Y. Zhou, and P. Mok. Ktpformer: Kinematics and trajectory prior knowledge-enhanced transformer for 3d human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1123–1132, 2024.
W. Shan, Z. Liu, X. Zhang, Z. Wang, K. Han, S. Wang, S. Ma, and W. Gao. Diffusion-based 3d human pose estimation with multi-hypothesis aggregation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14761–14771, 2023.
C. Sminchisescu and B. Triggs. Kinematic jump processes for monocular 3d human tracking. In 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings., volume 1, pages I–I. IEEE, 2003.
K. Sun, B. Xiao, D. Liu, and J. Wang. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5693–5703, 2019.
X. Sun, J. Shang, S. Liang, and Y. Wei. Compositional human pose regression. In Proceedings of the IEEE international conference on computer vision, pages 2602–2611, 2017.
M. Trotter and G. C. Gleser. Estimation of stature from long bones of American whites and negroes. American journal of physical anthropology, 10(4):463–514, 1952.
W.-L. Wei, J.-C. Lin, T.-L. Liu, and H.-Y. M. Liao. Capturing humans in motion: Temporal-attentive 3d human pose and shape estimation from monocular video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13211–13220, 2022.
J. Xu, Y. Guo, and Y. Peng. Finepose: Fine-grained prompt-driven 3d human pose estimation via diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 561–570, 2024.
J. Zhang, Z. Tu, J. Yang, Y. Chen, and J. Yuan. Mixste: Seq2seq mixed spatiotemporal encoder for 3d human pose estimation in video. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 13232–13242, 2022.
C. Zheng, S. Zhu, M. Mendieta, T. Yang, C. Chen, and Z. Ding. 3d human pose estimation with spatial and temporal transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 11656–11665, 2021.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94372
dc.description.abstract [zh_TW]: Current research in 3D human pose estimation focuses mainly on predicting 3D joint coordinates while overlooking other important physical constraints, such as bone length consistency and body symmetry. We propose a bone length prediction model that uses a recurrent neural network architecture to capture information across the whole video and achieve accurate predictions. To make training more effective, we synthesize bone length data that satisfies physical constraints and propose a new data augmentation method. In addition, we propose bone length adjustment, which replaces bone lengths with our predicted values while preserving bone orientations. Results show that after bone length adjustment, existing 3D human pose estimation models all improve significantly. We further fine-tune pose estimation models using the predicted bone lengths, which likewise yields clear improvements. Our bone length prediction model surpasses previous state-of-the-art results, and both the adjustment and fine-tuning methods improve performance on multiple evaluation metrics of the Human3.6M dataset.
dc.description.abstract [en]: Current approaches to 3D human pose estimation primarily focus on regressing 3D joint locations, often neglecting critical physical constraints such as bone length consistency and body symmetry. This work introduces a recurrent neural network architecture designed to capture holistic information across entire video sequences, enabling accurate prediction of bone lengths. To enhance training effectiveness, we propose a novel augmentation strategy using synthetic bone lengths that adhere to physical constraints. Moreover, we present a bone length adjustment method that preserves bone orientations while substituting bone lengths with predicted values. Our results demonstrate that existing 3D human pose estimation models can be significantly enhanced through this adjustment process. Furthermore, we fine-tune human pose estimation models using inferred bone lengths, observing notable improvements. Our bone length prediction model surpasses the previous best results, and our adjustment and fine-tuning methods enhance performance across several metrics on the Human3.6M dataset.
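The bone length adjustment described in the abstract (substituting predicted bone lengths while preserving bone orientations) can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual implementation: the toy kinematic tree, the joint indices, and the predicted lengths are all hypothetical.

```python
import numpy as np

# Hypothetical kinematic tree: child joint index -> parent joint index.
# A real skeleton (e.g. the 17-joint Human3.6M layout) would have more
# joints and branches; a 4-joint chain is enough to show the idea.
PARENTS = {1: 0, 2: 1, 3: 2}

def adjust_bone_lengths(joints, predicted_lengths, parents=PARENTS):
    """Return a pose whose bone directions match `joints` but whose
    bone lengths are replaced by `predicted_lengths[child]`."""
    adjusted = np.array(joints, dtype=float)
    # Walk children in topological (increasing-index) order so each
    # parent is already repositioned before its children are placed.
    for child in sorted(parents):
        parent = parents[child]
        bone = joints[child] - joints[parent]       # original bone vector
        direction = bone / np.linalg.norm(bone)     # unit direction, preserved
        adjusted[child] = adjusted[parent] + direction * predicted_lengths[child]
    return adjusted

# Toy example: a chain with 2-unit bones along the x-axis,
# rescaled so every bone has the predicted length 1.0.
pose = np.array([[0., 0., 0.], [2., 0., 0.], [4., 0., 0.], [6., 0., 0.]])
new_pose = adjust_bone_lengths(pose, {1: 1.0, 2: 1.0, 3: 1.0})
```

Because each child is repositioned relative to its already-adjusted parent, bone directions are left exactly as estimated while every bone length comes from the prediction, which is why the adjustment can be applied as a post-processing step to any existing lifting model.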
dc.description.provenance [en]: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-15T17:07:44Z. No. of bitstreams: 0
dc.description.provenance [en]: Made available in DSpace on 2024-08-15T17:07:45Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Acknowledgements  iii
摘要  v
Abstract  vii
Contents  ix
List of Figures  xiii
List of Tables  xv
Denotation  xvii
Chapter 1  Introduction  1
  1.1  3D Human Pose Estimation  1
  1.2  Research Topic and Contribution  2
  1.3  Overview  3
Chapter 2  Related Work  5
  2.1  2D Keypoint Detection  6
  2.2  2D-to-3D Lifting  6
    2.2.1  Self-occlusion  7
    2.2.2  Anatomy3D  9
  2.3  Recurrent Neural Network (RNN)  10
    2.3.1  Bi-directional RNN (Bi-RNN)  11
    2.3.2  Long Short-Term Memory (LSTM)  11
    2.3.3  Gated Recurrent Unit (GRU)  11
Chapter 3  Methods  13
  3.1  Bone Length Augmentation  15
    3.1.1  Augmentation  15
    3.1.2  Random Bone Lengths  16
    3.1.3  Synthetic Bone Lengths  18
  3.2  Bone Length Model  18
  3.3  Bone Length Adjustment  20
  3.4  Fine-tuning  21
    3.4.1  Fine-tuning on Lifting Model  22
    3.4.2  Fine-tuning on the Entire Model  23
Chapter 4  Experimental Setup  25
  4.1  Human3.6M Dataset  25
  4.2  Evaluation  25
  4.3  Implementation Details  26
    4.3.1  Environment  26
    4.3.2  Bone Length Prediction  27
    4.3.3  Fine-tuning  28
  4.4  Roadmap for Experiments  29
Chapter 5  Results  31
  5.1  Bone Length Model  31
  5.2  Bone Length Adjustment  36
  5.3  Fine-tuning  37
  5.4  Inference Speed  39
  5.5  Ablation Study  40
    5.5.1  Bone Length Model  41
    5.5.2  Fine-tuning  42
    5.5.3  Inference  42
Chapter 6  Conclusion  45
  6.1  Future Work  46
References  49
dc.language.iso: en
dc.title: 以骨頭長度修正增強三維人體骨架預測 [zh_TW]
dc.title: Enhancing 3D Human Pose Estimation with Bone Length Adjustment [en]
dc.type: Thesis
dc.date.schoolyear: 112-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 陳祝嵩; 林仁俊 [zh_TW]
dc.contributor.oralexamcommittee: Chu-Song Chen; Jen-Chun Lin [en]
dc.subject.keyword: 人體骨架預測, 二維至三維抬升, 電腦視覺, 骨頭長度修正, 循環神經網路 [zh_TW]
dc.subject.keyword: Human pose estimation, 2D-to-3D lifting, Computer vision, Bone length adjustment, Recurrent neural network [en]
dc.relation.page: 52
dc.identifier.doi: 10.6342/NTU202402242
dc.rights.note: 同意授權 (Authorized; open access worldwide)
dc.date.accepted: 2024-08-06
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)
Appears in collections: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)

Files in this item:
ntu-112-2.pdf  3.2 MB  Adobe PDF