Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94372

Full metadata record (DC field: value [language])
dc.contributor.advisor: 張智星 [zh_TW]
dc.contributor.advisor: Jyh-Shing Roger Jang [en]
dc.contributor.author: 許智翔 [zh_TW]
dc.contributor.author: Chih-Hsiang Hsu [en]
dc.date.accessioned: 2024-08-15T17:07:45Z
dc.date.available: 2024-08-16
dc.date.copyright: 2024-08-15
dc.date.issued: 2024
dc.date.submitted: 2024-08-02
dc.identifier.citation:
C. Ionescu, F. Li, and C. Sminchisescu. Latent structured models for human pose estimation. In International Conference on Computer Vision, 2011.
T. Chen, C. Fang, X. Shen, Y. Zhu, Z. Chen, and J. Luo. Anatomy-aware 3d human pose estimation with bone-based pose decomposition. IEEE Transactions on Circuits and Systems for Video Technology, 32(1):198–209, 2021.
Y. Chen, Z. Wang, Y. Peng, Z. Zhang, G. Yu, and J. Sun. Cascaded pyramid network for multi-person pose estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7103–7112, 2018.
H. Choi, G. Moon, and K. M. Lee. Pose2mesh: Graph convolutional network for 3d human pose and mesh recovery from a 2d human pose. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VII 16, pages 769–787. Springer, 2020.
J. Chung, C. Gulcehre, K. Cho, and Y. Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
Z. Geng, K. Sun, B. Xiao, Z. Zhang, and J. Wang. Bottom-up human pose estimation via disentangled keypoint regression. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14676–14686, 2021.
J. Gong, L. G. Foo, Z. Fan, Q. Ke, H. Rahmani, and J. Liu. Diffpose: Toward more reliable 3d pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13041–13051, 2023.
M. R. I. Hossain and J. J. Little. Exploiting temporal information for 3d human pose estimation. In Proceedings of the European conference on computer vision (ECCV), pages 68–84, 2018.
C. Ionescu, D. Papava, V. Olaru, and C. Sminchisescu. Human3.6m: Large scale datasets and predictive methods for 3d human sensing in natural environments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(7):1325–1339, JUL 2014.
A. Kanazawa, M. J. Black, D. W. Jacobs, and J. Malik. End-to-end recovery of human shape and pose. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7122–7131, 2018.
M. Kocabas, N. Athanasiou, and M. J. Black. Vibe: Video inference for human body pose and shape estimation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5253–5263, 2020.
K. Lee, I. Lee, and S. Lee. Propagating lstm: 3d pose estimation based on joint interdependency. In Proceedings of the European conference on computer vision (ECCV), pages 119–135, 2018.
W. Li, H. Liu, H. Tang, P. Wang, and L. Van Gool. Mhformer: Multi-hypothesis transformer for 3d human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13147–13156, 2022.
M. Loper, N. Mahmood, J. Romero, G. Pons-Moll, and M. J. Black. SMPL: A skinned multi-person linear model. ACM Trans. Graphics (Proc. SIGGRAPH Asia), 34(6):248:1–248:16, Oct. 2015.
D. Pavllo, C. Feichtenhofer, D. Grangier, and M. Auli. 3d human pose estimation in video with temporal convolutions and semi-supervised training. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7753–7762, 2019.
J. Peng, Y. Zhou, and P. Mok. Ktpformer: Kinematics and trajectory prior knowledge-enhanced transformer for 3d human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1123–1132, 2024.
W. Shan, Z. Liu, X. Zhang, Z. Wang, K. Han, S. Wang, S. Ma, and W. Gao. Diffusion-based 3d human pose estimation with multi-hypothesis aggregation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14761–14771, 2023.
C. Sminchisescu and B. Triggs. Kinematic jump processes for monocular 3d human tracking. In 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings., volume 1, pages I–I. IEEE, 2003.
K. Sun, B. Xiao, D. Liu, and J. Wang. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5693–5703, 2019.
X. Sun, J. Shang, S. Liang, and Y. Wei. Compositional human pose regression. In Proceedings of the IEEE international conference on computer vision, pages 2602–2611, 2017.
M. Trotter and G. C. Gleser. Estimation of stature from long bones of American whites and negroes. American journal of physical anthropology, 10(4):463–514, 1952.
W.-L. Wei, J.-C. Lin, T.-L. Liu, and H.-Y. M. Liao. Capturing humans in motion: Temporal-attentive 3d human pose and shape estimation from monocular video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13211–13220, 2022.
J. Xu, Y. Guo, and Y. Peng. Finepose: Fine-grained prompt-driven 3d human pose estimation via diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 561–570, 2024.
J. Zhang, Z. Tu, J. Yang, Y. Chen, and J. Yuan. Mixste: Seq2seq mixed spatiotemporal encoder for 3d human pose estimation in video. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 13232–13242, 2022.
C. Zheng, S. Zhu, M. Mendieta, T. Yang, C. Chen, and Z. Ding. 3d human pose estimation with spatial and temporal transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 11656–11665, 2021.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94372
dc.description.abstract [zh_TW]: Current research in 3D human pose estimation focuses mainly on predicting 3D joint coordinates while overlooking other important physical constraints, such as bone length consistency and body symmetry. We propose a bone length prediction model that uses a recurrent neural network architecture to capture information across the whole video and achieve accurate predictions. To make training more effective, we synthesize bone length data that satisfies physical constraints and propose a new data augmentation method. In addition, we propose bone length adjustment, which replaces bone lengths with our predicted values while preserving bone orientations. Results show that after bone length adjustment, existing 3D human pose estimation models all improve significantly. We further fine-tune pose estimation models using the predicted bone lengths, which likewise yields clear improvements. Our bone length prediction model surpasses previous state-of-the-art results, and both the adjustment and fine-tuning methods improve performance on multiple evaluation metrics of the Human3.6M dataset.
dc.description.abstract [en]: Current approaches to 3D human pose estimation primarily focus on regressing 3D joint locations, often neglecting critical physical constraints such as bone length consistency and body symmetry. This work introduces a recurrent neural network architecture designed to capture holistic information across entire video sequences, enabling accurate prediction of bone lengths. To enhance training effectiveness, we propose a novel augmentation strategy using synthetic bone lengths that adhere to physical constraints. Moreover, we present a bone length adjustment method that preserves bone orientations while substituting bone lengths with predicted values. Our results demonstrate that existing 3D human pose estimation models can be significantly enhanced through this adjustment process. Furthermore, we fine-tune human pose estimation models using inferred bone lengths, observing notable improvements. Our bone length prediction model surpasses the previous best results, and our adjustment and fine-tuning methods enhance performance across several metrics on the Human3.6M dataset.
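The bone length adjustment described in the abstract (substituting predicted bone lengths while preserving bone orientations) can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual implementation: the toy kinematic tree, the joint indices, and the predicted lengths are all hypothetical.

```python
import numpy as np

# Hypothetical kinematic tree: child joint index -> parent joint index.
# A real skeleton (e.g. the 17-joint Human3.6M layout) would have more
# joints and branches; a 4-joint chain is enough to show the idea.
PARENTS = {1: 0, 2: 1, 3: 2}

def adjust_bone_lengths(joints, predicted_lengths, parents=PARENTS):
    """Return a pose whose bone directions match `joints` but whose
    bone lengths are replaced by `predicted_lengths[child]`."""
    adjusted = np.array(joints, dtype=float)
    # Walk children in topological (increasing-index) order so each
    # parent is already repositioned before its children are placed.
    for child in sorted(parents):
        parent = parents[child]
        bone = joints[child] - joints[parent]       # original bone vector
        direction = bone / np.linalg.norm(bone)     # unit direction, preserved
        adjusted[child] = adjusted[parent] + direction * predicted_lengths[child]
    return adjusted

# Toy example: a chain with 2-unit bones along the x-axis,
# rescaled so every bone has the predicted length 1.0.
pose = np.array([[0., 0., 0.], [2., 0., 0.], [4., 0., 0.], [6., 0., 0.]])
new_pose = adjust_bone_lengths(pose, {1: 1.0, 2: 1.0, 3: 1.0})
```

Because each child is repositioned relative to its already-adjusted parent, bone directions are left exactly as estimated while every bone length comes from the prediction, which is why the adjustment can be applied as a post-processing step to any existing lifting model.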
dc.description.provenance [en]: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-15T17:07:44Z. No. of bitstreams: 0
dc.description.provenance [en]: Made available in DSpace on 2024-08-15T17:07:45Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Acknowledgements  iii
摘要  v
Abstract  vii
Contents  ix
List of Figures  xiii
List of Tables  xv
Denotation  xvii
Chapter 1  Introduction  1
  1.1  3D Human Pose Estimation  1
  1.2  Research Topic and Contribution  2
  1.3  Overview  3
Chapter 2  Related Work  5
  2.1  2D Keypoint Detection  6
  2.2  2D-to-3D Lifting  6
    2.2.1  Self-occlusion  7
    2.2.2  Anatomy3D  9
  2.3  Recurrent Neural Network (RNN)  10
    2.3.1  Bi-directional RNN (Bi-RNN)  11
    2.3.2  Long Short-Term Memory (LSTM)  11
    2.3.3  Gated Recurrent Unit (GRU)  11
Chapter 3  Methods  13
  3.1  Bone Length Augmentation  15
    3.1.1  Augmentation  15
    3.1.2  Random Bone Lengths  16
    3.1.3  Synthetic Bone Lengths  18
  3.2  Bone Length Model  18
  3.3  Bone Length Adjustment  20
  3.4  Fine-tuning  21
    3.4.1  Fine-tuning on Lifting Model  22
    3.4.2  Fine-tuning on the Entire Model  23
Chapter 4  Experimental Setup  25
  4.1  Human3.6M Dataset  25
  4.2  Evaluation  25
  4.3  Implementation Details  26
    4.3.1  Environment  26
    4.3.2  Bone Length Prediction  27
    4.3.3  Fine-tuning  28
  4.4  Roadmap for Experiments  29
Chapter 5  Results  31
  5.1  Bone Length Model  31
  5.2  Bone Length Adjustment  36
  5.3  Fine-tuning  37
  5.4  Inference Speed  39
  5.5  Ablation Study  40
    5.5.1  Bone Length Model  41
    5.5.2  Fine-tuning  42
    5.5.3  Inference  42
Chapter 6  Conclusion  45
  6.1  Future Work  46
References  49
dc.language.iso: en
dc.title: 以骨頭長度修正增強三維人體骨架預測 [zh_TW]
dc.title: Enhancing 3D Human Pose Estimation with Bone Length Adjustment [en]
dc.type: Thesis
dc.date.schoolyear: 112-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 陳祝嵩; 林仁俊 [zh_TW]
dc.contributor.oralexamcommittee: Chu-Song Chen; Jen-Chun Lin [en]
dc.subject.keyword: 人體骨架預測, 二維至三維抬升, 電腦視覺, 骨頭長度修正, 循環神經網路 [zh_TW]
dc.subject.keyword: Human pose estimation, 2D-to-3D lifting, Computer vision, Bone length adjustment, Recurrent neural network [en]
dc.relation.page: 52
dc.identifier.doi: 10.6342/NTU202402242
dc.rights.note: 同意授權 (Authorized; open access worldwide)
dc.date.accepted: 2024-08-06
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)
Appears in collections: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)

Files in this item:
ntu-112-2.pdf  3.2 MB  Adobe PDF