AMPose: Alternately Mixed Global-Local Attention Model for 3D Human Pose Estimation

林宏信; Hong-Xin Lin

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88557

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	吳沛遠	zh_TW
dc.contributor.advisor	Pei-Yuan Wu	en
dc.contributor.author	林宏信	zh_TW
dc.contributor.author	Hong-Xin Lin	en
dc.date.accessioned	2023-08-15T16:49:38Z	-
dc.date.available	2023-11-09	-
dc.date.copyright	2023-08-15	-
dc.date.issued	2023	-
dc.date.submitted	2023-07-28	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88557	-
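dc.description.abstract	3D human pose estimation is widely applied in fields such as rehabilitation, golf, and baseball. Prior work reconstructs the 3D human body either from multiple consecutive frames of a video or from a single image. Because graph convolution can encode the skeletal relationships of the human body and thereby strengthen the associations within the data, it is commonly used in 3D human pose estimation, and previous studies and experimental results have confirmed that it reconstructs 3D human poses more accurately. In recent years, the self-attention mechanism has shown its superiority in several subfields of computer vision and has achieved excellent results on many datasets. In the 3D setting, however, the relationships among human joints cannot always be expressed by pure self-attention, and many graph-convolution methods have already been proposed to model those relationships. This study aims to address the inability of self-attention to fully exploit the human skeleton and to improve 3D skeleton reconstruction. By alternately mixing self-attention and graph convolution in one model, we capture both local and global relationships to obtain more comprehensive feature vectors, from which the 3D joint positions are derived. We extensively test the factors that may affect the model to demonstrate its effectiveness, and it achieves strong results on the public datasets Human3.6M and MPI-INF-3DHP, surpassing existing models.	zh_TW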
dc.description.abstract	Single-image 3D human pose estimation (HPE) has many applications in fields such as rehabilitation, golf, and baseball. Over the past few years, much research has focused on reconstructing the human skeleton from either a series of video frames or a single image. Previous studies have commonly used graph convolutional networks (GCNs) to address 3D HPE, and substantial experiments have verified their efficacy for this purpose. Recently, Transformer-based models have attracted considerable interest because of their excellent capacity for relating multiple frames. Nevertheless, a pure Transformer in the single-frame setting cannot fully exploit the characteristics of the human joints. To address this, we introduce AMPose, an approach that alternately combines Transformer and GCN blocks to capture global and local dependencies among human joints. By leveraging the strengths of both modules, AMPose achieves a comprehensive understanding of human joint interactions. To assess the effectiveness of AMPose, we conduct experiments on well-known public datasets, including MPI-INF-3DHP and Human3.6M. AMPose outperforms state-of-the-art models on both datasets and demonstrates superior generalization ability in cross-dataset comparisons.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-15T16:49:38Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2023-08-15T16:49:38Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	Acknowledgements; Chinese Abstract; Abstract; Contents; List of Figures; List of Tables; Chapter 1 Introduction; Chapter 2 Related Works (2.1 3D Human Pose Estimation, 2.2 Graph Convolutional Networks, 2.3 Transformer Encoder); Chapter 3 Methodology (3.1 Overview, 3.2 Transformer Encoder, 3.3 GCN Blocks, 3.4 Loss Function); Chapter 4 Experiment (4.1 Dataset, 4.2 Evaluation Protocol, 4.3 Implementation Details, 4.4 Comparison with the State-of-the-art, 4.5 Ablation Study, 4.6 Qualitative Results); Chapter 5 Conclusion; References	-
dc.language.iso	en	-
dc.subject	Self-attention mechanism	zh_TW
dc.subject	Graph convolution	zh_TW
dc.subject	3D human pose	zh_TW
dc.subject	Graph convolution neural network	en
dc.subject	3D human pose	en
dc.subject	Transformer	en
dc.title	Alternately Mixed Global-Local Attention Model for 3D Human Pose Estimation	zh_TW
dc.title	AMPose: Alternately Mixed Global-Local Attention Model for 3D Human Pose Estimation	en
dc.type	Thesis	-
dc.date.schoolyear	111-2	-
dc.description.degree	Master's	-
dc.contributor.oralexamcommittee	陳駿丞;徐瑋勵	zh_TW
dc.contributor.oralexamcommittee	Jun-Cheng Chen;Wei-Li Hsu	en
dc.subject.keyword	Graph convolution, Self-attention mechanism, 3D human pose	zh_TW
dc.subject.keyword	Graph convolution neural network, 3D human pose, Transformer	en
dc.relation.page	34	-
dc.identifier.doi	10.6342/NTU202302164	-
dc.rights.note	Authorization granted (open access worldwide)	-
dc.date.accepted	2023-08-01	-
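dc.contributor.author-college	College of Electrical Engineering and Computer Science	-
dc.contributor.author-dept	Graduate Institute of Communication Engineering	-
Appears in Collections:	Graduate Institute of Communication Engineering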
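
The abstract in the record above describes alternating Transformer and GCN blocks so that joint features are mixed globally (every joint attends to every other joint) and locally (each joint aggregates only its skeleton neighbours). A minimal sketch of that alternation follows. It is not the thesis implementation: the 5-joint chain skeleton, the identity projections in place of learned weights, the residual connections, the attention-then-GCN ordering, and all function names are assumptions made for illustration, and the final regression to 3D coordinates is omitted.

```python
import math

# Hypothetical 5-joint chain skeleton (an assumption, not the Human3.6M layout).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Global step: scaled dot-product attention over all joints.
    Learned Q/K/V projections are replaced by the identity (assumption)."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        out.append([sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)])
    return out


def normalized_adjacency(n, edges):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def graph_conv(x, a_hat):
    """Local step: each joint aggregates itself and its skeleton neighbours.
    The learned weight matrix is replaced by the identity (assumption)."""
    n, d = len(x), len(x[0])
    return [[sum(a_hat[i][j] * x[j][c] for j in range(n)) for c in range(d)]
            for i in range(n)]


def add(x, y):
    """Element-wise sum of two joint-feature matrices (residual connection)."""
    return [[a + b for a, b in zip(r, s)] for r, s in zip(x, y)]


def alternating_encoder(x, edges, depth=2):
    """Alternate a global attention step and a local graph-convolution step."""
    a_hat = normalized_adjacency(len(x), edges)
    for _ in range(depth):
        x = add(x, self_attention(x))      # global mixing
        x = add(x, graph_conv(x, a_hat))   # local mixing
    return x


# Made-up 2D joint coordinates used as input features.
joints_2d = [[0.0, 0.0], [0.1, 0.5], [0.2, 1.0], [0.6, 1.1], [1.0, 1.2]]
features = alternating_encoder(joints_2d, EDGES)
```

In this sketch `features` keeps one row per joint. The zero entries of the normalized adjacency are what restrict the graph step to skeleton neighbours, while the attention step has no such restriction.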

Files in this item:

File	Size	Format
ntu-111-2.pdf	1.27 MB	Adobe PDF

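Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.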