NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94645
Full metadata record
dc.contributor.advisor: 洪一平 (zh_TW)
dc.contributor.advisor: Yi-Ping Hung (en)
dc.contributor.author: 黃舒盟 (zh_TW)
dc.contributor.author: Shu-Meng Huang (en)
dc.date.accessioned: 2024-08-16T17:17:32Z
dc.date.available: 2024-08-17
dc.date.copyright: 2024-08-16
dc.date.issued: 2024
dc.date.submitted: 2024-08-12
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94645
dc.description.abstract (zh_TW):
24式太極拳是傳統楊氏太極拳的簡化版本,它在保留了太極核心動作的同時,減少了招式的複雜性,使其更容易學習,適合作為全民健身運動來推廣。而對於初學者而言,跟隨教學影片中教練的動作來學習太極招式是最簡便的方式,然而影片並不會提供任何反饋,使學習者難以得知自己動作的正確性。

隨著人工智慧的發展和姿態估計模型的逐漸成熟,現在可以從一般的網路攝影機拍攝的影像,推估出影像中人們的骨架關節點資訊,並進一步利用這些資訊對人的動作進行分析。然而想評估學習者的動作是否和教練相符,需要太極拳的專業知識並手工設計相似度計算方法。為此,我們收集大量太極動作影片並將其轉成骨架資訊,利用圖卷積神經網路模型,以資料驅動模型學習太極動作的特徵,將一個動作(Motion)轉為一個具該動作特徵的向量表示(Embedding),並以三元組損失函數(Triplet Loss)優化這些向量表示,使相似的動作向量更加接近,不相似的動作向量更加遠離。通過這種方法,我們便能簡單地利用餘弦相似性去評估兩個動作向量間的相似性,並作為評分,即時反饋給學習者。實驗結果顯示,使用模型輸出的動作向量來計算動作相似度,能夠有效地辨別相似動作和相異動作。相較於直接比較動作的骨架關節點坐標位置,我們的方法能提升至多24%的辨別準確率,提供更為穩定且明確的評分。

為了使模型易於使用,我們將模型與教學影片整合成一個界面,使學習者在跟隨教練動作的同時,可以實時看到自己當前動作的評級。在練習完招式後,學習者能檢視各評級中自己與教練動作的差異,從而改善動作,提升訓練效果。
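The pose-normalization step mentioned in the abstract (Section 3.3 of the thesis) is not specified here; a common approach, shown below purely as an illustrative sketch, is to center each skeleton on a root joint and rescale by a reference bone length so that comparisons ignore where the person stands and how large they appear. The joint indices and coordinates are hypothetical toy values, not the thesis's actual data.

```python
import numpy as np

def normalize_pose(joints: np.ndarray, root: int = 0, ref: int = 1) -> np.ndarray:
    """Center a (J, 2) skeleton on the root joint and scale by the
    distance from the root to a reference joint, making the pose
    invariant to translation and body size in the image."""
    centered = joints - joints[root]
    scale = np.linalg.norm(centered[ref])
    return centered / scale if scale > 0 else centered

# Toy 3-joint skeleton: after normalization the root sits at the
# origin and the root-to-reference distance becomes 1.
pose = np.array([[2.0, 3.0], [2.0, 5.0], [3.0, 3.0]])
norm = normalize_pose(pose)
```

Normalizing both the learner's and the instructor's skeletons this way is what lets frame-by-frame comparison focus on posture rather than position or scale.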
dc.description.abstract (en):
The 24-form Tai Chi is a simplified version of traditional Yang-style Tai Chi Chuan. By retaining the essential movements while reducing the complexity of the techniques, it is easier to learn and well suited to promotion as a fitness exercise for the general public. For beginners, following instructional videos is the most accessible way to learn Tai Chi Chuan. However, these videos provide no feedback, making it difficult for learners to know whether their movements are correct.

With advances in artificial intelligence and pose estimation, it is now possible to estimate the skeleton joint data of people in videos captured by standard webcams, and this data can be used to analyze human motion. However, evaluating whether a learner's motions match the instructor's requires Tai Chi Chuan domain knowledge and manually designed similarity measures. To address this challenge, we created a motion dataset of 24-form Tai Chi and used a graph convolutional network to learn the features of Tai Chi movements from the data. The model converts skeleton motion data into motion embeddings that capture these features. Trained with a triplet loss function, it draws the embeddings of similar motions closer together while pushing those of dissimilar motions further apart. This allows us to score two motions simply by the cosine similarity of their embeddings and to give learners real-time feedback. Experimental results show that similarity evaluation with our motion embeddings effectively differentiates similar and dissimilar motions, improving accuracy by up to 24% over direct joint-coordinate comparison and yielding clearer, more consistent similarity scores.

To make the system easy to use, we integrated the model with instructional videos into an interface that shows learners a real-time rating of their movements while they follow the instructor. After practicing a form, learners can review the differences between their movements and the instructor's within each rating range, helping them improve their movements and enhance their training effectiveness.
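The scoring scheme the abstract describes, triplet-trained embeddings compared with cosine similarity, can be sketched as follows. This is a minimal illustration of the general technique, not the thesis's implementation: the embedding vectors and margin value are hypothetical toy values.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two motion embeddings, used as the score."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on cosine distance: pull a similar motion (positive)
    toward the anchor and push a dissimilar one (negative) away by at
    least `margin`."""
    d_pos = 1.0 - cosine_similarity(anchor, positive)
    d_neg = 1.0 - cosine_similarity(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)

# Toy embeddings: anchor and positive point nearly the same way,
# the negative is orthogonal, so this triplet is already satisfied.
anchor = np.array([1.0, 0.0, 0.0])
positive = np.array([0.9, 0.1, 0.0])
negative = np.array([0.0, 1.0, 0.0])
score = cosine_similarity(anchor, positive)  # similarity shown to the learner
loss = triplet_loss(anchor, positive, negative)
```

At training time the loss is minimized over many such triplets; at inference time only the cosine similarity is needed, which is what makes real-time feedback cheap.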
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-16T17:17:32Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2024-08-16T17:17:32Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee  i
Acknowledgements  ii
摘要  iii
Abstract  v
Contents  vii
List of Figures  ix
List of Tables  xi
Chapter 1  Introduction  1
  1.1 Background  1
  1.2 Motivation  2
  1.3 Outline  3
Chapter 2  Related Work  5
  2.1 Similarity Evaluation in Exercise-Related Systems  5
  2.2 Skeleton-Based Motion Representation Learning  10
Chapter 3  Approach  13
  3.1 24-form Tai-Chi Dataset  14
  3.2 Pose Estimation Models  16
  3.3 Pose Normalization  18
  3.4 Motion Embedding Model  20
  3.5 Triplet Loss  25
  3.6 Triplet Selection  26
Chapter 4  Experiments  30
  4.1 Evaluation Metrics  30
  4.2 Motion Similarity Evaluation Results  33
Chapter 5  Interface  39
  5.1 Interface Design  39
  5.2 Interface Demonstration  40
Chapter 6  Conclusion  43
References  45
dc.language.iso: en
dc.title: 透過姿態估計與特徵學習實現太極拳輔助學習系統的實時反饋 (zh_TW)
dc.title: Real-Time Feedback via Pose Estimation and Representation Learning for a Tai-Chi Chuan Assisted Learning System (en)
dc.type: Thesis
dc.date.schoolyear: 112-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 巫芳璟; 康立威; 姚書農; 王鵬華 (zh_TW)
dc.contributor.oralexamcommittee: Fang-Jing Wu; Li-Wei Kang; Shu-Nung Yao; Peng-Hua Wang (en)
dc.subject.keyword: 動作分析, 特徵學習, 姿態估計, 圖卷積神經網路, 太極拳 (zh_TW)
dc.subject.keyword: Motion Analysis, Representation Learning, Pose Estimation, Graph Convolutional Network, Tai-Chi Chuan (en)
dc.relation.page: 49
dc.identifier.doi: 10.6342/NTU202404118
dc.rights.note: 同意授權(限校園內公開) (Authorized; access restricted to campus)
dc.date.accepted: 2024-08-13
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)
Appears in Collections: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)

Files in This Item:
File: ntu-112-2.pdf
Size: 15.33 MB
Format: Adobe PDF
Access: restricted to NTU campus IP addresses (use the VPN service when off campus)
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
