Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/102273

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 洪一平 | zh_TW |
| dc.contributor.advisor | Yi-Ping Hung | en |
| dc.contributor.author | 陳乙馨 | zh_TW |
| dc.contributor.author | I-Hsin Chen | en |
| dc.date.accessioned | 2026-04-30T16:08:10Z | - |
| dc.date.available | 2026-05-01 | - |
| dc.date.copyright | 2026-04-30 | - |
| dc.date.issued | 2026 | - |
| dc.date.submitted | 2026-04-13 | - |
| dc.identifier.citation | [1] B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis, "3D Gaussian splatting for real-time radiance field rendering," ACM Transactions on Graphics, vol. 42, no. 4, 2023. [Online]. Available: https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/
[2] Apple Inc., ARKit: Apple's Augmented Reality Platform, 2024. [Online]. Available: https://developer.apple.com/augmented-reality/arkit/, accessed: 2025-05-30.
[3] T. Li, T. Bolkart, M. J. Black, H. Li, and J. Romero, "Learning a model of facial shape and expression from 4D scans," ACM Transactions on Graphics (Proc. SIGGRAPH Asia), vol. 36, no. 6, pp. 194:1-194:17, 2017. DOI: 10.1145/3130800.3130813
[4] T. Kirschstein, S. Qian, S. Giebenhain, T. Walter, and M. Nießner, "NeRSemble: Multi-view radiance field reconstruction of human heads," ACM Transactions on Graphics, vol. 42, no. 4, Jul. 2023. DOI: 10.1145/3592455
[5] B. Egger, W. A. P. Smith, A. Tewari, S. Wuhrer, M. Zollhoefer, T. Beeler, F. Bernard, T. Bolkart, A. Kortylewski, S. Romdhani, C. Theobalt, V. Blanz, and T. Vetter, "3D morphable face models - past, present and future," 2020. arXiv: 1909.01815 [cs.CV]. [Online]. Available: https://arxiv.org/abs/1909.01815
[6] P. Paysan, R. Knothe, B. Amberg, S. Romdhani, and T. Vetter, "A 3D face model for pose and illumination invariant face recognition," IEEE, Genova, Italy, 2009.
[7] C. Cao, Y. Weng, S. Zhou, Y. Tong, and K. Zhou, "FaceWarehouse: A 3D facial expression database for visual computing," IEEE Transactions on Visualization and Computer Graphics, vol. 20, no. 3, pp. 413-425, Mar. 2014. DOI: 10.1109/TVCG.2013.249
[8] S. Ma, Y. Weng, T. Shao, and K. Zhou, "3D Gaussian blendshapes for head avatar animation," in ACM SIGGRAPH Conference Proceedings, Denver, CO, USA, July 28-August 1, 2024.
[9] S. Qian, T. Kirschstein, L. Schoneveld, D. Davoli, S. Giebenhain, and M. Nießner, "GaussianAvatars: Photorealistic head avatars with rigged 3D Gaussians," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 20299-20309.
[10] J. Xiang, X. Gao, Y. Guo, and J. Zhang, "FlashAvatar: High-fidelity head avatar with efficient Gaussian embedding," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[11] S. Qian, VHAP: Versatile head alignment with adaptive appearance priors, Sep. 2024. DOI: 10.5281/zenodo.14988309. [Online]. Available: https://github.com/ShenhanQian/VHAP
[12] Y. Feng, H. Feng, M. J. Black, and T. Bolkart, "Learning an animatable detailed 3D face model from in-the-wild images," ACM Transactions on Graphics, vol. 40, no. 8, 2021. [Online]. Available: https://doi.org/10.1145/3450626.3459936
[13] G. Retsinas, P. P. Filntisis, R. Danecek, V. F. Abrevaya, A. Roussos, T. Bolkart, and P. Maragos, "3D facial expressions through analysis-by-neural-synthesis," in Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[14] H. Liu, Z. Zhu, G. Becherini, Y. Peng, M. Su, Y. Zhou, X. Zhe, N. Iwamoto, B. Zheng, and M. J. Black, "EMAGE: Towards unified holistic co-speech gesture generation via expressive masked audio gesture modeling," 2024. arXiv: 2401.00374 [cs.CV]. [Online]. Available: https://arxiv.org/abs/2401.00374
[15] H. Liu, Z. Zhu, N. Iwamoto, Y. Peng, Z. Li, Y. Zhou, E. Bozkurt, and B. Zheng, "BEAT: A large-scale semantic and emotional multi-modal dataset for conversational gestures synthesis," arXiv preprint arXiv:2203.05297, 2022.
[16] P. Ekman and W. V. Friesen, "Facial action coding system: A technique for the measurement of facial movement," Consulting Psychologists Press, Palo Alto, CA, Tech. Rep., 1978. DOI: 10.1037/t27734-000
[17] Face Cap: Motion capture for iOS, https://www.bannaflak.com/face-cap/, accessed: 2025-08-06.
[18] J. Xing, M. Xia, Y. Zhang, X. Cun, J. Wang, and T.-T. Wong, "CodeTalker: Speech-driven 3D facial animation with discrete motion prior," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 12780-12790.
[19] S. Stan, K. I. Haque, and Z. Yumak, "FaceDiffuser: Speech-driven 3D facial animation synthesis using diffusion," 2023. arXiv: 2309.11306 [cs.CV]. [Online]. Available: https://arxiv.org/abs/2309.11306
[20] J. Ling, X. Tan, L. Chen, R. Li, Y. Zhang, S. Zhao, and L. Song, "StableFace: Analyzing and improving motion stability for talking face generation," 2022. arXiv: 2208.13717 [cs.CV]. [Online]. Available: https://arxiv.org/abs/2208.13717
[21] W. Guilluy, A. Beghdadi, and L. Oudre, "A performance evaluation framework for video stabilization methods," in 2018 7th European Workshop on Visual Information Processing (EUVIP), 2018, pp. 1-6. DOI: 10.1109/EUVIP.2018.8611729
[22] J. G. James, D. Jain, and A. Rajwade, "GlobalFlowNet: Video stabilization using deep distilled global motion estimates," 2022. arXiv: 2210.13769 [cs.CV]. [Online]. Available: https://arxiv.org/abs/2210.13769
[23] Z. Wang and A. C. Bovik, "Mean squared error: Love it or leave it? A new look at signal fidelity measures," IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98-117, 2009. DOI: 10.1109/MSP.2008.930649
[24] Z. Zhang, L. Li, Y. Ding, and C. Fan, "Flow-guided one-shot talking face generation with a high-resolution audio-visual dataset," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 3661-3670. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/102273 | - |
| dc.description.abstract | 本研究提出一套即時臉部動畫控制系統,利用商用動作捕捉軟體的數據,驅動並生成具備高度擬真感的三維高斯頭像 (3D Gaussian head avatars)。本框架的核心為一個高效的迴歸模型,將由動補軟體取得的融合變形 (blendshape) 參數映射至參數化人臉模型的表情空間中,從而實現對高斯頭像的直接控制,無需進行耗時的迭代最佳化。此外,我們的設計引入了一種輕量級的表情正規化機制,透過促進語意解耦 (semantically disentangled) 以及保留個人特徵的形變,進一步提升了動畫的穩定性與表現力。
在詳盡的實驗評估中(採用 ARKit 作為動態捕捉來源,並以 FLAME 作為目標參數化模型),結果顯示本方法在動畫品質與系統延遲上,均顯著優於現有基於擬合 (fitting-based) 與基於迴歸 (regression-based) 的基準方法。本系統同時支援即時串流與離線重演 (offline reenactment),為虛擬會議與遠距社交互動提供高效且即時的虛擬人控制方案。 | zh_TW |
| dc.description.abstract | We present a real-time system for animating photorealistic 3D Gaussian head avatars driven by motion data from consumer facial mocap systems. At the core of our framework is an efficient regression model that maps mocap-derived blendshape parameters to the expression space of parametric face models, enabling direct control over Gaussian avatars without iterative optimization. Our design incorporates a lightweight expression regularization mechanism that improves stability and expressiveness by encouraging semantically disentangled, identity-specific deformations.
Through extensive evaluations—implemented using ARKit as the mocap source and FLAME as the target parametric model—we show that our method outperforms both fitting-based and regression-based baselines in animation quality and latency. The system supports both live streaming and offline reenactment, enabling efficient, real-time avatar control for virtual meetings and social telepresence. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-04-30T16:08:10Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2026-04-30T16:08:10Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | 誌謝 ii
摘要 iii
ABSTRACT iv
CONTENTS v
LIST OF FIGURES vii
LIST OF TABLES viii
Chapter 1 Introduction 1
Chapter 2 Related Work 3
2.1 3D Morphable Models 3
2.2 Gaussian Head Avatars with 3DMMs 4
2.3 Mocap to Parametric Expression Mapping 6
Chapter 3 Method 7
3.1 Motivation 7
3.2 System Overview 7
3.3 Data Preparation 10
3.3.1 Motion Capture Input 10
3.3.2 FLAME Parameter Extraction 11
3.3.3 ARKit Sequence Acquisition 12
3.4 Expression Mapping 13
3.4.1 Problem Formulation 13
3.4.2 Model Variants 14
3.4.3 Expression Regularization 15
3.4.4 Training Objectives 18
3.4.5 Performance Analysis 19
Chapter 4 Experiment 21
4.1 Experimental Setup 21
4.1.1 Dataset 21
4.1.2 Evaluation Metrics 21
4.2 Quantitative Evaluation 23
4.2.1 Self Reenactment 23
4.2.2 Cross-identity Reenactment 25
4.2.3 Subject Generalization 28
4.3 Qualitative Evaluation 29
4.3.1 Self Reenactment 30
4.3.2 Cross-identity Reenactment 33
4.3.3 ARKit Blendshape Disentanglement 37
4.3.4 Discussion 38
Chapter 5 Conclusion and Future Work 40
5.1 Limitations and Future Work 40
5.2 Conclusion 41
References 43
Appendix A Detailed Per-subject Evaluation Results 46
A.1 Self Reenactment (Section 4.2.1) 46
A.2 Subject Generalization (Section 4.2.3) 48 | - |
| dc.language.iso | en | - |
| dc.subject | 高斯潑濺 | - |
| dc.subject | 臉部動畫 | - |
| dc.subject | 臉部參數模型 | - |
| dc.subject | 三維臉部 | - |
| dc.subject | 即時虛擬人臉控制 | - |
| dc.subject | Gaussian Splatting | - |
| dc.subject | Facial Animation | - |
| dc.subject | Facial Blendshapes | - |
| dc.subject | 3D Head Avatar | - |
| dc.subject | Real-time Avatar Control | - |
| dc.title | 基於高斯潑濺之即時虛擬臉部動畫生成與控制 | zh_TW |
| dc.title | Real-time Facial Animation Generation and Control via Gaussian Splatting | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 114-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 葛如鈞;朱宏國;歐陽明;王碩仁 | zh_TW |
| dc.contributor.oralexamcommittee | Ju-Chun Ko;Hung-Kuo Chu;Ming Ouhyoung;Shoue-Jen Wang | en |
| dc.subject.keyword | 高斯潑濺,臉部動畫,臉部參數模型,三維臉部,即時虛擬人臉控制 | zh_TW |
| dc.subject.keyword | Gaussian Splatting,Facial Animation,Facial Blendshapes,3D Head Avatar,Real-time Avatar Control | en |
| dc.relation.page | 49 | - |
| dc.identifier.doi | 10.6342/NTU202600916 | - |
| dc.rights.note | 同意授權(限校園內公開) | - |
| dc.date.accepted | 2026-04-13 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊網路與多媒體研究所 | - |
| dc.date.embargo-lift | 2026-05-01 | - |
| Appears in collections: | 資訊網路與多媒體研究所 |
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-114-2.pdf (access restricted to NTU campus IP; use the VPN service for off-campus access) | 24.1 MB | Adobe PDF |
Items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.
