Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/102273

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 洪一平 | zh_TW |
| dc.contributor.advisor | Yi-Ping Hung | en |
| dc.contributor.author | 陳乙馨 | zh_TW |
| dc.contributor.author | I-Hsin Chen | en |
| dc.date.accessioned | 2026-04-30T16:08:10Z | - |
| dc.date.available | 2026-05-01 | - |
| dc.date.copyright | 2026-04-30 | - |
| dc.date.issued | 2026 | - |
| dc.date.submitted | 2026-04-13 | - |
| dc.identifier.citation | [1] B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis, "3D Gaussian splatting for real-time radiance field rendering," ACM Transactions on Graphics, vol. 42, no. 4, 2023. [Online]. Available: https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/
[2] Apple Inc., ARKit: Apple's Augmented Reality Platform, 2024. [Online]. Available: https://developer.apple.com/augmented-reality/arkit/, accessed: 2025-05-30.
[3] T. Li, T. Bolkart, M. J. Black, H. Li, and J. Romero, "Learning a model of facial shape and expression from 4D scans," ACM Transactions on Graphics (Proc. SIGGRAPH Asia), vol. 36, no. 6, pp. 194:1-194:17, 2017. DOI: 10.1145/3130800.3130813
[4] T. Kirschstein, S. Qian, S. Giebenhain, T. Walter, and M. Nießner, "NeRSemble: Multi-view radiance field reconstruction of human heads," ACM Transactions on Graphics, vol. 42, no. 4, Jul. 2023. DOI: 10.1145/3592455
[5] B. Egger, W. A. P. Smith, A. Tewari, S. Wuhrer, M. Zollhoefer, T. Beeler, F. Bernard, T. Bolkart, A. Kortylewski, S. Romdhani, C. Theobalt, V. Blanz, and T. Vetter, "3D morphable face models - past, present and future," 2020. arXiv: 1909.01815 [cs.CV]. [Online]. Available: https://arxiv.org/abs/1909.01815
[6] P. Paysan, R. Knothe, B. Amberg, S. Romdhani, and T. Vetter, "A 3D face model for pose and illumination invariant face recognition," IEEE, Genova, Italy, 2009.
[7] C. Cao, Y. Weng, S. Zhou, Y. Tong, and K. Zhou, "FaceWarehouse: A 3D facial expression database for visual computing," IEEE Transactions on Visualization and Computer Graphics, vol. 20, no. 3, pp. 413-425, Mar. 2014. DOI: 10.1109/TVCG.2013.249
[8] S. Ma, Y. Weng, T. Shao, and K. Zhou, "3D Gaussian blendshapes for head avatar animation," in ACM SIGGRAPH Conference Proceedings, Denver, CO, USA, July 28-August 1, 2024.
[9] S. Qian, T. Kirschstein, L. Schoneveld, D. Davoli, S. Giebenhain, and M. Nießner, "GaussianAvatars: Photorealistic head avatars with rigged 3D Gaussians," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 20299-20309.
[10] J. Xiang, X. Gao, Y. Guo, and J. Zhang, "FlashAvatar: High-fidelity head avatar with efficient Gaussian embedding," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[11] S. Qian, VHAP: Versatile head alignment with adaptive appearance priors, Sep. 2024. DOI: 10.5281/zenodo.14988309. [Online]. Available: https://github.com/ShenhanQian/VHAP
[12] Y. Feng, H. Feng, M. J. Black, and T. Bolkart, "Learning an animatable detailed 3D face model from in-the-wild images," ACM Transactions on Graphics, vol. 40, no. 8, 2021. [Online]. Available: https://doi.org/10.1145/3450626.3459936
[13] G. Retsinas, P. P. Filntisis, R. Danecek, V. F. Abrevaya, A. Roussos, T. Bolkart, and P. Maragos, "3D facial expressions through analysis-by-neural-synthesis," in Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[14] H. Liu, Z. Zhu, G. Becherini, Y. Peng, M. Su, Y. Zhou, X. Zhe, N. Iwamoto, B. Zheng, and M. J. Black, "EMAGE: Towards unified holistic co-speech gesture generation via expressive masked audio gesture modeling," 2024. arXiv: 2401.00374 [cs.CV]. [Online]. Available: https://arxiv.org/abs/2401.00374
[15] H. Liu, Z. Zhu, N. Iwamoto, Y. Peng, Z. Li, Y. Zhou, E. Bozkurt, and B. Zheng, "BEAT: A large-scale semantic and emotional multi-modal dataset for conversational gestures synthesis," arXiv preprint arXiv:2203.05297, 2022.
[16] P. Ekman and W. V. Friesen, "Facial action coding system: A technique for the measurement of facial movement," Consulting Psychologists Press, Palo Alto, CA, Tech. Rep., 1978. DOI: 10.1037/t27734-000
[17] Face Cap: Motion capture for iOS, https://www.bannaflak.com/face-cap/, accessed: 2025-08-06.
[18] J. Xing, M. Xia, Y. Zhang, X. Cun, J. Wang, and T.-T. Wong, "CodeTalker: Speech-driven 3D facial animation with discrete motion prior," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 12780-12790.
[19] S. Stan, K. I. Haque, and Z. Yumak, "FaceDiffuser: Speech-driven 3D facial animation synthesis using diffusion," 2023. arXiv: 2309.11306 [cs.CV]. [Online]. Available: https://arxiv.org/abs/2309.11306
[20] J. Ling, X. Tan, L. Chen, R. Li, Y. Zhang, S. Zhao, and L. Song, "StableFace: Analyzing and improving motion stability for talking face generation," 2022. arXiv: 2208.13717 [cs.CV]. [Online]. Available: https://arxiv.org/abs/2208.13717
[21] W. Guilluy, A. Beghdadi, and L. Oudre, "A performance evaluation framework for video stabilization methods," in 2018 7th European Workshop on Visual Information Processing (EUVIP), 2018, pp. 1-6. DOI: 10.1109/EUVIP.2018.8611729
[22] J. G. James, D. Jain, and A. Rajwade, "GlobalFlowNet: Video stabilization using deep distilled global motion estimates," 2022. arXiv: 2210.13769 [cs.CV]. [Online]. Available: https://arxiv.org/abs/2210.13769
[23] Z. Wang and A. C. Bovik, "Mean squared error: Love it or leave it? A new look at signal fidelity measures," IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98-117, 2009. DOI: 10.1109/MSP.2008.930649
[24] Z. Zhang, L. Li, Y. Ding, and C. Fan, "Flow-guided one-shot talking face generation with a high-resolution audio-visual dataset," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 3661-3670. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/102273 | - |
| dc.description.abstract | 本研究提出一套即時臉部動畫控制系統,利用商用動作捕捉軟體的數據,驅動並生成具備高度擬真感的三維高斯頭像 (3D Gaussian head avatars)。本框架的核心為一個高效的迴歸模型,將由動補軟體取得的融合變形 (blendshape) 參數映射至參數化人臉模型的表情空間中,從而實現對高斯頭像的直接控制,無需進行耗時的迭代最佳化。此外,我們的設計引入了一種輕量級的表情正規化機制,透過促進語意解耦 (semantically disentangled) 以及保留個人特徵的形變,進一步提升了動畫的穩定性與表現力。
在詳盡的實驗評估中(採用 ARKit 作為動態捕捉來源,並以 FLAME 作為目標參數化模型),結果顯示本方法在動畫品質與系統延遲上,均顯著優於現有基於擬合 (fitting-based) 與基於迴歸 (regression-based) 的基準方法。本系統同時支援即時串流與離線重演 (offline reenactment),為虛擬會議與遠距社交互動提供高效且即時的虛擬人控制方案。 | zh_TW |
| dc.description.abstract | We present a real-time system for animating photorealistic 3D Gaussian head avatars driven by motion data from consumer facial mocap systems. At the core of our framework is an efficient regression model that maps mocap-derived blendshape parameters to the expression space of parametric face models, enabling direct control over Gaussian avatars without iterative optimization. Our design incorporates a lightweight expression regularization mechanism that improves stability and expressiveness by encouraging semantically disentangled, identity-specific deformations.
Through extensive evaluations—implemented using ARKit as the mocap source and FLAME as the target parametric model—we show that our method outperforms both fitting-based and regression-based baselines in animation quality and latency. The system supports both live streaming and offline reenactment, enabling efficient, real-time avatar control for virtual meetings and social telepresence. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-04-30T16:08:10Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2026-04-30T16:08:10Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | 誌謝 ii
摘要 iii
ABSTRACT iv
CONTENTS v
LIST OF FIGURES vii
LIST OF TABLES viii
Chapter 1 Introduction 1
Chapter 2 Related Work 3
2.1 3D Morphable Models 3
2.2 Gaussian Head Avatars with 3DMMs 4
2.3 Mocap to Parametric Expression Mapping 6
Chapter 3 Method 7
3.1 Motivation 7
3.2 System Overview 7
3.3 Data Preparation 10
3.3.1 Motion Capture Input 10
3.3.2 FLAME Parameter Extraction 11
3.3.3 ARKit Sequence Acquisition 12
3.4 Expression Mapping 13
3.4.1 Problem Formulation 13
3.4.2 Model Variants 14
3.4.3 Expression Regularization 15
3.4.4 Training Objectives 18
3.4.5 Performance Analysis 19
Chapter 4 Experiment 21
4.1 Experimental Setup 21
4.1.1 Dataset 21
4.1.2 Evaluation Metrics 21
4.2 Quantitative Evaluation 23
4.2.1 Self Reenactment 23
4.2.2 Cross-identity Reenactment 25
4.2.3 Subject Generalization 28
4.3 Qualitative Evaluation 29
4.3.1 Self Reenactment 30
4.3.2 Cross-identity Reenactment 33
4.3.3 ARKit Blendshape Disentanglement 37
4.3.4 Discussion 38
Chapter 5 Conclusion and Future Work 40
5.1 Limitations and Future Work 40
5.2 Conclusion 41
References 43
Appendix A Detailed Per-subject Evaluation Results 46
A.1 Self Reenactment (Section 4.2.1) 46
A.2 Subject Generalization (Section 4.2.3) 48 | - |
| dc.language.iso | en | - |
| dc.subject | 高斯潑濺 | - |
| dc.subject | 臉部動畫 | - |
| dc.subject | 臉部參數模型 | - |
| dc.subject | 三維臉部 | - |
| dc.subject | 即時虛擬人臉控制 | - |
| dc.subject | Gaussian Splatting | - |
| dc.subject | Facial Animation | - |
| dc.subject | Facial Blendshapes | - |
| dc.subject | 3D Head Avatar | - |
| dc.subject | Real-time Avatar Control | - |
| dc.title | 基於高斯潑濺之即時虛擬臉部動畫生成與控制 | zh_TW |
| dc.title | Real-time Facial Animation Generation and Control via Gaussian Splatting | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 114-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 葛如鈞;朱宏國;歐陽明;王碩仁 | zh_TW |
| dc.contributor.oralexamcommittee | Ju-Chun Ko;Hung-Kuo Chu;Ming Ouhyoung;Shoue-Jen Wang | en |
| dc.subject.keyword | 高斯潑濺,臉部動畫,臉部參數模型,三維臉部,即時虛擬人臉控制 | zh_TW |
| dc.subject.keyword | Gaussian Splatting,Facial Animation,Facial Blendshapes,3D Head Avatar,Real-time Avatar Control | en |
| dc.relation.page | 49 | - |
| dc.identifier.doi | 10.6342/NTU202600916 | - |
| dc.rights.note | 同意授權(限校園內公開) | - |
| dc.date.accepted | 2026-04-13 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊網路與多媒體研究所 | - |
| dc.date.embargo-lift | 2026-05-01 | - |
| Appears in collections: | 資訊網路與多媒體研究所 |
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-114-2.pdf (access restricted to NTU campus IP; use the VPN service for off-campus access) | 24.1 MB | Adobe PDF |
Items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.
