Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99688

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 張瀚 | zh_TW |
| dc.contributor.advisor | Gary Han Chang | en |
| dc.contributor.author | 陳怡華 | zh_TW |
| dc.contributor.author | Yi Hwa Chen | en |
| dc.date.accessioned | 2025-09-17T16:22:52Z | - |
| dc.date.available | 2025-09-18 | - |
| dc.date.copyright | 2025-09-17 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-08-08 | - |
| dc.identifier.citation | REFERENCE<br>[1] B. Egger et al., "3D morphable face models—past, present, and future," ACM Transactions on Graphics, vol. 39, no. 5, article 157, Jun. 2020.<br>[2] T. F. Cootes, G. J. Edwards, and C. J. Taylor, "Active appearance models," in Computer Vision—ECCV'98: 5th European Conference on Computer Vision, vol. II, Springer Berlin Heidelberg, 1998, pp. 484-498.<br>[3] I. Goodfellow et al., "Generative adversarial networks," Communications of the ACM, vol. 63, no. 11, pp. 139-144, 2020.<br>[4] Y. Choi et al., "StarGAN: unified generative adversarial networks for multi-domain image-to-image translation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 8789-8797.<br>[5] T. Karras et al., "Analyzing and improving the image quality of StyleGAN," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 8110-8119.<br>[6] A. R. Bhattarai, M. Nießner, and A. Sevastopolsky, "TriPlaneNet: an encoder for EG3D inversion," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 3055-3065.<br>[7] Y. Liu, Z. Shu, Y. Li, Z. Lin, R. Zhang, and S. Y. Kung, "3D-FM GAN: towards 3D-controllable face manipulation," in Proceedings of the European Conference on Computer Vision, Springer Nature Switzerland, 2022, pp. 107-125.<br>[8] J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," Advances in Neural Information Processing Systems, vol. 33, pp. 6840-6851, 2020.<br>[9] Z. Ding, X. Zhang, Z. Xia, L. Jebe, Z. Tu, and X. Zhang, "DiffusionRig: learning personalized priors for facial appearance editing," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 12736-12746.<br>[10] T. Li, T. Bolkart, M. J. Black, H. Li, and J. Romero, "Learning a model of facial shape and expression from 4D scans," ACM Transactions on Graphics, vol. 36, no. 6, article 194, 2017.<br>[11] Y. Feng, H. Feng, M. J. Black, and T. Bolkart, "Learning an animatable detailed 3D face model from in-the-wild images," ACM Transactions on Graphics, vol. 40, no. 4, pp. 1-13, 2021.<br>[12] M. Mirza and S. Osindero, "Conditional generative adversarial nets," arXiv preprint arXiv:1411.1784, 2014.<br>[13] J. Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223-2232.<br>[14] P. Dhariwal and A. Nichol, "Diffusion models beat GANs on image synthesis," Advances in Neural Information Processing Systems, vol. 34, pp. 8780-8794, 2021.<br>[15] J. Ho and T. Salimans, "Classifier-free diffusion guidance," arXiv preprint arXiv:2207.12598, 2022.<br>[16] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10684-10695.<br>[17] K. Preechakul, N. Chatthee, S. Wizadwongsa, and S. Suwajanakorn, "Diffusion autoencoders: toward a meaningful and decodable representation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10619-10629.<br>[18] K. Kim et al., "DiffFace: diffusion-based face swapping with facial guidance," Pattern Recognition, vol. 163, article 111451, 2025.<br>[19] Y. Nitzan et al., "MyStyle: a personalized generative prior," ACM Transactions on Graphics, vol. 41, no. 6, pp. 1-10, 2022.<br>[20] D. Roich, R. Mokady, A. H. Bermano, and D. Cohen-Or, "Pivotal tuning for latent-based editing of real images," ACM Transactions on Graphics, vol. 42, no. 1, pp. 1-13, 2022.<br>[21] N. Ruiz, Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman, "DreamBooth: fine tuning text-to-image diffusion models for subject-driven generation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 22500-22510.<br>[22] N. Ruiz et al., "HyperDreamBooth: hypernetworks for fast personalization of text-to-image models," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 6527-6536.<br>[23] F. P. Papantoniou, A. Lattas, S. Moschoglou, J. Deng, B. Kainz, and S. Zafeiriou, "Arc2Face: a foundation model for ID-consistent human faces," in Proceedings of the European Conference on Computer Vision, Springer Nature Switzerland, 2024, pp. 241-261.<br>[24] P. Ekman and W. V. Friesen, "Facial action coding system," Environmental Psychology & Nonverbal Behavior, 1978.<br>[25] A. Pumarola, A. Agudo, A. M. Martinez, A. Sanfeliu, and F. Moreno-Noguer, "GANimation: anatomically-aware facial animation from a single image," in Proceedings of the European Conference on Computer Vision, 2018, pp. 818-833.<br>[26] S. Jin, Z. Wang, L. Wang, P. Liu, N. Bi, and T. Nguyen, "AUEditNet: dual-branch facial action unit intensity manipulation with implicit disentanglement," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 2104-2113.<br>[27] R. Paskaleva, M. Holubakha, A. Ilic, S. Motamed, L. Van Gool, and D. Paudel, "A unified and interpretable emotion representation and expression generation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 2447-2456.<br>[28] T. Baltrušaitis, P. Robinson, and L. P. Morency, "OpenFace: an open source facial behavior analysis toolkit," in 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), 2016, pp. 1-10.<br>[29] D. Chang, Y. Yin, Z. Li, M. Tran, and M. Soleymani, "LibreFace: an open-source toolkit for deep facial expression analysis," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 8205-8215. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99688 | - |
| dc.description.abstract | 帕金森氏症患者常因面部運動遲緩而出現表情僵硬,影響社交互動與生活品質。現有面部表情生成技術主要針對一般人群設計,缺乏對醫學數據特殊性的考量,且多數方法僅能處理靜態圖像,無法滿足醫學動態評估需求。本研究基於DiffusionRig框架,提出專門針對帕金森氏症患者的個人化面部表情動態合成系統。採用兩階段訓練策略,結合DECA三維可變形面部模型與擴散生成模型,透過三項關鍵技術創新解決醫學應用挑戰:(1)三維對齊策略提升身份保持能力;(2)表情-姿態聯合控制增強動畫自然度;(3)固定噪聲採樣策略確保時序一致性。這些技術創新使系統能夠從靜態圖像編輯擴展至連續動畫生成。實驗使用FFHQ數據集學習通用面部先驗,並建構含141名帕金森氏症患者的數據集,選取20名受試者進行交叉驗證實驗。系統性評估證實僅需20張訓練圖像即可在患者數據上達到有效個人化,ArcFace相似度達0.875±0.044。三維對齊策略較傳統二維預處理方法身份保持能力提升10.4%,表情-姿態聯合控制顯著提升動畫自然度與時序連貫性,固定噪聲採樣有效消除背景閃爍並確保非面部區域穩定性。在醫學驗證中,神經科醫師對生成動畫的分類準確度達73.3%,顯著高於隨機水準,證實模型成功保留患者相關特徵;一般觀察者準確度接近隨機(54%),表明生成結果視覺自然。本研究首次成功將擴散模型為基礎的面部表情生成技術應用於帕金森氏症患者,為醫學AI應用建立了新範式,並開發了實用的應用界面。 | zh_TW |
| dc.description.abstract | Patients with Parkinson's disease often experience facial rigidity due to bradykinesia, affecting their social interactions and quality of life. Existing facial expression generation techniques are primarily designed for general populations and lack consideration for the unique characteristics of medical data; most methods also handle only static images, failing to meet the needs of dynamic medical assessment. This study proposes a personalized facial expression animation system specifically designed for Parkinson's disease patients, based on the DiffusionRig framework. We employ a two-stage training strategy that combines the DECA 3D morphable face model with a diffusion generative model, addressing medical application challenges through three key technical innovations: (1) a 3D-only alignment strategy to enhance identity preservation; (2) joint expression-pose control to improve animation naturalness; and (3) a fixed noise sampling strategy to ensure temporal consistency (minimal sketches of this strategy and of the ArcFace identity metric appear after this metadata table). These innovations extend the system from static image editing to continuous animation generation. Experiments used the FFHQ dataset to learn generic facial priors and a constructed dataset of 141 Parkinson's disease patients, with 20 subjects selected for cross-validation experiments. Systematic evaluation confirmed that effective personalization can be achieved on patient data with only 20 training images, reaching an ArcFace similarity of 0.875±0.044. The 3D-only alignment strategy improved identity preservation by 10.4% over traditional 2D preprocessing methods; joint expression-pose control significantly enhanced animation naturalness and temporal coherence, while fixed noise sampling effectively eliminated background flickering and ensured stability in non-facial regions. In medical validation, a neurologist achieved 73.3% classification accuracy on generated animations, significantly above chance level, confirming that the model preserves medically relevant features, while general observers achieved near-chance accuracy (54%), indicating that the generated results appear visually natural. This study represents the first successful application of diffusion-based facial expression generation to Parkinson's disease patients, establishing a new paradigm for medical AI applications and delivering a practical user interface. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-09-17T16:22:52Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-09-17T16:22:52Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements i<br>Chinese Abstract iii<br>ABSTRACT iv<br>CONTENTS vi<br>LIST OF FIGURES x<br>LIST OF TABLES xi<br>Chapter 1 Introduction 1<br>1.1 Study of Facial Expression and the Neurological Disease of Parkinson 1<br>1.2 Facial Expression Generation 2<br>1.3 Technological Advancement of Facial Expression Generation 5<br>1.4 3D Morphable Models and Facial Geometry 7<br>1.4.1 FLAME: A Specialized Framework for Facial Animation 8<br>1.4.2 DECA: Enhancing Detail and Reconstruction 8<br>1.5 Generative Adversarial Networks and Their Variants 9<br>1.5.1 Core Principles of GANs 9<br>1.5.2 Key GAN Variants for Facial Expression Generation 10<br>1.5.3 Two-dimensional GAN Techniques 11<br>1.5.4 Limitations of GANs for Expression Generation 12<br>1.6 Diffusion Models and Conditional Generation 13<br>1.6.1 Core Principles of Diffusion Models 13<br>1.6.2 Key Diffusion Variants for Facial Expression Generation 14<br>1.6.3 Two-dimensional Diffusion Techniques 15<br>1.6.4 Limitations of Diffusion for Expression Generation 16<br>1.7 Personalized Expression Generation 16<br>1.7.1 Challenges in GAN-based Personalization 16<br>1.7.2 GAN-based Personalization Methods 17<br>1.7.3 Diffusion-based Few-Shot Personalization 18<br>1.8 Integration of 3DMMs with Generative Models 18<br>1.8.1 Overview of 3D-Aware Generative Models 18<br>1.8.2 Evolution from GANs to Diffusion Models 19<br>1.9 Action Units and Expression Control 19<br>1.9.1 Fundamentals of Action Units 19<br>1.9.2 Integration of Action Units with GANs 20<br>1.9.3 Integration of Action Units with Diffusion Models 21<br>1.10 Conclusion 22<br>Chapter 2 Method 25<br>2.1 Overview 25<br>2.2 Representation Design and Integration with Diffusion Architecture 27<br>2.2.1 DECA as the Foundation 27<br>2.2.2 Bridging 3DMM Parameters and Image Generation 30<br>2.2.3 Global Latent Code for Unstructured Attributes 31<br>2.2.4 Adaptation for Patient's Data 33<br>2.2.5 Adaptation for Video Generation 34<br>2.3 Training Strategy 36<br>2.3.1 First Stage: Learning Generic Facial Priors 36<br>2.3.2 Second Stage: Learning Personalized Priors 37<br>2.4 Implementation Details 39<br>2.4.1 Model Architecture 39<br>2.4.2 Training Configuration 39<br>2.4.3 Inference Process 40<br>2.5 Implementation Strategies for Practical Deployment 41<br>2.5.1 Performance Optimization Strategies 41<br>2.5.2 User Interface Design 42<br>Chapter 3 Experiments and Results 44<br>3.1 Influence of Dataset Size on Personalization 45<br>3.1.1 Experimental Setup 45<br>3.1.2 Results 46<br>3.1.3 Discussion 50<br>3.2 Influence of Image Alignment 53<br>3.2.1 Experimental Setup 53<br>3.2.2 Results 54<br>3.2.3 Discussion 58<br>3.3 Influence of Expression and Pose Control 60<br>3.3.1 Experimental Setup 60<br>3.3.2 Results 60<br>3.3.3 Discussion 62<br>3.4 Noise Sampling Strategy 63<br>3.4.1 Experimental Setup 63<br>3.4.2 Results 64<br>3.4.3 Discussion 66<br>3.5 Expert Validation 67<br>3.5.1 Results 68<br>3.5.2 Medical Significance 69<br>3.6 Dataset and Preprocessing 70<br>3.6.1 Dataset Architecture 71<br>3.6.2 Preprocessing Pipeline 74<br>3.6.3 Evaluation Setup 76<br>Chapter 4 Discussion 79<br>4.1 Key Findings and Medical Significance 79<br>4.1.1 Medical Adaptation Breakthrough 79<br>4.1.2 Methodological Contribution to the Field 82<br>4.2 Analysis of Failed Experimental Approaches 83<br>4.2.1 Failure of Model Training Using Generated Images 83<br>4.2.2 Action Unit Analysis 84<br>4.3 Limitations 85<br>4.3.1 Data Limitations 85<br>4.3.2 Technical Limitations 86<br>4.3.3 Evaluation Limitations 87<br>4.4 Future Directions 88<br>4.4.1 Short-term Technical Improvements 88<br>4.4.2 Medium-term Clinical Applications 89<br>4.4.3 Long-term Vision 90<br>REFERENCE 92 | - |
| dc.language.iso | en | - |
| dc.subject | 帕金森氏症 | zh_TW |
| dc.subject | 面部表情生成 | zh_TW |
| dc.subject | 三維可變形模型 | zh_TW |
| dc.subject | 擴散模型 | zh_TW |
| dc.subject | 個人化動畫 | zh_TW |
| dc.subject | 醫學AI應用 | zh_TW |
| dc.subject | personalized animation | en |
| dc.subject | Parkinson’s disease | en |
| dc.subject | facial expression generation | en |
| dc.subject | 3D morphable models | en |
| dc.subject | medical AI applications | en |
| dc.subject | diffusion models | en |
| dc.title | 以三維引導擴散生成模型實現之針對帕金森氏症患者的表情動態合成研究 | zh_TW |
| dc.title | Person-Specific Expression Synthesis of Short Video via 3D-Guided Diffusion Models | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-2 | - |
| dc.description.degree | 碩士 (Master's) | - |
| dc.contributor.oralexamcommittee | 吳文超;朱麗安 | zh_TW |
| dc.contributor.oralexamcommittee | Wen-Chau Wu;Li-An Chu | en |
| dc.subject.keyword | 帕金森氏症, 面部表情生成, 三維可變形模型, 擴散模型, 個人化動畫, 醫學AI應用 | zh_TW |
| dc.subject.keyword | Parkinson’s disease, facial expression generation, 3D morphable models, diffusion models, personalized animation, medical AI applications | en |
| dc.relation.page | 95 | - |
| dc.identifier.doi | 10.6342/NTU202504304 | - |
| dc.rights.note | 未授權 (not authorized) | - |
| dc.date.accepted | 2025-08-08 | - |
| dc.contributor.author-college | 醫學院 | - |
| dc.contributor.author-dept | 醫療器材與醫學影像研究所 | - |
| dc.date.embargo-lift | N/A | - |
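The abstract above describes a fixed noise sampling strategy for temporal consistency: every frame of an animation is denoised from the same initial noise, so frame-to-frame differences come only from the per-frame conditioning (e.g. DECA expression and pose parameters), not from resampled noise. Below is a minimal sketch of that idea, assuming a deterministic DDIM (η = 0) update and an abstract `denoiser(x_t, t, cond)` callable; the function name, toy noise schedule, and all other details are illustrative assumptions, not the thesis implementation.

```python
import torch

@torch.no_grad()
def sample_animation(denoiser, conditions, shape, steps=50, seed=0):
    """denoiser(x_t, t, cond) -> predicted noise; `conditions` is a list of
    per-frame conditioning tensors (e.g. rendered 3DMM buffers)."""
    g = torch.Generator().manual_seed(seed)
    x_T = torch.randn(shape, generator=g)        # drawn ONCE, reused for every frame
    betas = torch.linspace(1e-4, 0.02, 1000)     # toy linear schedule (assumption)
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)
    ts = torch.linspace(999, 0, steps).long()    # descending timestep subsequence
    frames = []
    for cond in conditions:                      # one sampling pass per frame
        x = x_T.clone()                          # the fixed-noise strategy
        for i, t in enumerate(ts):
            a_t = alphas_bar[t]
            a_prev = alphas_bar[ts[i + 1]] if i + 1 < len(ts) else torch.tensor(1.0)
            eps = denoiser(x, t, cond)
            x0 = (x - (1.0 - a_t).sqrt() * eps) / a_t.sqrt()     # predicted clean image
            x = a_prev.sqrt() * x0 + (1.0 - a_prev).sqrt() * eps  # DDIM step, eta = 0
        frames.append(x)
    return frames
```

Because the η = 0 update is deterministic, reusing `x_T` pins down everything except the conditioning, which is consistent with the abstract's claim that fixed noise sampling suppresses background flickering and keeps non-facial regions stable.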
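The identity-preservation score reported in the abstract (ArcFace similarity 0.875±0.044) is conventionally the cosine similarity between face-recognition embeddings of the source image and each generated frame. A small sketch follows; `embed` is a stand-in for any ArcFace-style feature extractor and is an assumption, not the thesis code.

```python
import torch
import torch.nn.functional as F

def identity_similarity(embed, source_img, generated_frames):
    """Mean and std of cosine similarity between the source identity
    embedding and the embeddings of generated frames.
    embed(img) -> 1-D feature tensor of fixed dimension."""
    src = F.normalize(embed(source_img), dim=-1)          # unit-norm identity vector
    sims = torch.stack([F.normalize(embed(f), dim=-1) @ src
                        for f in generated_frames])       # cosine similarity per frame
    return sims.mean().item(), sims.std().item()
```

Aggregated over subjects, a summary of this form is how a figure like 0.875±0.044 would typically be reported.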
Appears in Collections: 醫療器材與醫學影像研究所
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-113-2.pdf (restricted access) | 2.36 MB | Adobe PDF |
