Person-Specific Expression Synthesis of Short Video via 3D-Guided Diffusion Models

陳怡華; Yi Hwa Chen

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99688

Title:	Person-Specific Expression Synthesis of Short Video via 3D-Guided Diffusion Models (original title: 以三維引導擴散生成模型實現之針對帕金森氏症患者的表情動態合成研究)
Author:	陳怡華 Yi Hwa Chen
Advisor:	張瀚 GARY HAN CHANG
Keywords:	Parkinson’s disease, facial expression generation, 3D morphable models, diffusion models, personalized animation, medical AI applications
Year of publication:	2025
Degree:	Master's
Abstract:	Patients with Parkinson's disease often show rigid facial expressions caused by slowed facial movement (bradykinesia), which impairs their social interactions and quality of life. Existing facial expression generation techniques are designed mainly for the general population, do not account for the particular characteristics of medical data, and in most cases handle only static images, so they cannot meet the needs of dynamic clinical assessment. Building on the DiffusionRig framework, this study proposes a personalized facial expression animation system designed specifically for patients with Parkinson's disease. A two-stage training strategy combines the DECA 3D morphable face model with a diffusion generative model, and three technical contributions address the challenges of the medical setting: (1) a 3D-only alignment strategy that improves identity preservation; (2) joint expression-pose control that makes animations more natural; and (3) a fixed-noise sampling strategy that ensures temporal consistency. Together, these contributions extend the system from static image editing to continuous animation generation. The FFHQ dataset was used to learn generic facial priors, and a dataset of 141 patients with Parkinson's disease was constructed, from which 20 subjects were selected for cross-validation experiments.
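The record does not spell out how fixed-noise sampling and joint expression-pose control work together at inference time. One plausible reading, offered only as a minimal sketch, is that a single initial noise vector x_T is drawn once and reused for every frame, while the 3D condition (expression and pose coefficients, concatenated into one code) is interpolated from a source to a target. The names `animate`, `toy_sampler`, and `sample_noise`, and all vector sizes, are hypothetical; `toy_sampler` stands in for the trained diffusion model and simply makes the "background" depend only on the noise and the "face" depend only on the condition, so that reusing the noise keeps non-facial content identical across frames.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical sampler signature: maps (initial noise, 3D condition) to an output
# deterministically, as a deterministic sampler such as DDIM (eta = 0) does for a fixed model.
Sampler = Callable[[Vector, Vector], Tuple[Vector, Vector]]


def sample_noise(dim: int, seed: int) -> Vector:
    """Draw a standard-normal noise vector from a seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> Vector:
    """Linearly interpolate between two parameter vectors."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def animate(sampler: Sampler, source: Vector, target: Vector, num_frames: int,
            noise_dim: int, seed: int = 0,
            fixed_noise: bool = True) -> List[Tuple[Vector, Vector]]:
    """Render frames while expression and pose move together.

    `source` and `target` are concatenated [expression | pose] codes, so both
    parts are interpolated in every frame (joint expression-pose control).
    With fixed_noise=True every frame starts from the same noise vector.
    """
    shared = sample_noise(noise_dim, seed)
    frames = []
    for i in range(num_frames):
        t = i / (num_frames - 1) if num_frames > 1 else 0.0
        cond = lerp(source, target, t)
        # Reuse one noise draw, or (for comparison) draw fresh noise per frame.
        noise = shared if fixed_noise else sample_noise(noise_dim, seed + i + 1)
        frames.append(sampler(noise, cond))
    return frames


def toy_sampler(noise: Vector, cond: Vector) -> Tuple[Vector, Vector]:
    """Hypothetical stand-in for the diffusion model: 'background' is a function
    of the noise only, 'face' is a function of the 3D condition only."""
    background = [round(v, 6) for v in noise]
    face = list(cond)
    return background, face


if __name__ == "__main__":
    src = [0.0, 0.0, 0.0, 0.0]  # toy values, e.g. neutral expression + frontal pose
    dst = [1.0, 0.5, 0.2, 0.0]  # toy values, e.g. smile + slight head turn
    for fixed in (True, False):
        frames = animate(toy_sampler, src, dst, num_frames=5, noise_dim=3,
                         fixed_noise=fixed)
        same_bg = all(bg == frames[0][0] for bg, _ in frames)
        print(f"fixed_noise={fixed}: background identical across frames -> {same_bg}")
```

Drawing fresh noise per frame (`fixed_noise=False`) changes the toy background in every frame, which is the analogue of the background flicker that the fixed-noise strategy is reported to remove.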
Systematic evaluation showed that as few as 20 training images are sufficient for effective personalization on patient data, yielding an ArcFace similarity of 0.875 ± 0.044. Compared with conventional 2D preprocessing, the 3D-only alignment strategy improved identity preservation by 10.4%. Joint expression-pose control markedly improved animation naturalness and temporal coherence, while fixed-noise sampling eliminated background flicker and kept non-facial regions stable. In clinical validation, a neurologist classified the generated animations with 73.3% accuracy, significantly above chance level, indicating that the model preserves medically relevant features; general observers performed near chance (54%), indicating that the generated results look visually natural. This study reports the first application of diffusion-based facial expression generation to patients with Parkinson's disease, establishing a new paradigm for medical AI applications, and also delivers a practical user interface.
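The identity-preservation result above is reported as an ArcFace similarity. A minimal sketch of how such a figure is commonly computed, assuming cosine similarity between face embeddings, a per-subject score averaged over that subject's generated frames against one reference embedding, and mean ± sample standard deviation across subjects. The thesis's exact pairing protocol and choice of standard deviation are not stated in this record, and the function names below are hypothetical. Extracting embeddings with an ArcFace network is out of scope, so toy vectors stand in for real embeddings.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two face embeddings (e.g. ArcFace vectors)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def subject_score(reference: Sequence[float], generated: List[Sequence[float]]) -> float:
    """Mean similarity of one subject's generated frames to their reference embedding."""
    return statistics.mean(cosine_similarity(reference, g) for g in generated)


def summarize(per_subject: Dict[str, float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across subjects (needs >= 2 subjects)."""
    scores = list(per_subject.values())
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    # Toy 3-D embeddings; real ArcFace embeddings are typically 512-D.
    refs = {"S01": [1.0, 0.0, 0.0], "S02": [0.0, 1.0, 0.0]}
    gens = {
        "S01": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
        "S02": [[0.1, 0.95, 0.0], [0.0, 0.9, 0.3]],
    }
    scores = {s: subject_score(refs[s], gens[s]) for s in refs}
    mean, sd = summarize(scores)
    print(f"ArcFace-style identity similarity: {mean:.3f} ± {sd:.3f}")
```

Reporting the spread across subjects rather than across individual frames keeps each participant equally weighted, which matters when subjects contribute different numbers of generated frames.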
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99688
DOI:	10.6342/NTU202504304
Full-text license:	Not authorized
Full-text public release date:	N/A
Appears in collections:	Graduate Institute of Medical Device and Imaging

Files in this item:

File	Size	Format
ntu-113-2.pdf (not authorized for public access)	2.36 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.
