  1. NTU Theses and Dissertations Repository
  2. College of Engineering
  3. Department of Engineering Science and Ocean Engineering
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98860
Title: A GenAI-Based Video Generation System for Low-Resource Languages
(original title: 以生成式人工智慧建立稀少性語言之影片生成系統)
Author: Hsin-Yeh Cheng (鄭新曄)
Advisor: Ray-I Chang (張瑞益)
Keywords: Text-to-speech, Multi-stage transfer learning, Tacotron 2, Lip synchronization, MuseTalk
Publication year: 2025
Degree: Master's
Abstract: Taiwan's Indigenous languages are low-resource and endangered, which hinders both language preservation and teaching. Generative artificial intelligence (GenAI) is advancing rapidly and is now widely applied in education. This thesis develops a text-to-speech (TTS) model for Squliq Atayal, the main dialect of the Atayal language in Taiwan, and combines it with lip synchronization to build a GenAI-driven system that generates spoken-language teaching videos. The goal is to reduce the labor and time needed to prepare such videos. TTS models for high-resource languages such as English now approach human speech quality, but their performance depends on large, high-quality speech corpora. For low-resource languages such as Taiwan's Indigenous languages, scarce audio data leads to poor fluency and pronunciation accuracy. To address this, we propose a multi-stage transfer learning framework. Starting from a Tacotron 2 model pre-trained on a large English speech corpus, we first apply an intermediate transfer learning stage on Tagalog, a Philippine language that belongs to the same Austronesian family and has more data available, and then a second transfer learning stage on the small amount of available Atayal data. The vocoder is also fine-tuned on Atayal speech. Experiments show that, compared with direct (single-stage) transfer learning, the proposed approach substantially improves fluency and pronunciation accuracy even with very little Indigenous-language data. On a 5-point mean opinion score (MOS), the overall rating improves by 0.26 points and pronunciation accuracy by 0.39 points.
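As a rough illustration of how a 5-point MOS comparison of this kind can be computed, the following minimal sketch averages listener ratings for two systems and reports the difference. The ratings, the variable names, and the use of a normal-approximation confidence interval are assumptions made for illustration only; they are not the thesis's data or evaluation protocol.

```python
from math import sqrt
from statistics import mean, stdev


def mos(ratings):
    """Return (mean rating, 95% CI half-width) for a list of 1-5 ratings."""
    if not ratings:
        raise ValueError("ratings must be non-empty")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    m = mean(ratings)
    # Normal-approximation interval; with few raters a t-interval would be wider.
    half = 1.96 * stdev(ratings) / sqrt(len(ratings)) if len(ratings) > 1 else 0.0
    return m, half


# Hypothetical ratings, invented for illustration (NOT the thesis data).
direct_ratings = [3, 4, 3, 4, 3, 4, 3, 3]
multi_stage_ratings = [4, 4, 3, 4, 4, 4, 3, 4]

mos_direct, ci_direct = mos(direct_ratings)
mos_multi, ci_multi = mos(multi_stage_ratings)
gain = mos_multi - mos_direct

print(f"direct transfer:      {mos_direct:.2f} +/- {ci_direct:.2f}")
print(f"multi-stage transfer: {mos_multi:.2f} +/- {ci_multi:.2f}")
print(f"MOS gain:             {gain:.2f}")
```

Reporting an interval alongside the mean helps a reader judge whether a gain of a few tenths of a point is larger than the rater-to-rater spread.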
Building on these methods, we implement a GenAI-driven teaching-video generation system on the ComfyUI platform. It uses MuseTalk for lip synchronization, combining footage of the speaker with the audio synthesized by the TTS model to generate oral teaching videos for Taiwan's Indigenous languages in which the speaker's mouth movements align naturally with the speech. The results indicate that the proposed multi-stage transfer learning framework can build TTS models for low-resource languages from a small amount of data, and they illustrate the potential of GenAI-driven video generation systems in education.
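The multi-stage transfer learning idea in the abstract (pre-train on a large corpus, adapt on a related intermediate language, then adapt on the small target corpus) can be sketched as a chain of fine-tuning runs in which each stage starts from the weights the previous stage produced. The sketch below is a toy stand-in under stated assumptions: it uses a two-parameter linear model trained by gradient descent instead of Tacotron 2, and the stage names, datasets, learning rates, and epoch counts are all hypothetical.

```python
from typing import List, Tuple

Weights = Tuple[float, float]        # (slope, intercept) of a toy linear model
Dataset = List[Tuple[float, float]]  # (input, target) pairs


def mse(weights: Weights, data: Dataset) -> float:
    """Mean squared error of the toy model on a dataset."""
    w, b = weights
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fine_tune(weights: Weights, data: Dataset, lr: float, epochs: int) -> Weights:
    """Continue training from the given weights with plain gradient descent."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


# Hypothetical toy "corpora": large source, related intermediate, tiny target.
stages = [
    ("english_pretrain", [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)], 0.05, 200),
    ("tagalog_transfer", [(0.0, 0.5), (1.0, 3.0), (2.0, 5.5)], 0.05, 200),
    ("atayal_transfer", [(1.0, 4.0), (2.0, 7.0)], 0.05, 50),
]


def run_stages(initial: Weights, stage_list):
    """Run stages in order; each stage starts from the previous stage's weights."""
    weights = initial
    report = []
    for name, data, lr, epochs in stage_list:
        before = mse(weights, data)
        weights = fine_tune(weights, data, lr, epochs)
        report.append((name, before, mse(weights, data)))
    return weights, report


final_weights, report = run_stages((0.0, 0.0), stages)
for name, before, after in report:
    print(f"{name}: loss {before:.4f} -> {after:.4f}")
```

The point of the structure is that the last stage, which sees very little data, starts from weights already shaped by a related task, so it does not have to learn everything from the small target set.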
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98860
DOI: 10.6342/NTU202504213
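Full-text authorization: Authorized (open access worldwide)
Full-text release date: 2025-08-20
Appears in collections: Department of Engineering Science and Ocean Engineering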

Files in this item:
File             Size      Format
ntu-113-2.pdf    5.5 MB    Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
