Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101326

| Title: | 探討在訓練語音對語音口語語言模型中對災難性遺忘的緩解策略 (Exploring Mitigation Strategies for Catastrophic Forgetting in Training Speech-to-Speech Spoken Language Models) |
| Author: | 蕭淇元 (Chi-Yuan Hsiao) |
| Advisor: | 李宏毅 (Hung-yi Lee) |
| Keywords: | Spoken Language Model, Spoken Question Answering, Catastrophic Forgetting, Continual Learning, Model Merging |
| Publication year: | 2025 |
| Degree: | Master's |
| Abstract: | We present a speech-to-speech Spoken Language Model (SLM) that adopts a single-path, token-level interleaving of text and speech. A three-stage continual-learning pipeline, consisting of automatic speech recognition (ASR), text-to-speech synthesis (TTS), and spoken question answering (SQA), progressively adapts a pre-trained text-only Large Language Model (LLM) to the speech modality. The stage-wise distribution shift, however, triggers severe catastrophic forgetting. We therefore benchmark four mitigation strategies: model merging, discounting the LoRA scaling factor, experience replay, and L2 regularization. Experiments show that experience replay is most effective for retaining textual knowledge and ASR accuracy, whereas L2 regularization best preserves speech naturalness with only a modest drop in text performance. Combining either of them with model merging or LoRA scaling yields additional, though smaller, gains. These findings shed light on the plasticity–stability trade-off in multimodal continual learning and provide practical guidelines for building robust and efficient SLM training pipelines. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101326 |
| DOI: | 10.6342/NTU202600026 |
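| Full-text license: | Authorized (open access worldwide) |
| Electronic full-text release date: | 2026-01-17 |
| Appears in collections: | Graduate Institute of Communication Engineering (電信工程學研究所) |
Files in this item:
| File | Size | Format | |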
|---|---|---|---|
| ntu-114-1.pdf | 9.82 MB | Adobe PDF | View/Open |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
