Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99814
| Title: | STAR: Spectral Truncation and Rescale for Merging Language Models at Scale (STAR:大規模融合語言模型的頻譜截斷與重縮放方法) |
| Author: | 李宇昂 Yu-Ang Lee |
| Advisor: | 葉彌妍 Mi-Yen Yeh |
| Co-advisor: | 林守德 Shou-De Lin |
| Keywords: | Large Language Model, Model Merging, SVD, Parameter-Efficient Fine-tuning, Task Vector |
| Year of publication: | 2025 |
| Degree: | Master's |
| Abstract: | Model merging is an efficient way of obtaining a multi-task model from several pretrained models without further fine-tuning, and it has gained attention in various domains, including natural language processing (NLP). In this thesis, we first develop a high-level taxonomy of model merging research and identify three key challenges in current methods: task conflicts, hyperparameter dependence, and the limitations of sparsification-based approaches. To address these challenges, we revisit a published method, STAR (Spectral Truncation And Rescale), which mitigates task conflicts by truncating small components in the respective spectral spaces and then applies an automatic parameter rescaling scheme to retain the nuclear norm of the original matrix. STAR requires no additional inference on the original training data and is robust to hyperparameter choice. We demonstrate the effectiveness of STAR through extensive model merging experiments on diverse NLP tasks and conduct several analyses, including hyperparameter sensitivity. The results show that STAR performs stably across model sizes and, when merging 12 models on Flan-T5, improves performance by 4.2% over the baseline. Beyond model merging, and building on our analysis of task vectors, we also propose a preliminary selective parameter-efficient fine-tuning (PEFT) method that leverages common patterns observed across different task vectors. Our code is publicly available at https://github.com/IBM/STAR. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99814 |
| DOI: | 10.6342/NTU202503925 |
| Full-text authorization: | Not authorized |
| Electronic full-text release date: | N/A |
| Appears in department/unit: | Data Science Degree Program |
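
The abstract describes two steps applied to a weight-difference matrix: truncating small components in its spectral space, then rescaling so the nuclear norm of the original matrix is retained. The following is a minimal NumPy sketch of that idea, not the code from the IBM/STAR repository. The function name `truncate_and_rescale`, the parameter `keep_fraction`, and the rule used to pick the rank (the smallest rank whose singular values cover a given fraction of the nuclear norm) are assumptions made for illustration. How the processed matrices are then combined into a merged model is outside this sketch.

```python
import numpy as np


def truncate_and_rescale(delta, keep_fraction=0.4):
    """Drop small singular components of `delta`, then rescale the rest.

    `keep_fraction` is a hypothetical knob: the smallest rank whose singular
    values sum to at least this fraction of the nuclear norm is kept.
    """
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    total = s.sum()  # nuclear norm = sum of singular values
    if total == 0.0:
        return delta.copy()

    # Singular values are sorted in descending order, so the cumulative sum
    # tells us how much of the nuclear norm the top-k components cover.
    cumulative = np.cumsum(s)
    rank = int(np.searchsorted(cumulative, keep_fraction * total)) + 1
    rank = min(rank, len(s))

    kept = s[:rank]
    # Rescale the surviving singular values so their sum equals the original
    # nuclear norm.
    scale = total / kept.sum()
    return (u[:, :rank] * (kept * scale)) @ vt[:rank, :]


# Toy usage on a random matrix standing in for one layer's task vector.
rng = np.random.default_rng(0)
delta = rng.normal(size=(8, 6))
processed = truncate_and_rescale(delta, keep_fraction=0.4)
```

In this sketch the output has lower rank than the input but the same nuclear norm, which is the property the abstract attributes to the rescaling step.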
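Files in this document:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-2.pdf (not authorized for public access) | 8.26 MB | Adobe PDF |
Documents in the system are protected by copyright, with all rights reserved, unless their copyright terms are specifically stated otherwise.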
