DSpace JSPUI
NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101136
Title: EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction
Authors: Hsi-Che Lin
Advisor: Yu-Chiang Frank Wang
Keyword: Memory-efficient, Model Fine-tuning, Low-Rank Adaptation, Model Compression, Deep Learning
Publication Year : 2025
Degree: Master's
Abstract: Open-source foundation models have seen rapid adoption and development, enabling powerful general-purpose capabilities across diverse domains. However, fine-tuning large foundation models for domain-specific or personalized tasks remains prohibitively expensive for most users due to the significant memory overhead beyond that of inference. We introduce EMLoC, an Emulator-based Memory-efficient fine-tuning framework with LoRA Correction, which enables model fine-tuning within the same memory budget required for inference. EMLoC constructs a task-specific lightweight emulator by applying activation-aware singular value decomposition (SVD) on a small downstream calibration set. Fine-tuning is then performed on this lightweight emulator via LoRA. To tackle the misalignment between the original model and the compressed emulator, we propose a novel compensation algorithm that corrects the fine-tuned LoRA module so it can be merged into the original model for inference. EMLoC supports flexible compression ratios and standard training pipelines, making it adaptable to a wide range of applications. Extensive experiments demonstrate that EMLoC outperforms other baselines across multiple datasets and modalities. Moreover, without quantization, EMLoC enables fine-tuning of a 38B model on a single 24GB consumer GPU, bringing efficient and practical model adaptation to individual users.
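The emulator-construction step described in the abstract can be sketched in a few lines of numpy. This is an illustrative assumption, not the thesis's actual algorithm: `build_emulator`, the rank, and the per-feature RMS column scaling are hypothetical choices standing in for the activation-aware SVD the thesis specifies, and the final naive merge deliberately omits the LoRA-correction step that is EMLoC's main contribution.

```python
import numpy as np

def build_emulator(W, X, rank):
    """Rough sketch of activation-aware low-rank compression.

    Columns of W are scaled by the RMS magnitude of the matching input
    activations from a calibration set, so that directions the downstream
    data actually exercises are preserved preferentially by the SVD.
    """
    s = np.sqrt((X ** 2).mean(axis=0)) + 1e-8        # per-input-feature scale
    U, sv, Vt = np.linalg.svd(W * s, full_matrices=False)
    W_low = (U[:, :rank] * sv[:rank]) @ Vt[:rank]    # rank-r approximation
    return W_low / s                                 # undo the scaling

# Toy dimensions: one linear layer (out_features x in_features)
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))    # original layer weight
X = rng.standard_normal((256, 32))   # calibration activations (samples x in)

W_emu = build_emulator(W, X, rank=8)

# LoRA fine-tuning would happen on the small emulator; the learned low-rank
# update B @ A is what EMLoC's compensation algorithm later corrects before
# merging it back into the original W. Here we show only a naive merge.
A = 0.01 * rng.standard_normal((8, 32))
B = 0.01 * rng.standard_normal((64, 8))
W_merged = W + B @ A   # uncorrected merge, for illustration only
```

The memory saving comes from training against `W_emu` (rank 8, so roughly a quarter of the parameters in this toy) while only the full `W` plus the tiny `B @ A` factors are needed at inference time.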
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101136
DOI: 10.6342/NTU202504695
Fulltext Rights: Not authorized
Embargo Lift Date: N/A
Appears in Collections: Graduate Institute of Communication Engineering

Files in This Item:
File: ntu-114-1.pdf (Restricted Access)
Size: 13.71 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Contact Information
No.1 Sec.4, Roosevelt Rd., Taipei, Taiwan, R.O.C. 106
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
© NTU Library All Rights Reserved