Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98585

| Field | Value |
|---|---|
| Title: | LoGIC: Multi-LoRA Guided Importance Consensus for Multi-Task Pruning in Vision Transformers |
| Author: | 周昱宏 (Yu-Hong Chou) |
| Advisor: | 陳銘憲 (Ming-Syan Chen) |
| Keywords: | Vision Transformers (ViTs), Model Pruning, Multi-Task Learning (MTL), Low-Rank Adaptation (LoRA) |
| Publication Year: | 2025 |
| Degree: | Master's |
| Abstract: | Deploying powerful Vision Transformers (ViTs) in real-world multi-task learning (MTL) applications is constrained by their high computational cost, which makes efficient pruning essential. The core challenge, however, is reaching a consensus on parameter importance across tasks so that the shared ViT backbone can be pruned effectively. A common approach is to apply single-task pruning independently to each task, but this causes destructive interference: pruning for one task can remove weights that are critical to the others. Alternatively, some methods add multi-task awareness by aggregating pruning signals across tasks, but they do not scale to today's billion-parameter ViTs because they rely on costly iterative updates of all parameters. To overcome this, we propose LoGIC (Multi-LoRA Guided Importance Consensus), a unified framework designed to prune large-scale multi-task ViTs efficiently and effectively. At its core, LoGIC uses a hybrid architecture of shared and task-specific LoRA modules and mitigates inter-task conflicts through a novel task-adaptive routing mechanism. In parallel, a cross-task consensus strategy aggregates multiple importance signals to make pruning decisions robust. Extensive experiments on five diverse vision tasks show that LoGIC achieves up to 50% structured sparsity, consistently outperforms all prior pruning baselines, and matches the accuracy of the original fully fine-tuned model while fine-tuning only about 10% of the total parameters. Our work provides a practical and scalable solution for deploying powerful, unified ViT models in resource-constrained environments. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98585 |
| DOI: | 10.6342/NTU202503383 |
| Full-Text License: | Authorized (campus access only) |
| Full-Text Public Release Date: | 2030-08-04 |
| Department: | Department of Electrical Engineering |
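The abstract describes the cross-task importance consensus only at a high level, so the following is a minimal sketch of the general idea, not the thesis's actual algorithm. It assumes each task yields one importance score per prunable structure (for example, an attention head or MLP channel), rank-normalizes the scores within each task so that tasks with different score scales count equally, averages them into a consensus score, and removes the lowest-scoring fraction. The function names `rank_normalize` and `consensus_prune_mask`, the task names, and the mean-of-ranks rule are all hypothetical.

```python
from typing import Dict, List


def rank_normalize(scores: List[float]) -> List[float]:
    """Map scores to [0, 1] by rank so that tasks with different score scales are comparable."""
    n = len(scores)
    if n == 1:
        return [1.0]
    order = sorted(range(n), key=lambda i: scores[i])
    normalized = [0.0] * n
    for rank, idx in enumerate(order):
        normalized[idx] = rank / (n - 1)
    return normalized


def consensus_prune_mask(task_scores: Dict[str, List[float]], sparsity: float) -> List[bool]:
    """Return a keep-mask over prunable units (True = keep).

    Hypothetical consensus rule: average each unit's rank-normalized
    importance across all tasks, then prune the lowest-scoring fraction.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    per_task = [rank_normalize(s) for s in task_scores.values()]
    num_units = len(per_task[0])
    if any(len(s) != num_units for s in per_task):
        raise ValueError("all tasks must score the same units")
    consensus = [sum(s[i] for s in per_task) / len(per_task) for i in range(num_units)]
    num_pruned = int(round(sparsity * num_units))
    # Units ordered from least to most important under the consensus score.
    order = sorted(range(num_units), key=lambda i: consensus[i])
    pruned = set(order[:num_pruned])
    return [i not in pruned for i in range(num_units)]


if __name__ == "__main__":
    # Hypothetical per-head importance scores for two tasks over four heads.
    scores = {
        "classification": [0.9, 0.1, 0.5, 0.3],
        "segmentation": [0.2, 0.05, 0.8, 0.6],
    }
    print(consensus_prune_mask(scores, sparsity=0.5))  # [True, False, True, False]
    # Pruning for segmentation alone drops head 0, which classification ranks highest.
    print(consensus_prune_mask({"segmentation": scores["segmentation"]}, 0.5))  # [False, False, True, True]
```

The second call illustrates the destructive interference the abstract attributes to independent single-task pruning, while the consensus mask keeps head 0. Rank normalization is one simple way to make heterogeneous importance signals comparable; a real method might instead weight tasks or combine several importance criteria.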
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-113-2.pdf (not authorized for public access) | 1.82 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
