NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98585
Full metadata record (DC field: value [language])
dc.contributor.advisor: 陳銘憲 [zh_TW]
dc.contributor.advisor: Ming-Syan Chen [en]
dc.contributor.author: 周昱宏 [zh_TW]
dc.contributor.author: Yu-Hong Chou [en]
dc.date.accessioned: 2025-08-18T00:58:37Z
dc.date.available: 2025-08-18
dc.date.copyright: 2025-08-15
dc.date.issued: 2025
dc.date.submitted: 2025-08-05
dc.identifier.citation:
[1] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
[2] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In European Conference on Computer Vision, pages 213–229. Springer, 2020.
[3] Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip H.S. Torr, et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6881–6890, 2021.
[4] Xiaohua Zhai, Alexander Kolesnikov, Neil Houlsby, and Lucas Beyer. Scaling vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12104–12113, 2022.
[5] Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Peter Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, et al. Scaling vision transformers to 22 billion parameters. In International Conference on Machine Learning, pages 7480–7512. PMLR, 2023.
[6] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021.
[7] Lorenzo Papa, Paolo Russo, Irene Amerini, and Luping Zhou. A survey on efficient vision transformers: Algorithms, techniques, and performance benchmarking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12):7682–7700, 2024.
[8] Song Han, Huizi Mao, and William J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149, 2015.
[9] Rich Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997.
[10] Xiaofeng Han, Shunpeng Chen, Zenghuang Fu, Zhe Feng, Lue Fan, Dong An, Changwei Wang, Li Guo, Weiliang Meng, Xiaopeng Zhang, et al. Multimodal fusion and vision-language models: A survey for robot vision. arXiv preprint arXiv:2504.02477, 2025.
[11] Sebastian Ruder. An overview of multi-task learning in deep neural networks. arXiv preprint arXiv:1706.05098, 2017.
[12] Xinglong Sun, Ali Hassani, Zhangyang Wang, Gao Huang, and Humphrey Shi. DiSparse: Disentangled sparsification for multitask model compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12382–12392, 2022.
[13] Mingcan Xiang, Jiaxun Tang, Qizheng Yang, Hui Guan, and Tongping Liu. AdapMTL: Adaptive pruning framework for multitask learning model. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 5121–5130, 2024.
[14] Mingyang Zhang, Hao Chen, Chunhua Shen, Zhen Yang, Linlin Ou, Xinyi Yu, and Bohan Zhuang. LoRAPrune: Pruning meets low-rank parameter-efficient fine-tuning, 2023.
[15] Jiaqi Ma, Zhe Zhao, Xinyang Yi, Jilin Chen, Lichan Hong, and Ed H. Chi. Modeling task relationships in multi-task learning with multi-gate mixture-of-experts. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1930–1939, 2018.
[16] Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, and Andrew Rabinovich. GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks. In International Conference on Machine Learning, pages 794–803. PMLR, 2018.
[17] Yuxuan Hu, Jing Zhang, Zhe Zhao, Cuiping Li, and Hong Chen. SP-LoRA: Sparsity-preserved low-rank adaptation for sparse large language model.
[18] Yen-Cheng Liu, Chih-Yao Ma, Junjiao Tian, Zijian He, and Zsolt Kira. Polyhistor: Parameter-efficient multi-task adaptation for dense vision tasks. Advances in Neural Information Processing Systems, 35:36889–36901, 2022.
[19] Matthew Wallingford, Hao Li, Alessandro Achille, Avinash Ravichandran, Charless Fowlkes, Rahul Bhotika, and Stefano Soatto. Task adaptive parameter sharing for multi-task learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7561–7570, 2022.
[20] Maximilian Augustin, Syed Shakib Sarwar, Mostafa Elhoushi, Yuecheng Li, Sai Qian Zhang, and Barbara De Salvo. PETAH: Parameter efficient task adaptation for hybrid transformers. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 1867–1877, 2025.
[21] Ahmed Agiza, Marina Neseem, and Sherief Reda. MTLoRA: Low-rank adaptation approach for efficient multi-task learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16196–16205, 2024.
[22] Jiangpeng He, Zhihao Duan, and Fengqing Zhu. CL-LoRA: Continual low-rank adaptation for rehearsal-free class-incremental learning. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 30534–30544, 2025.
[23] Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations (ICLR), 2022.
[24] Michael Zhu and Suyog Gupta. To prune, or not to prune: Exploring the efficacy of pruning for model compression. arXiv preprint arXiv:1710.01878, 2017.
[25] Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635, 2018.
[26] Asa Cooper Stickland and Iain Murray. BERT and PALs: Projected attention layers for efficient adaptation in multi-task learning. In International Conference on Machine Learning, pages 5986–5995. PMLR, 2019.
[27] Rabeeh Karimi Mahabadi, Sebastian Ruder, Mostafa Dehghani, and James Henderson. Parameter-efficient multi-task fine-tuning for transformers via shared hypernetworks. arXiv preprint arXiv:2106.04489, 2021.
[28] Juhyeong Kim, Gyunyeop Kim, and Sangwoo Kang. Lottery rank-pruning adaptation parameter efficient fine-tuning. Mathematics, 12(23):3744, 2024.
[29] Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus. Indoor segmentation and support inference from RGBD images. In European Conference on Computer Vision, pages 746–760. Springer, 2012.
[30] Maria-Elena Nilsback and Andrew Zisserman. Automated flower classification over a large number of classes. In 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, pages 722–729. IEEE, 2008.
[31] Rishit Dagli and Ali Mustufa Shaikh. CPPE-5: Medical personal protective equipment dataset. SN Computer Science, 4(3):263, 2023.
[32] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[33] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, et al. HuggingFace's Transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771, 2019.
[34] Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587, 2017.
[35] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.
[36] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98585
dc.description.abstract [zh_TW]:
Deploying powerful Vision Transformers (ViTs) in real-world multi-task learning (MTL) applications is constrained by their high computational cost, making efficient pruning techniques essential. The core challenge, however, lies in building a consensus on parameter importance across tasks so that the shared ViT backbone can be pruned effectively. A common practice is to apply single-task pruning to each task independently, but this causes destructive interference, since weights critical to other tasks may be removed. Another line of work improves multi-task awareness by aggregating pruning signals across tasks, yet these methods rely on costly iterative updates of all parameters and therefore struggle to scale to today's billion-parameter ViTs.
To address these problems, we propose LoGIC (Multi-LoRA Guided Importance Consensus), a unified framework for efficient pruning of large-scale multi-task ViTs. LoGIC combines a hybrid architecture of shared and task-specific LoRA modules, mitigates inter-task conflicts through a novel task-adaptive routing mechanism, and integrates multiple importance signals through a cross-task importance-consensus strategy to reach robust pruning decisions.
Extensive experiments on five diverse vision tasks show that LoGIC achieves up to 50% structured sparsity, consistently outperforms all existing pruning methods, and maintains accuracy comparable to the original fully fine-tuned model while fine-tuning only about 10% of the model parameters. Our work offers a practical and scalable solution for deploying powerful, unified ViT models in resource-constrained environments.
dc.description.abstract [en]:
Deploying powerful Vision Transformers (ViTs) in real-world multi-task learning (MTL) applications is constrained by their high computational costs, making efficient pruning essential. The core challenge, however, is forming a consensus on parameter importance across tasks to effectively prune the shared ViT backbone. A common approach is to apply single-task pruning independently to each task, but this leads to destructive interference by removing weights critical to other tasks. Alternatively, some methods incorporate multi-task awareness by aggregating pruning signals, but their reliance on costly iterative updates of all parameters leaves them unscalable to today's billion-parameter ViTs.
To overcome this, we propose LoGIC (Multi-LoRA Guided Importance Consensus), a unified framework designed specifically to prune large-scale multi-task ViTs efficiently and effectively. At its core, LoGIC integrates a hybrid architecture of shared and task-specific LoRA modules. It mitigates inter-task conflicts through a novel task-adaptive routing mechanism, while a cross-task consensus strategy ensures robust pruning decisions by aggregating multiple importance signals.
Extensive experiments on five diverse vision tasks show that LoGIC achieves up to 50% structured sparsity, consistently outperforming all prior pruning baselines while matching the accuracy of the original, fully fine-tuned model, all while fine-tuning only a small fraction (~10%) of the total parameters. Our work provides a practical and scalable solution for deploying powerful, unified ViT models in resource-constrained environments.
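The abstract names two mechanisms, a hybrid of shared and task-specific LoRA modules selected by task-adaptive routing and a cross-task consensus over importance signals, but the record itself carries no implementation detail. The following is therefore only a minimal PyTorch-style sketch of how such a hybrid LoRA layer could look; the class name HybridLoRALinear, the rank and gate parameters, and the softmax gating design are our assumptions, not the thesis's code.

# Hypothetical sketch of a hybrid shared/task-specific LoRA layer with
# task-adaptive routing, loosely following the abstract's description.
# All names and design choices here are assumptions, not the thesis's code.
import torch
import torch.nn as nn

class HybridLoRALinear(nn.Module):
    def __init__(self, in_dim, out_dim, num_tasks, rank=8):
        super().__init__()
        # Frozen pretrained projection (stands in for the shared ViT backbone).
        self.base = nn.Linear(in_dim, out_dim)
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        # One shared low-rank adapter, reused by every task.
        self.shared_A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.shared_B = nn.Parameter(torch.zeros(out_dim, rank))
        # One task-specific low-rank adapter per task.
        self.task_A = nn.Parameter(torch.randn(num_tasks, rank, in_dim) * 0.01)
        self.task_B = nn.Parameter(torch.zeros(num_tasks, out_dim, rank))
        # Task-adaptive router: a per-task learned gate mixing the shared and
        # the task-specific adapter contributions.
        self.gate = nn.Parameter(torch.zeros(num_tasks, 2))

    def forward(self, x, task_id):
        w = torch.softmax(self.gate[task_id], dim=-1)  # (2,) mixing weights
        shared = x @ self.shared_A.T @ self.shared_B.T
        specific = x @ self.task_A[task_id].T @ self.task_B[task_id].T
        return self.base(x) + w[0] * shared + w[1] * specific

# Usage: a 768-dim projection shared by 5 tasks; only adapters and gates train.
layer = HybridLoRALinear(768, 768, num_tasks=5)
y = layer(torch.randn(4, 768), task_id=2)

Under a design like this, only the small adapters and gates receive gradients while the base weights stay frozen, which is at least consistent with the abstract's claim of fine-tuning only ~10% of the total parameters.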
dc.description.provenance [en]: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-18T00:58:37Z. No. of bitstreams: 0
dc.description.provenance [en]: Made available in DSpace on 2025-08-18T00:58:37Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Acknowledgements i
摘要 (Abstract in Chinese) ii
Abstract iii
Contents v
List of Figures vii
List of Tables viii
1 Introduction 1
2 Related Work 5
2.1 Multi-Task Aware Pruning for Vision Models 5
2.2 Parameter-Efficient Pruning for Vision Models 6
2.3 Parameter-Efficient Multi-Task Learning 7
3 Methods 9
3.1 Efficient Pruning with Multi-LoRA 9
3.2 Task-Adaptive Routing 11
3.3 Cross-Task Consensus Pruning 13
3.3.1 LoRA Gradient Signal 13
3.3.2 Task Usage Pattern 14
3.3.3 LoRA Adaptation Magnitude 16
4 Experiments 18
4.1 Experimental Setup 18
4.1.1 Tasks and Datasets 18
4.1.2 Backbone Models 19
4.1.3 Evaluation Metrics 20
4.1.4 Baselines for Comparison 21
4.1.5 Implementation Details 22
4.2 Experiment Results 22
4.2.1 Performance and Efficiency on ViT-L 22
4.2.2 Performance and Efficiency on Swin-L 24
4.3 Analysis 25
4.3.1 Peak Computational Memory 25
4.3.2 Ablation Studies 26
4.3.3 LoRA Routing Distributions 28
5 Conclusion 29
References 30
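Sections 3.3.1 through 3.3.3 in the table of contents above name three importance signals feeding the cross-task consensus: the LoRA gradient signal, the task usage pattern, and the LoRA adaptation magnitude. Since the thesis body is not part of this record, the sketch below shows only one plausible way such per-task signals could be aggregated into a pruning decision: rank-normalize each task's scores so no single task dominates, average them into a consensus score, and prune the lowest-scoring structured units. The function name and the rank-normalize-then-average rule are assumptions, not the thesis's method.

# Hypothetical sketch of cross-task consensus pruning over per-task
# channel-importance scores. The aggregation rule is an assumption,
# not the thesis's exact method.
import torch

def consensus_prune_mask(task_scores, sparsity):
    """task_scores: list of 1-D importance tensors, one per task, same length.
    Returns a boolean keep-mask achieving the requested structured sparsity."""
    normed = []
    for s in task_scores:
        # Rank-normalize each task's scores to [0, 1] so tasks with very
        # different loss scales contribute comparably to the consensus.
        ranks = s.argsort().argsort().float()
        normed.append(ranks / (len(s) - 1))
    consensus = torch.stack(normed).mean(dim=0)  # cross-task consensus score
    k = int(sparsity * consensus.numel())        # number of units to prune
    if k == 0:
        return torch.ones_like(consensus, dtype=torch.bool)
    threshold = consensus.kthvalue(k).values     # k-th smallest consensus score
    return consensus > threshold

# Example: three tasks scoring 768 channels, pruning 50% of them.
scores = [torch.rand(768) for _ in range(3)]
mask = consensus_prune_mask(scores, sparsity=0.5)
print(mask.sum().item(), "of", mask.numel(), "channels kept")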
dc.language.iso: en
dc.subject: Vision Transformers [zh_TW]
dc.subject: Model Pruning [zh_TW]
dc.subject: Multi-Task Learning [zh_TW]
dc.subject: Low-Rank Adaptation [zh_TW]
dc.subject: Model Pruning [en]
dc.subject: Low-Rank Adaptation (LoRA) [en]
dc.subject: Vision Transformers (ViTs) [en]
dc.subject: Multi-Task Learning (MTL) [en]
dc.title: LoGIC: Multi-LoRA Guided Importance Consensus for Multi-Task Pruning in Vision Transformers [zh_TW]
dc.title: LoGIC: Multi-LoRA Guided Importance Consensus for Multi-Task Pruning in Vision Transformers [en]
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 楊得年; 曹昱; 吳齊人 [zh_TW]
dc.contributor.oralexamcommittee: De-Nian Yang; Yu Tsao; Chi-Jen Wu [en]
dc.subject.keyword: Vision Transformers, Model Pruning, Multi-Task Learning, Low-Rank Adaptation [zh_TW]
dc.subject.keyword: Vision Transformers (ViTs), Model Pruning, Multi-Task Learning (MTL), Low-Rank Adaptation (LoRA) [en]
dc.relation.page: 35
dc.identifier.doi: 10.6342/NTU202503383
dc.rights.note: Authorization granted (access restricted to campus)
dc.date.accepted: 2025-08-11
dc.contributor.author-college: College of Electrical Engineering and Computer Science
dc.contributor.author-dept: Department of Electrical Engineering
dc.date.embargo-lift: 2030-08-04
Appears in Collections: Department of Electrical Engineering

Files in this item:
File: ntu-113-2.pdf | Size: 1.82 MB | Format: Adobe PDF | Access: restricted (not authorized for public access)

