NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95340

Full metadata record (DC field: value, language)
dc.contributor.advisor: 吳家麟 (zh_TW)
dc.contributor.advisor: Ja-Lin Wu (en)
dc.contributor.author: 陳常安 (zh_TW)
dc.contributor.author: Chang-An Chen (en)
dc.date.accessioned: 2024-09-05T16:15:22Z
dc.date.available: 2024-09-06
dc.date.copyright: 2024-09-05
dc.date.issued: 2024
dc.date.submitted: 2024-08-09
dc.identifier.citation[1] A. Aghajanyan, L. Zettlemoyer, and S. Gupta. Intrinsic dimensionality explains the effectiveness of language model fine-tuning. arXiv preprint arXiv:2012.13255, 2020.
[2] M. Cimpoi, S. Maji, I. Kokkinos, S. Mohamed, and A. Vedaldi. Describing textures in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3606–3613, 2014.
[3] J. Dong, Y. Gao, H. Zhou, J. Cen, Y. Yao, S. Yoon, and P. D. Sun. Towards few-shot out-of-distribution detection. arXiv preprint arXiv:2311.12076, 2023.
[4] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Un-terthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
[5] S. Esmaeilpour, B. Liu, E. Robertson, and L. Shu. Zero-shot out-of-distribution de- tection based on the pre-trained model clip. In Proceedings of the AAAI conference on artificial intelligence, volume 36, pages 6568–6576, 2022.
[6] P. Gao, S. Geng, R. Zhang, T. Ma, R. Fang, Y. Zhang, H. Li, and Y. Qiao. Clip-adapter: Better vision-language models with feature adapters. International Journal of Computer Vision, 132(2):581–595, 2024.
[7] Z. Han, C. Gao, J. Liu, S. Q. Zhang, et al. Parameter-efficient fine-tuning for large models: A comprehensive survey. arXiv preprint arXiv:2403.14608, 2024.
[8] D. Hendrycks and K. Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1610.02136, 2016.
[9] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
[10] R. Huang and Y. Li. Mos: Towards scaling out-of-distribution detection for large semantic space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8710–8719, 2021.
[11] M. Jia, L. Tang, B.-C. Chen, C. Cardie, S. Belongie, B. Hariharan, and S.-N. Lim. Visual prompt tuning. In European Conference on Computer Vision, pages 709–727. Springer, 2022.
[12] X. Jiang, F. Liu, Z. Fang, H. Chen, T. Liu, F. Zheng, and B. Han. Negative label guided ood detection with pretrained vision-language models. arXiv preprint arXiv:2403.20078, 2024.
[13] J. Kim, J. Kim, and S. Hwang. Comparison of out-of-distribution detection performance of clip-based fine-tuning methods. In 2024 International Conference on Electronics, Information, and Communication (ICEIC), pages 1–4. IEEE, 2024.
[14] H. Lee, L. Soldaini, A. Cohan, M. Seo, and K. Lo. Back to basics: A simple recipe for improving out-of-domain retrieval in dense encoders. arXiv preprint arXiv:2311.09765, 2023.
[15] T. Li, G. Pang, X. Bai, W. Miao, and J. Zheng. Learning transferable negative prompts for out-of-distribution detection. arXiv preprint arXiv:2404.03248, 2024.
[16] C. Liao, T. Tsiligkaridis, and B. Kulis. Descriptor and word soups: Overcoming the parameter efficiency accuracy tradeoff for out-of-distribution few-shot learning. arXiv preprint arXiv:2311.13612, 2023.
[17] S. Liu, J. Keung, Z. Yang, F. Liu, Q. Zhou, and Y. Liao. Delving into parameter-efficient fine-tuning in code change learning: An empirical study. arXiv preprint arXiv:2402.06247, 2024.
[18] Y. Ming, Z. Cai, J. Gu, Y. Sun, W. Li, and Y. Li. Delving into out-of-distribution detection with vision-language representations. Advances in neural information processing systems, 35:35087–35102, 2022.
[19] Y. Ming and Y. Li. How does fine-tuning impact out-of-distribution detection for vision-language models? International Journal of Computer Vision, 132(2):596–609, 2024.
[20] Y. Ming and Y. Li. How does fine-tuning impact out-of-distribution detection for vision-language models? International Journal of Computer Vision, 132(2):596–609, 2024.
[21] A. Miyai, Q. Yu, G. Irie, and K. Aizawa. Locoop: Few-shot out-of-distribution detection via prompt learning. Advances in Neural Information Processing Systems, 36, 2024.
[22] L. Niss, K. Vogt-Lowell, and T. Tsiligkaridis. Quantified task misalignment to inform peft: An exploration of domain generalization and catastrophic forgetting in clip. arXiv preprint arXiv:2402.09613, 2024.
[23] K. O’shea and R. Nash. An introduction to convolutional neural networks. arXivpreprint arXiv:1511.08458, 2015.
[24] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry,A. Askell, P. Mishkin, J. Clark, et al. Learning transferable visual models fromnatural language supervision. In International conference on machine learning, pages8748–8763. PMLR, 2021.
[25] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry,A. Askell, P. Mishkin, J. Clark, et al. Learning transferable visual models fromnatural language supervision. In International conference on machine learning, pages8748–8763. PMLR, 2021.
[26] W. Ren, X. Li, L. Wang, T. Zhao, and W. Qin. Analyzing and reducing catastrophicforgetting in parameter efficient tuning. arXiv preprint arXiv:2402.18865, 2024.
[27] G. Van Horn, O. Mac Aodha, Y. Song, Y. Cui, C. Sun, A. Shepard, H. Adam, P. Per-ona, and S. Belongie. The inaturalist species classification and detection dataset. InProceedings of the IEEE conference on computer vision and pattern recognition,pages 8769–8778, 2018.
[28] H. Wang, Y. Li, H. Yao, and X. Li. Clipn for zero-shot ood detection: Teaching clipto say no. In Proceedings of the IEEE/CVF International Conference on ComputerVision, pages 1802–1812, 2023.
[29] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. Sun database: Large-scalescene recognition from abbey to zoo. In 2010 IEEE computer society conference oncomputer vision and pattern recognition, pages 3485–3492. IEEE, 2010.
[30] B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba. Places: A 10 millionimage database for scene recognition. IEEE transactions on pattern analysis andmachine intelligence, 40(6):1452–1464, 2017.
[31] K. Zhou, J. Yang, C. C. Loy, and Z. Liu. Learning to prompt for vision-language models. International Journal of Computer Vision, 130(9):2337–2348, 2022.
-
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95340
dc.description.abstract: 最近在視覺語言模型方面的進展,如CLIP,已經徹底改變了零樣本分類任務。儘管傳統的微調方法可以提升性能,但對於大型模型來說,它們的成本很高。因此,研究現在集中在參數高效的技術上。然而,目前的評估標準聚焦在分類性能上,卻忽略了模型的可靠性。我們的研究通過對基於CLIP的微調方法進行全面比較分析來解決這一空缺。我們評估了不同參數高效微調(PEFT)方法在少樣本分佈外檢測中的表現,這對於評估模型可靠性至關重要。本論文揭示了僅採用參數高效微調(PEFT)方法時,在分佈外檢測性能上的不足,相較於其他基於CLIP的方法。為了解決這一限制,我們從PEFT中選擇了視覺提示(VPT)。通過將VPT作為一種附加應用來增強其他分佈外檢測技術,我們實現了顯著的性能提升,即使與當前表現最好(SOTA)的基於CLIP的OOD檢測方法相比也是如此。 (zh_TW)
dc.description.abstract: Recent advances in vision-language models such as CLIP have revolutionized zero-shot classification. While traditional fine-tuning enhances performance, it is costly for large-scale models, so research now focuses on parameter-efficient techniques. However, current evaluations predominantly measure classification performance and neglect model reliability. Our study addresses this gap with a thorough comparative analysis of CLIP-based fine-tuning methods: we assess few-shot out-of-distribution (OOD) detection performance across different parameter-efficient fine-tuning (PEFT) methods, which is crucial for evaluating model reliability. This thesis reveals a shortfall in OOD detection performance when employing PEFT methods alone, compared to other CLIP-based approaches. To remedy this limitation, we select Visual Prompt Tuning (VPT) from among the PEFT methods. By applying VPT as an add-on to enhance other OOD detection techniques, we achieve notable performance gains even over the current state-of-the-art (SOTA) CLIP-based OOD detection methods. (en)
(An illustrative sketch of the zero-shot scoring rule this work builds on follows the metadata record below.)
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-09-05T16:15:22Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2024-09-05T16:15:22Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
摘要 iii
Abstract v
Contents vii
List of Figures ix
List of Tables xi
Chapter 1 Introduction 1
Chapter 2 Related Work 7
  2.1 Zero-Shot CLIP-based OOD Detection 7
  2.2 Parameter-Efficient Fine-Tuning (PEFT) 8
    2.2.1 Additive-based methods 8
      2.2.1.1 Prompt Tuning 8
      2.2.1.2 Adapter 9
    2.2.2 Low-Rank Adaptation (LoRA) 9
  2.3 PEFT-based OOD Detection 10
  2.4 Using PEFT Methods as an Enhancement for Other CLIP-based OOD Detection Methods 10
  2.5 Summary 11
Chapter 3 Method 13
  3.1 Zero-Shot CLIP-based OOD Detection 13
  3.2 Parameter-Efficient Fine-Tuning (PEFT) in CLIP-based OOD Detection 16
    3.2.1 CoOp 18
    3.2.2 VPT 19
    3.2.3 Unified Learning 22
    3.2.4 Low-Rank Adaptation (LoRA) 23
  3.3 VPT to Enhance Other CLIP-based OOD Detection Methods 25
    3.3.1 Limitation of Using Only PEFT 25
    3.3.2 Negative Label 26
    3.3.3 Negative Prompt 28
Chapter 4 Experiments 33
  4.1 Setup 33
  4.2 The Impact of PEFT Methods on Zero-Shot CLIP's OOD Detection Performance 34
  4.3 VPT as an Enhancement on Other CLIP-based OOD Detection Methods 37
Chapter 5 Conclusion and Future Work 41
References 45
dc.language.iso: zh_TW
dc.subject: 插件式應用 (zh_TW)
dc.subject: 分佈外偵測 (zh_TW)
dc.subject: 基石模型 (zh_TW)
dc.subject: 物件辨識 (zh_TW)
dc.subject: PEFT (en)
dc.subject: Few-shot setting (en)
dc.subject: Image classification (en)
dc.subject: CLIP (en)
dc.subject: Foundation model (en)
dc.subject: OOD detection (en)
dc.title: 透過視覺提示提升基於 CLIP 的分佈外偵測效能並比較 PEFT 方法間的優劣 (zh_TW)
dc.title: Enhancing CLIP-based Out-of-Distribution Detection Performance with Visual Prompt Tuning and a Comparative Analysis of Parameter-Efficient Fine-Tuning (PEFT) Methods (en)
dc.type: Thesis
dc.date.schoolyear: 112-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 陳文進;許永真;胡敏君;陳駿丞 (zh_TW)
dc.contributor.oralexamcommittee: Wun-Chin Chen; Yun-Jen Hsu; Ming-Ging Hu; Jun-Cheng Chen (en)
dc.subject.keyword: 分佈外偵測,插件式應用,物件辨識,基石模型 (zh_TW)
dc.subject.keyword: PEFT, CLIP, OOD detection, Few-shot setting, Image classification, Foundation model (en)
dc.relation.page: 49
dc.identifier.doi: 10.6342/NTU202402658
dc.rights.note: 未授權 (Not authorized for public access)
dc.date.accepted: 2024-08-12
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊工程學系 (Department of Computer Science and Information Engineering)
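
To make the abstract's setting concrete, the following is a minimal sketch of the zero-shot MCM-style scoring rule (Ming et al., 2022) that CLIP-based OOD detectors of this kind build on. It is a sketch under stated assumptions, not the thesis's implementation: the backbone variant, prompt template, class list, temperature, and threshold below are illustrative, and the VPT add-on (learnable prompt tokens prepended inside the image encoder) is omitted.

import torch
import clip  # OpenAI CLIP reference implementation: pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)  # backbone choice is an assumption

# Hypothetical in-distribution (ID) class names; the thesis evaluates on
# standard OOD benchmarks rather than this toy list.
id_classes = ["goldfish", "tabby cat", "golden retriever"]
text_tokens = clip.tokenize([f"a photo of a {c}" for c in id_classes]).to(device)

@torch.no_grad()
def mcm_score(image: Image.Image, tau: float = 1.0) -> float:
    """Maximum softmax over temperature-scaled cosine similarities between
    the image embedding and the ID class-prompt embeddings.
    Low scores suggest the input is out-of-distribution."""
    image_input = preprocess(image).unsqueeze(0).to(device)
    image_feat = model.encode_image(image_input)
    text_feat = model.encode_text(text_tokens)
    # L2-normalize so the dot product equals cosine similarity.
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    sims = image_feat @ text_feat.T          # shape (1, num_classes)
    probs = (sims / tau).softmax(dim=-1)
    return probs.max().item()

# Usage: flag inputs scoring below a validation-chosen threshold as OOD.
# The 0.5 cutoff here is illustrative only.
# is_ood = mcm_score(Image.open("query.jpg")) < 0.5

In the add-on setting the abstract describes, this frozen scoring pipeline would stay unchanged while a small number of learnable visual prompt tokens are concatenated to the patch embeddings inside the image encoder and trained on few-shot ID data.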
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
File: ntu-112-2.pdf (Not authorized for public access)
Size: 1.02 MB
Format: Adobe PDF


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated in their copyright terms.
