Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95340

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 吳家麟 | zh_TW |
| dc.contributor.advisor | Ja-Ling Wu | en |
| dc.contributor.author | 陳常安 | zh_TW |
| dc.contributor.author | Chang-An Chen | en |
| dc.date.accessioned | 2024-09-05T16:15:22Z | - |
| dc.date.available | 2024-09-06 | - |
| dc.date.copyright | 2024-09-05 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-08-09 | - |
| dc.identifier.citation | [1] A. Aghajanyan, L. Zettlemoyer, and S. Gupta. Intrinsic dimensionality explains the effectiveness of language model fine-tuning. arXiv preprint arXiv:2012.13255, 2020.
[2] M. Cimpoi, S. Maji, I. Kokkinos, S. Mohamed, and A. Vedaldi. Describing textures in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3606–3613, 2014.
[3] J. Dong, Y. Gao, H. Zhou, J. Cen, Y. Yao, S. Yoon, and P. D. Sun. Towards few-shot out-of-distribution detection. arXiv preprint arXiv:2311.12076, 2023.
[4] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
[5] S. Esmaeilpour, B. Liu, E. Robertson, and L. Shu. Zero-shot out-of-distribution detection based on the pre-trained model CLIP. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 6568–6576, 2022.
[6] P. Gao, S. Geng, R. Zhang, T. Ma, R. Fang, Y. Zhang, H. Li, and Y. Qiao. CLIP-Adapter: Better vision-language models with feature adapters. International Journal of Computer Vision, 132(2):581–595, 2024.
[7] Z. Han, C. Gao, J. Liu, S. Q. Zhang, et al. Parameter-efficient fine-tuning for large models: A comprehensive survey. arXiv preprint arXiv:2403.14608, 2024.
[8] D. Hendrycks and K. Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1610.02136, 2016.
[9] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen. LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
[10] R. Huang and Y. Li. MOS: Towards scaling out-of-distribution detection for large semantic space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8710–8719, 2021.
[11] M. Jia, L. Tang, B.-C. Chen, C. Cardie, S. Belongie, B. Hariharan, and S.-N. Lim. Visual prompt tuning. In European Conference on Computer Vision, pages 709–727. Springer, 2022.
[12] X. Jiang, F. Liu, Z. Fang, H. Chen, T. Liu, F. Zheng, and B. Han. Negative label guided OOD detection with pretrained vision-language models. arXiv preprint arXiv:2403.20078, 2024.
[13] J. Kim, J. Kim, and S. Hwang. Comparison of out-of-distribution detection performance of CLIP-based fine-tuning methods. In 2024 International Conference on Electronics, Information, and Communication (ICEIC), pages 1–4. IEEE, 2024.
[14] H. Lee, L. Soldaini, A. Cohan, M. Seo, and K. Lo. Back to basics: A simple recipe for improving out-of-domain retrieval in dense encoders. arXiv preprint arXiv:2311.09765, 2023.
[15] T. Li, G. Pang, X. Bai, W. Miao, and J. Zheng. Learning transferable negative prompts for out-of-distribution detection. arXiv preprint arXiv:2404.03248, 2024.
[16] C. Liao, T. Tsiligkaridis, and B. Kulis. Descriptor and word soups: Overcoming the parameter efficiency accuracy tradeoff for out-of-distribution few-shot learning. arXiv preprint arXiv:2311.13612, 2023.
[17] S. Liu, J. Keung, Z. Yang, F. Liu, Q. Zhou, and Y. Liao. Delving into parameter-efficient fine-tuning in code change learning: An empirical study. arXiv preprint arXiv:2402.06247, 2024.
[18] Y. Ming, Z. Cai, J. Gu, Y. Sun, W. Li, and Y. Li. Delving into out-of-distribution detection with vision-language representations. Advances in Neural Information Processing Systems, 35:35087–35102, 2022.
[19] Y. Ming and Y. Li. How does fine-tuning impact out-of-distribution detection for vision-language models? International Journal of Computer Vision, 132(2):596–609, 2024.
[20] A. Miyai, Q. Yu, G. Irie, and K. Aizawa. LoCoOp: Few-shot out-of-distribution detection via prompt learning. Advances in Neural Information Processing Systems, 36, 2024.
[21] L. Niss, K. Vogt-Lowell, and T. Tsiligkaridis. Quantified task misalignment to inform PEFT: An exploration of domain generalization and catastrophic forgetting in CLIP. arXiv preprint arXiv:2402.09613, 2024.
[22] K. O'Shea and R. Nash. An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458, 2015.
[23] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
[24] W. Ren, X. Li, L. Wang, T. Zhao, and W. Qin. Analyzing and reducing catastrophic forgetting in parameter efficient tuning. arXiv preprint arXiv:2402.18865, 2024.
[25] G. Van Horn, O. Mac Aodha, Y. Song, Y. Cui, C. Sun, A. Shepard, H. Adam, P. Perona, and S. Belongie. The iNaturalist species classification and detection dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8769–8778, 2018.
[26] H. Wang, Y. Li, H. Yao, and X. Li. CLIPN for zero-shot OOD detection: Teaching CLIP to say no. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1802–1812, 2023.
[27] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 3485–3492. IEEE, 2010.
[28] B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452–1464, 2017.
[29] K. Zhou, J. Yang, C. C. Loy, and Z. Liu. Learning to prompt for vision-language models. International Journal of Computer Vision, 130(9):2337–2348, 2022. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95340 | - |
| dc.description.abstract | 最近在視覺語言模型方面的進展,如CLIP,已經徹底改變了零樣本分類任務。儘管傳統的微調方法可以提升性能,但對於大型模型來說,它們的成本很高。因此,研究現在集中在參數高效的技術上。然而,目前的評估標準聚焦在分類性能上,卻忽略了模型的可靠性。我們的研究通過對基於CLIP的微調方法進行全面比較分析來解決這一空缺。我們評估了不同參數高效微調(PEFT)方法在少樣本分佈外檢測中的表現,這對於評估模型可靠性至關重要。本論文揭示了僅採用參數高效微調(PEFT)方法時,在分佈外檢測性能上的不足,相較於其他基於CLIP的方法。為了解決這一限制,我們從PEFT中選擇了視覺提示(VPT)。通過將VPT作為一種附加應用來增強其他分佈外檢測技術,我們實現了顯著的性能提升,即使與當前表現最好(SOTA)的基於CLIP的OOD檢測方法相比也是如此。 | zh_TW |
| dc.description.abstract | Recent advances in vision-language models such as CLIP have revolutionized zero-shot classification tasks. While traditional fine-tuning enhances performance, it is costly for large-scale models, so research now focuses on parameter-efficient techniques. However, current evaluations predominantly measure classification performance and neglect model reliability. Our study addresses this gap by providing a thorough comparative analysis of CLIP-based fine-tuning methods: we assess the few-shot out-of-distribution (OOD) detection performance of different parameter-efficient fine-tuning (PEFT) methods, which is crucial for evaluating model reliability. This thesis reveals a shortfall in OOD detection performance when employing PEFT methods alone, compared with other CLIP-based approaches. To remedy this limitation, we select Visual Prompt Tuning (VPT) from among the PEFT methods. By utilizing VPT as an add-on to enhance other OOD detection techniques, we achieve notable performance gains even against the current state-of-the-art (SOTA) CLIP-based OOD detection methods. (An illustrative zero-shot OOD scoring sketch follows the metadata table below.) | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-09-05T16:15:22Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-09-05T16:15:22Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | 摘要 iii
Abstract v
Contents vii
List of Figures ix
List of Tables xi
Chapter 1 Introduction 1
Chapter 2 Related work 7
2.1 Zero-Shot CLIP-based OOD Detection 7
2.2 Parameter-Efficient Fine-Tuning (PEFT) 8
2.2.1 Additive-based methods 8
2.2.1.1 Prompt Tuning 8
2.2.1.2 Adapter 9
2.2.2 Low-Rank Adaptation (LoRA) 9
2.3 PEFT-based OOD Detection 10
2.4 Using PEFT methods as an Enhancement for other CLIP-based OOD Detection Methods 10
2.5 Summary 11
Chapter 3 Method 13
3.1 Zero-Shot CLIP-based OOD Detection 13
3.2 Parameter-Efficient Fine-Tuning (PEFT) in CLIP-based OOD Detection 16
3.2.1 CoOp 18
3.2.2 VPT 19
3.2.3 Unified Learning 22
3.2.4 Low-Rank Adaptation (LoRA) 23
3.3 VPT to enhance other CLIP-based OOD Detection Methods 25
3.3.1 Limitation of using only PEFT 25
3.3.2 Negative Label 26
3.3.3 Negative Prompt 28
Chapter 4 Experiments 33
4.1 Setup 33
4.2 The impact of PEFT methods on Zero-Shot CLIP's OOD detection performance 34
4.3 VPT as an enhancement on other CLIP-based OOD Detection methods 37
Chapter 5 Conclusion and Future Work 41
References 45 | - |
| dc.language.iso | zh_TW | - |
| dc.subject | 插件式應用 | zh_TW |
| dc.subject | 分佈外偵測 | zh_TW |
| dc.subject | 基石模型 | zh_TW |
| dc.subject | 物件辨識 | zh_TW |
| dc.subject | PEFT | en |
| dc.subject | Few-shot setting | en |
| dc.subject | Image classification | en |
| dc.subject | CLIP | en |
| dc.subject | Foundation model | en |
| dc.subject | OOD detection | en |
| dc.title | 透過視覺提示提升基於 CLIP 的分佈外偵測效能並比較PEFT方法間的優劣 | zh_TW |
| dc.title | Enhancing CLIP-based Out-of-Distribution Detection Performance with Visual Prompt Tuning and a Comparative Analysis of Parameter-Efficient Fine-Tuning (PEFT) Methods | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-2 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 陳文進;許永真;胡敏君;陳駿丞 | zh_TW |
| dc.contributor.oralexamcommittee | Wen-Chin Chen;Yung-Jen Hsu;Min-Chun Hu;Jun-Cheng Chen | en |
| dc.subject.keyword | 分佈外偵測,插件式應用,物件辨識,基石模型 | zh_TW |
| dc.subject.keyword | PEFT,CLIP,OOD detection,Few-shot setting,Image classification,Foundation model | en |
| dc.relation.page | 49 | - |
| dc.identifier.doi | 10.6342/NTU202402658 | - |
| dc.rights.note | Not authorized | - |
| dc.date.accepted | 2024-08-12 | - |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
| dc.contributor.author-dept | Department of Computer Science and Information Engineering | - |
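For context on the zero-shot CLIP-based OOD detection that the abstract builds on, below is a minimal sketch in the spirit of the MCM score of Ming et al. ([18] in the citation list): an image is compared against text prompts for the in-distribution classes, and a low maximum softmax similarity flags it as out-of-distribution. This is an illustrative sketch only; the checkpoint, class names, image path, temperature, and threshold are assumptions for demonstration, not values taken from the thesis.

```python
# Minimal sketch of zero-shot CLIP-based OOD scoring (MCM-style, per [18]).
# All concrete values below (checkpoint, classes, tau, threshold) are
# illustrative assumptions, not settings from this thesis.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

id_classes = ["dog", "cat", "bird"]                 # hypothetical ID label set
prompts = [f"a photo of a {c}" for c in id_classes]

image = Image.open("query.jpg")                     # hypothetical test image
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    img_feat = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_feat = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])

# Cosine similarities between the image and each class prompt.
img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
sims = img_feat @ txt_feat.T                        # shape: (1, num_classes)

# MCM-style score: maximum softmax probability under a temperature tau.
# A low maximum means the image matches none of the ID classes well,
# i.e., it is likely out-of-distribution.
tau = 0.01                                          # assumed temperature
score = torch.softmax(sims / tau, dim=-1).max().item()
is_ood = score < 0.5                                # illustrative threshold
print(f"MCM score = {score:.3f}, flagged OOD = {is_ood}")
```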
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-112-2.pdf (not authorized for public access) | 1.02 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.