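Enhancing CLIP-based Out-of-Distribution Detection Performance with Visual Prompt Tuning and a Comparative Analysis of Parameter-Efficient Fine-Tuning (PEFT) Methods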

陳常安; Chang-An Chen

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95340

Title:	透過視覺提示提升基於 CLIP 的分佈外偵測效能並比較PEFT方法間的優劣 Enhancing CLIP-based Out-of-Distribution Detection Performance with Visual Prompt Tuning and a Comparative Analysis of Parameter-Efficient Fine-Tuning (PEFT) Methods
Authors:	陳常安 Chang-An Chen
Advisor:	吳家麟 Ja-Lin Wu
Keyword:	OOD detection, plug-in application, object recognition, foundation model, PEFT, CLIP, few-shot setting, image classification
Publication Year:	2024
Degree:	Master's
Abstract:	Recent advances in vision-language models such as CLIP have revolutionized zero-shot classification. Traditional fine-tuning improves performance but is costly for large-scale models, so research now focuses on parameter-efficient techniques. However, current evaluations predominantly measure classification performance and neglect model reliability. Our study addresses this gap with a thorough comparative analysis of CLIP-based fine-tuning methods. We assess the few-shot out-of-distribution (OOD) detection performance of different parameter-efficient fine-tuning (PEFT) methods, which is crucial for evaluating model reliability. This thesis reveals that PEFT methods used alone fall short in OOD detection performance compared with other CLIP-based approaches. To remedy this limitation, we select Visual Prompt Tuning (VPT) from among the PEFT methods. By using VPT as an add-on to enhance other OOD detection techniques, we achieve notable performance gains, even compared with the current state-of-the-art (SOTA) CLIP-based OOD detection methods.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95340
DOI:	10.6342/NTU202402658
Fulltext Rights:	Not authorized
Appears in Collections:	Department of Computer Science and Information Engineering

Files in This Item:

File	Size	Format
ntu-112-2.pdf Restricted Access	1.02 MB	Adobe PDF
