NTU Theses and Dissertations Repository

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101044

Full metadata record

dc.contributor.advisor (zh_TW): 吳家麟
dc.contributor.advisor (en): Ja-Ling Wu
dc.contributor.author (zh_TW): 陳弘修
dc.contributor.author (en): Hong-Siou Chen
dc.date.accessioned: 2025-11-26T16:35:35Z
dc.date.available: 2025-11-27
dc.date.copyright: 2025-11-26
dc.date.issued: 2025
dc.date.submitted: 2025-07-25
dc.identifier.citation:
[1] M. Cimpoi, S. Maji, I. Kokkinos, S. Mohamed, and A. Vedaldi. Describing textures in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3606–3613, 2014.
[2] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[3] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale, 2021.
[4] S. Fort, J. Ren, and B. Lakshminarayanan. Exploring the limits of out-of-distribution detection, 2021.
[5] H. Fu, N. Patel, P. Krishnamurthy, and F. Khorrami. CLIPScope: Enhancing zero-shot OOD detection with Bayesian scoring, 2024.
[6] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition, 2015.
[7] D. Hendrycks and K. Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks, 2018.
[8] X. Jiang, F. Liu, Z. Fang, H. Chen, T. Liu, F. Zheng, and B. Han. Negative label guided OOD detection with pretrained vision-language models, 2024.
[9] K. Lee, K. Lee, H. Lee, and J. Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks, 2018.
[10] P.-K. Lee, J.-C. Chen, and J.-L. Wu. Harnessing large language and vision-language models for robust out-of-distribution detection, 2025.
[11] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever. Learning transferable visual models from natural language supervision, 2021.
[12] G. Van Horn, O. Mac Aodha, Y. Song, Y. Cui, C. Sun, A. Shepard, H. Adam, P. Perona, and S. Belongie. The iNaturalist species classification and detection dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8769–8778, 2018.
[13] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need, 2023.
[14] H. Wang, Y. Li, H. Yao, and X. Li. CLIPN for zero-shot OOD detection: Teaching CLIP to say no, 2023.
[15] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 3485–3492. IEEE, 2010.
[16] Y. Zhang and L. Zhang. AdaNeg: Adaptive negative proxy guided OOD detection with vision-language models, 2024.
[17] B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452–1464, 2017.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101044
dc.description.abstract (zh_TW):
分佈外 (Out-of-Distribution, OOD) 偵測的關鍵挑戰在於如何有效應對與訓練數據截然不同的未知樣本。現有的免訓練零樣本方法為此提供了兩種前沿思路:其一是利用大型語言模型 (LLM) 生成語義更抽象的「超類別 (Superclass)」,以探勘高品質的靜態負標籤 [10];其二則是透過建立特徵記憶庫,動態生成能適應當前測試數據分佈的「自適應代理 (Adaptive Proxy)」[16]。然而,這兩種方法分別側重於靜態語義的精準度和動態特徵的適應性,未能將兩者優勢結合。
本論文提出一個名為 SC+AdaNeg 的新穎框架,首次將上述兩種先進方法進行了深度融合與創新。我們的核心貢獻並非簡單地將兩者疊加,而是在於設計了一套全新的融合架構與計分公式。首先,我們沿用「超類別」策略以獲取高品質的文本先驗知識,同時利用 AdaNeg 的記憶庫機制來捕捉動態視覺特徵。其次,也是我們最關鍵的創新點,在於我們重新設計了融合兩者的計分機制:我們發現穩定性更高的「任務自適應代理」比樣本自適應代理更適合與靜態文本特徵互補;此外,我們引入了一個「文本放大因子 (H)」,策略性地強化由超類別生成的高品質文本先驗在最終決策中的主導地位。
實驗結果證明,我們的融合框架在多個 OOD 基準資料集上均達到了最新的頂尖 (SOTA) 效能,顯著優於原始的兩個獨立方法。這項工作不僅驗證了結合靜態語義先驗與動態視覺適應的巨大潛力,更重要的是,提出了一種具原則性的融合範式,為未來 OOD 偵測領域的發展提供了新的思路。
dc.description.abstract (en):
A central challenge in Out-of-Distribution (OOD) detection lies in effectively identifying novel samples unseen during training. Recent training-free, zero-shot methods offer two powerful yet distinct paradigms: one improves semantic precision by using Large Language Models (LLMs) to generate more abstract “Superclass” labels for high-quality static negative label mining [10]; the other improves dynamic adaptability by building an “Adaptive Proxy” from a feature memory bank to align with the actual test-time OOD distribution [16]. These approaches have remained separate, however, leaving the potential of their combined strengths untapped.
This thesis introduces SC+AdaNeg, a novel framework that, for the first time, synergistically merges and builds upon these two cutting-edge methods. Our core contribution is not a simple summation but a principled fusion architecture with a redesigned scoring mechanism. We integrate the Superclass strategy to secure high-quality textual priors while leveraging the adaptive memory bank from AdaNeg to capture dynamic visual features. Critically, our key innovation lies in the fusion formula itself: we find that the more stable task-adaptive proxies complement static text features better than their sample-adaptive counterparts. Furthermore, we introduce a textual amplification factor (H) that strategically elevates the influence of the robust Superclass-derived text priors, establishing them as the central anchor in the final decision-making process.
Extensive experiments demonstrate that our integrated framework achieves new state-of-the-art (SOTA) performance across multiple OOD benchmarks, significantly outperforming each of the original standalone methods. This work not only validates the powerful synergy between static semantic knowledge and dynamic visual adaptation but, more importantly, proposes a principled paradigm for their fusion, charting a new direction for future research in OOD detection.
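To make the fused scoring concrete: the sketch below shows, in Python, one way the two component scores named in the table of contents (Snl from Superclass-based negative labels, Sta from task-adaptive proxies) could be combined with the textual amplification factor H. It is a minimal reconstruction from the abstract alone, not the thesis's implementation; the softmax form of each score, the proxy construction by averaging cached test features, and the linear combination H * Snl + Sta are all assumptions.

```python
import numpy as np

def softmax_id_mass(image_feat, pos_feats, neg_feats, tau=0.01):
    """Fraction of softmax mass on the positive (ID) side when an image
    competes against negative labels/proxies (assumed score form)."""
    sims = np.concatenate([pos_feats @ image_feat, neg_feats @ image_feat]) / tau
    sims -= sims.max()  # numerical stability
    probs = np.exp(sims) / np.exp(sims).sum()
    return probs[: len(pos_feats)].sum()  # high => likely in-distribution

def task_adaptive_proxies(memory_bank):
    """Distill one task-adaptive proxy per label by averaging the selectively
    cached test features (AdaNeg-style; exact construction assumed)."""
    proxies = np.stack([feats.mean(axis=0) for feats in memory_bank])
    return proxies / np.linalg.norm(proxies, axis=1, keepdims=True)

def fused_score(image_feat, id_text, neg_text, id_proxy, neg_proxy, H=2.0):
    """Final fused OOD score; H (value hypothetical) amplifies the
    Superclass-derived text prior over the visual proxy score."""
    s_nl = softmax_id_mass(image_feat, id_text, neg_text)    # static text prior
    s_ta = softmax_id_mass(image_feat, id_proxy, neg_proxy)  # dynamic visual cue
    return H * s_nl + s_ta  # larger => ID; threshold to flag OOD inputs
```

With H > 1, the decision leans on the Superclass-derived text priors that the abstract identifies as the more reliable anchor, while the task-adaptive term contributes the test-time visual adaptation.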
dc.description.provenance (en): Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-11-26T16:35:35Z. No. of bitstreams: 0
dc.description.provenance (en): Made available in DSpace on 2025-11-26T16:35:35Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee
Acknowledgements
摘要
Abstract
Contents
List of Figures
List of Tables

Chapter 1 Introduction
1.1 Background on Out-of-Distribution Detection
1.2 Rationale for Employing Zero-Shot Training-Free Methods in OOD Detection
1.3 Motivation and Proposed Framework

Chapter 2 Related Work
2.1 Traditional Approaches to OOD Detection
2.2 CLIP (Contrastive Language-Image Pre-training)
2.2.1 Contrastive Learning Framework
2.2.2 Learning Objective and Zero-Shot Transfer
2.3 CLIP-based OOD Detection
2.3.1 Threshold-based OOD Detection
2.3.2 Semantic OOD Detection via Competitive Classification
2.4 Post-Hoc Semantic Negation for Zero-Shot OOD Detection
2.4.1 Strategic Label Selection via NegMining
2.4.2 OOD Scoring in an Extended, Competitive Label Space

Chapter 3 Methodology
3.1 Superclass-Based Negative Label Mining with LLMs
3.1.1 Methodological Evolution: From NegLabel to Superclass-NegMining
3.1.2 The Superclass-based Negative Label Mining Pipeline
3.1.3 The Superclass-NegMining Algorithm in Detail
3.1.4 Advantages of the New Approach
3.2 An Adaptive Framework for OOD Detection
3.2.1 Feature Memory Bank and Selective Caching
3.2.2 Adaptive Proxy Generation from Memory
3.2.2.1 Task-adaptive Proxies (C'ta)
3.2.2.2 Sample-adaptive Proxies (C'sa)
3.2.3 Multi-Modal Score for OOD Detection
3.3 Multi-Modal Proxy Fusion Using Superclass and Adaptive Framework
3.3.1 Motivation for a Fused Approach
3.3.2 Component Scores for Fusion
3.3.2.1 Score from Superclass-based Negative Labels (Snl)
3.3.2.2 Score from Task-Adaptive Multi-Modal Proxies (Sta)
3.3.3 The Final Fused Score: Amplifying Textual Priors

Chapter 4 Experiments
4.1 Experimental Setup
4.1.1 Datasets and Benchmarks
4.1.2 Evaluation Metrics
4.1.3 Implementation Details
4.2 Comparison with State-of-the-Art Methods
4.3 Ablation Study on Fusion Formula

Chapter 5 Benchmark Analysis
5.1 SUN397 & Places365 Datasets
5.2 iNaturalist Dataset
5.3 Describable Textures Dataset (DTD)

Chapter 6 Contribution
6.1 Extending the Utility of LLMs by Validating the Superclass Strategy
6.2 In-Depth Analysis of Multi-Modal Features on Classic OOD Datasets
6.3 Achieving SOTA Performance by Integrating and Refining Existing Frameworks

Chapter 7 Future Work
7.1 Scoring Function Calibration via Bayesian Inference
7.2 Enhancing Negative Label Quality via Next-Generation Language Models
7.3 Developing Adaptive Fusion Strategies for Modality Mixing

References
dc.language.iso: en
dc.subject: 分佈外偵測
dc.subject: 多模態融合
dc.subject: 零樣本學習
dc.subject: 視覺語言模型
dc.subject: 超類別
dc.subject: 自適應代理
dc.subject: Out-of-Distribution Detection
dc.subject: Multi-Modal Fusion
dc.subject: Zero-Shot Learning
dc.subject: Vision-Language Models
dc.subject: Superclass
dc.subject: Adaptive Proxies
dc.title (zh_TW): 融合視覺語言模型的多模態特徵:增強免訓練零樣本的分佈外偵測
dc.title (en): Enhancing Zero-Shot Training-Free Out-of-Distribution Detection by Leveraging Multi-Modal Features and Vision-Language Models
dc.type: Thesis
dc.date.schoolyear: 114-1
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee (zh_TW): 陳駿丞;李明穗
dc.contributor.oralexamcommittee (en): Jun-Cheng Chen; Ming-Sui Lee
dc.subject.keyword (zh_TW): 分佈外偵測,多模態融合,零樣本學習,視覺語言模型,超類別,自適應代理
dc.subject.keyword (en): Out-of-Distribution Detection, Multi-Modal Fusion, Zero-Shot Learning, Vision-Language Models, Superclass, Adaptive Proxies
dc.relation.page: 51
dc.identifier.doi: 10.6342/NTU202502024
dc.rights.note: 未授權 (not authorized)
dc.date.accepted: 2025-07-25
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊工程學系 (Department of Computer Science and Information Engineering)
dc.date.embargo-lift: N/A
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in This Item:
File: ntu-114-1.pdf (not authorized for public access)
Size: 6.33 MB
Format: Adobe PDF