Representation Learning for Challenging Visual Classification Problems (從表徵學習探究具挑戰性視覺分類問題)

許雁棋; Yen-Chi Hsu

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88132

Title:	Representation Learning for Challenging Visual Classification Problems (從表徵學習探究具挑戰性視覺分類問題)
Author:	許雁棋 Yen-Chi Hsu
Advisor:	李明穗 Ming-Sui Lee
Co-advisor:	劉庭祿 Tyng-Luh Liu
Keywords:	representation learning, multi-instance learning, self-supervised learning, real-world distributions, ordering data distributions
Publication year:	2023
Degree:	Doctoral
Abstract:	This thesis presents a comprehensive exploration of feature representations for challenging classification tasks. The research focuses on four key aspects: learning with multi-instance data distributions, learning with unlabeled data distributions, learning with real-world data distributions, and learning with ordering data distributions. First, for multi-instance data, we introduce a novel cross-attention pooling approach with attention guidance that effectively represents a bag of instances given a specific query. The proposed method captures essential features and enables accurate classification. Second, to address the challenge of unlabeled data distributions, we propose a decoupled contrastive learning framework. This framework alleviates contrastive learning's dependence on large batch sizes, and we discuss the implications of various approaches for downstream classification tasks. Third, real-world data distributions present unique challenges, such as fine-grained and long-tailed issues. To tackle them, we present an adaptive batch confusion norm (ABC-Norm) that addresses both issues simultaneously and enables the learning of robust feature representations tailored to real-world scenarios. Finally, we address the representation of deepfake images, which involve multiple manipulated components and ordering issues. We decompose the problem into deepfake classification, multi-label localization, and manipulation order recovery, and propose a multi-label ranking mechanism, combined with a contrastive multi-instance scenario, to recover the ordering data distributions. Through algorithmic design and extensive experimentation, this thesis contributes to the advancement of representation learning for classification tasks: it discusses state-of-the-art methodologies, pinpoints the challenges associated with each aspect, and proposes a novel approach to each. The findings offer useful insights into representation learning for challenging classification tasks.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88132
DOI:	10.6342/NTU202301574
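Full-text license:	Authorized (open to the public worldwide)
Appears in collections:	Department of Computer Science and Information Engineering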

Files in this item:

File	Size	Format
ntu-111-2.pdf	6.19 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
