Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87195
Title: | Learning from Limited or Noisily-Labeled Data for Visual Classification |
Author: | Chia-Ching Lin (林家慶) |
Advisor: | Chin-Laung Lei (雷欽隆) |
Keywords: | few-shot learning, noisy-label learning, meta learning, self-supervised learning |
Publication Year: | 2023 |
Degree: | Doctoral |
Abstract: | Deep learning approaches have recently achieved remarkable progress in many computer vision tasks, such as visual classification, object detection, and semantic segmentation. However, training a deep neural network typically requires a large amount of accurately labeled training data, which is time-consuming and labor-intensive to collect. In many real-world applications, researchers often have to build their networks on imperfect training data, such as datasets with a limited number of samples or with noisy labels. As a result, how to learn effectively from such imperfect datasets has become an important research topic. This study is divided into two parts. The first part focuses on learning from limited training data and discusses few-shot learning (FSL) tasks. The second part focuses on learning from noisily labeled training data, which are assumed to contain a sufficient number of samples but with noisy labels, and discusses noisy-label learning (NLL) tasks. In the study of FSL, we start with data augmentation approaches, in which a data hallucinator is built from base categories with a large number of training samples and then applied to generate additional training samples for few-shot novel categories. In particular, we leverage meta-learning to incorporate two forms of prior knowledge into the construction of the hallucinator: 1) the semantic information among all categories, and 2) the abundant appearance information extracted from the base categories. Experiments show that, by leveraging this prior knowledge, the proposed frameworks produce hallucinations that not only improve the performance of downstream FSL tasks but also exhibit more explainable appearances and distributions than previous data hallucination methods. In the study of NLL, we extend the idea of self-supervised learning to the set level and propose the set-level self-supervised learning (SLSSL) technique. Specifically, we first corrupt the labels of a training set (i.e., a mini-batch) and then obtain an augmented model via meta-learning. Next, we design our pretext task by imposing a consistency constraint on such augmented models to encourage the model's robustness against noisy labels. The proposed SLSSL technique can also be used to estimate the associated noise transition matrix, so that the effect of noisy labels can be counteracted during model training, and to identify clean and noisy training samples via sample reweighting. The SLSSL-based model training and sample reweighting algorithms can further be combined into an EM-like algorithm to boost NLL performance. Experiments on synthetic and real-world noisily labeled data confirm the effectiveness of our framework. |
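The hallucination idea described for FSL — reusing the intra-class variation observed in data-rich base categories to "imagine" extra samples for a few-shot novel category — can be illustrated with a minimal NumPy sketch. This is a toy in feature space only, not the thesis's meta-learned hallucinator; the data, dimensions, and the `hallucinate` helper are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy feature data: 3 base classes with 50 samples each, 1 novel class
# observed through a single shot (all vectors are 8-D "features").
base_feats = {c: rng.normal(loc=float(c), scale=0.5, size=(50, 8)) for c in range(3)}
novel_shot = rng.normal(loc=10.0, scale=0.5, size=(1, 8))

# "Appearance" prior: intra-class deviations pooled from the base classes.
deviations = np.concatenate(
    [feats - feats.mean(axis=0, keepdims=True) for feats in base_feats.values()]
)

def hallucinate(shot, deviations, n, rng):
    """Generate n extra samples by re-applying base-class variation to the shot."""
    idx = rng.choice(len(deviations), size=n, replace=False)
    return shot + deviations[idx]

extra = hallucinate(novel_shot, deviations, n=20, rng=rng)
print(extra.shape)  # (20, 8): one shot expanded into 20 training samples
```

The thesis additionally conditions this process on semantic information and learns the hallucinator via meta-learning, rather than using the fixed additive rule above.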
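The SLSSL pretext task — deliberately corrupt a mini-batch's labels, take a training step to obtain an "augmented" model, and require it to stay consistent with the model trained on the uncorrupted labels — can likewise be sketched with a linear model and one gradient step. This is an assumed toy setup (squared loss, label permutation as the corruption), not the thesis's actual meta-learning implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary task: one mini-batch, linear model, squared loss.
X = rng.normal(size=(16, 4))
y = (X @ np.array([1.0, -1.0, 0.5, 0.0]) > 0).astype(float)
w = rng.normal(scale=0.1, size=4)
lr = 0.1

def step(w, X, y, lr):
    """One gradient step of the mean squared loss 0.5 * ||X w - y||^2 / n."""
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

# Stable model: one step on the observed labels.
w_stable = step(w, X, y, lr)

# Augmented model: one step after deliberately corrupting the set's labels.
y_corrupt = y[rng.permutation(len(y))]
w_aug = step(w, X, y_corrupt, lr)

# Set-level pretext loss: the augmented model should still behave like the
# stable one, penalizing over-reliance on (possibly noisy) labels.
consistency = float(np.mean((X @ w_aug - X @ w_stable) ** 2))
print(consistency >= 0.0)
```

In the thesis this consistency signal is also reused to estimate a noise transition matrix and to down-weight likely mislabeled samples, with the two pieces alternated in an EM-like loop.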
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87195 |
DOI: | 10.6342/NTU202300302 |
Full-Text Access: | Authorized (worldwide public access) |
Online Publication Date: | 2025-03-01 |
Appears in Collections: | Department of Electrical Engineering |
Files in This Item:
File | Size | Format |
---|---|---|
ntu-111-1.pdf (available online after 2025-03-01) | 13.81 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.