Semi-supervised Debiasing via Unlabeled Bias-conflicting Samples Discovering and Loss Reweighting

楊秉蒼; Bing-Cang Yang

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92215

Title:	Semi-supervised Debiasing via Unlabeled Bias-conflicting Samples Discovering and Loss Reweighting
Author:	楊秉蒼 Bing-Cang Yang
Advisor:	陳銘憲 Ming-Syan Chen
Keywords:	Debias, Semi-supervised learning
Year of publication:	2024
Degree:	Master's
Abstract:	Neural networks often suffer significant performance degradation when trained on biased datasets. Existing work mitigates this problem by using bias-conflicting samples (i.e., samples without bias features) to encourage the model to learn task-relevant features, thereby improving performance in unbiased test settings. However, these methods rely heavily on label information to identify such samples, and data labeling is costly and time-consuming. To address this issue, we propose a two-stage debiasing framework, Bias-conflicting Score for Loss Reweighting (BSLR), which uses only a small amount of labeled data to estimate how bias-conflicting each unlabeled sample is and debiases the model accordingly. In the first stage, we detect bias-conflicting samples in the unlabeled data. We first train a biased classifier and a debiased classifier on the labeled data, then run inference on the unlabeled data with both classifiers. We measure the difference between their outputs as a score, called the Bias-conflicting Score; a higher score indicates a lower degree of bias in the sample. In the second stage, we use the Bias-conflicting Scores from the first stage to reweight the loss so that bias-conflicting samples receive more emphasis while the debiased classifier is retrained. Experimental results show that, by incorporating information from unlabeled data, BSLR outperforms state-of-the-art methods, especially on datasets with large bias and few labels.
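
Illustrative sketch (not from the thesis): the abstract describes scoring each unlabeled sample by the difference between a biased and a debiased classifier's outputs, then reweighting the loss with that score. The abstract does not say how the difference is measured, how scores map to weights, or where training targets for unlabeled samples come from. The sketch below therefore assumes total variation distance between the two softmax outputs as the score, a hypothetical weight of 1 + alpha * score, and targets that are simply given (for example labels or pseudo-labels). All function names and the parameter alpha are invented for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bias_conflicting_score(biased_logits, debiased_logits):
    """Assumed score: total variation distance between the two classifiers'
    predictive distributions. It lies in [0, 1]; larger means the two
    classifiers disagree more on this sample."""
    p = softmax(biased_logits)
    q = softmax(debiased_logits)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def reweighted_loss(logits_batch, targets, scores, alpha=1.0):
    """Weighted mean cross-entropy with the assumed weight 1 + alpha * score,
    so samples with higher scores contribute more to the loss."""
    weights = [1.0 + alpha * s for s in scores]
    losses = [-math.log(softmax(l)[t]) for l, t in zip(logits_batch, targets)]
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


# Stage 1 (toy numbers): the two classifiers agree on the first sample
# and disagree on the second.
aligned_score = bias_conflicting_score([4.0, 0.0], [3.0, 0.0])
conflict_score = bias_conflicting_score([4.0, 0.0], [0.0, 3.0])

# Stage 2 (toy numbers): the model being trained fits the first sample
# well and the second one poorly.
batch_logits = [[2.0, 0.0], [1.0, 0.0]]
batch_targets = [0, 1]
batch_scores = [aligned_score, conflict_score]

plain = reweighted_loss(batch_logits, batch_targets, batch_scores, alpha=0.0)
emphasized = reweighted_loss(batch_logits, batch_targets, batch_scores, alpha=5.0)
```

With alpha set to 0 the function reduces to ordinary mean cross-entropy; a larger alpha shifts the loss toward the sample on which the two classifiers disagree, which is the kind of emphasis the second stage is described as providing.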
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92215
DOI:	10.6342/NTU202400683
Full-text authorization:	Not authorized
Appears in Collections:	Department of Electrical Engineering

Files in this item:

File	Size	Format
ntu-112-1.pdf (not authorized for public access)	1.22 MB	Adobe PDF

