Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87975

| Title: | Annealing Self-Distillation Rectification Improves Adversarial Training |
| Author: | Yu-Yu Wu |
| Advisor: | Shang-Tse Chen |
| Keywords: | Adversarial Training, Robust Overfitting, Knowledge Distillation |
| Publication Year: | 2023 |
| Degree: | Master |
| Abstract: | In standard adversarial training, models are optimized to fit one-hot labels within an allowable adversarial perturbation budget; every perturbed input inside the budget is assigned the same hard label. However, ignoring the underlying distribution shift introduced by these perturbations causes robust overfitting. To address this issue and enhance adversarial robustness, we first analyze the characteristics of robust models and find that they tend to produce smoother, well-calibrated outputs. Based on this observation, we propose a simple yet effective method, Annealing Self-Distillation Rectification (ADR), which generates soft labels that more accurately reflect the distribution shift under attack and thereby provide better guidance during adversarial training. With the rectified labels produced by ADR, we significantly improve model robustness without pre-trained models or extensive extra computation. Moreover, by replacing the hard labels in their objectives with our rectified soft labels, ADR integrates seamlessly with other adversarial training techniques in a plug-and-play manner. Extensive experiments across datasets and model architectures demonstrate strong performance, confirming that ADR effectively improves adversarial training and mitigates robust overfitting. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87975 |
| DOI: | 10.6342/NTU202301129 |
| Full-Text Access: | Authorized (worldwide public access) |
| Appears in Collections: | Department of Computer Science and Information Engineering |
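The abstract describes ADR's core operation: replacing the one-hot hard label in the adversarial training objective with a rectified soft label distilled from the model's own (softened) predictions. The sketch below illustrates one plausible form of that rectification, interpolating a temperature-scaled teacher distribution with the one-hot label; the function names, the interpolation weight `lam`, and any annealing schedule for `temperature` and `lam` are illustrative assumptions, not the thesis's exact formulation.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def rectified_label(teacher_logits, true_class, temperature, lam):
    """Build a soft target by mixing the teacher's softened prediction
    with the one-hot label. In ADR, `temperature` and `lam` would be
    annealed over the course of training; fixed values are used here
    purely for illustration."""
    soft = softmax(teacher_logits, temperature)
    n = len(teacher_logits)
    one_hot = [1.0 if i == true_class else 0.0 for i in range(n)]
    # Convex combination: stays a valid probability distribution.
    return [lam * s + (1.0 - lam) * h for s, h in zip(soft, one_hot)]

# Example: teacher slightly favors class 0, but the rectified label
# still places the most mass on the ground-truth class 2.
label = rectified_label([2.0, 1.0, 0.5], true_class=2,
                        temperature=2.0, lam=0.5)
```

This soft target would then replace the hard label in the training loss (e.g. a cross-entropy or KL-divergence term), which is what makes the method easy to plug into existing adversarial training pipelines.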
Files in This Item:
| File | Size | Format | Access |
|---|---|---|---|
| ntu-111-2.pdf | 3.17 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
