Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101809

| Title: | ANDP: Adversarial Noise–Driven Diffusion Purification |
| Author: | Hsin-Hsin Chen |
| Advisor: | Sheng-De Wang |
| Keywords: | Adversarial Robustness, Adversarial Purification, Diffusion Models, Diffusion-Based Defense, Classifier-Agnostic Defense, Adversarially Robust Generative Models |
| Publication Year: | 2026 |
| Degree: | Master |
| Abstract: | Adversarial perturbations pose a fundamental challenge to the reliability of deep neural networks, where even imperceptible input modifications can induce significant prediction errors. Although adversarial training is an effective defense, its reliance on classifier-centric min–max optimization leads to high computational cost, limited scalability, and restricted generalization across classifiers and attack settings. Adversarial purification has emerged as an alternative paradigm that decouples robustness from classifier training by preprocessing adversarial inputs prior to classification. Among purification-based approaches, diffusion models are particularly appealing due to their well-behaved training dynamics and iterative denoising behavior. However, most existing diffusion-based purification methods rely on diffusion models trained exclusively under Gaussian-only corruption, creating a training–inference mismatch: adversarial perturbations are structured and non-Gaussian, yet are encountered only at inference time. In this thesis, we propose ANDP (Adversarial Noise–Driven Diffusion Purification), a diffusion-based adversarial purification framework that explicitly incorporates adversarial perturbations into the diffusion corruption and denoising-target formulation. Rather than modifying the classifier or optimizing a min–max objective, ANDP introduces controlled mixing between Gaussian noise and classifier-induced adversarial perturbations during diffusion training, aligning the denoising target distribution with adversarial perturbations encountered at inference time. Extensive experiments on CIFAR-10 demonstrate that ANDP consistently improves purified robust accuracy by multiple percentage points under strong AutoAttack evaluation with Expectation over Transformation (EOT), while also achieving small but consistent gains in standard accuracy. Additional experiments on ImageNet using 10% of the training data indicate that the proposed approach generalizes beyond small-scale benchmarks. Overall, this work shows that explicitly modeling adversarial perturbations within the diffusion corruption process provides a principled and flexible approach to adversarial purification. ANDP offers a practical, classifier-agnostic alternative to adversarial training for purification-based defenses, while clarifying the role of adversarial exposure in diffusion-based defense frameworks. |
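The noise-mixing idea described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration under assumptions not stated on this page: a DDPM-style forward process, a scalar mixing coefficient `lam`, and a pre-computed classifier-induced perturbation `delta_adv`. The function name, the normalization of the adversarial direction, and the interpolation scheme are all illustrative choices, not the thesis's actual formulation.

```python
import numpy as np

def mixed_corruption(x0, delta_adv, alpha_bar_t, lam, rng):
    """Corrupt a clean image x0 with a blend of Gaussian noise and a
    pre-computed adversarial perturbation, DDPM-style.

    lam = 0 recovers the standard Gaussian-only forward process;
    lam > 0 injects the (normalized) adversarial direction into the noise.
    Returns the corrupted sample x_t and the mixed noise, which would
    serve as the denoising target during training.
    """
    eps_gauss = rng.standard_normal(x0.shape)
    # Rescale the adversarial perturbation to unit per-element RMS so it is
    # comparable in scale to the Gaussian component (an assumed choice).
    delta_unit = delta_adv / (np.linalg.norm(delta_adv) / np.sqrt(delta_adv.size) + 1e-12)
    eps_mixed = (1.0 - lam) * eps_gauss + lam * delta_unit
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps_mixed
    return x_t, eps_mixed
```

A denoiser trained against `eps_mixed` (rather than pure Gaussian noise) would then see adversarial-like structure during training, which is one plausible way to close the training–inference gap the abstract describes.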
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101809 |
| DOI: | 10.6342/NTU202600213 |
| Full-Text License: | Not authorized |
| Online Publication Date: | N/A |
| Appears in Collections: | Department of Electrical Engineering |
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-114-1.pdf (Restricted Access) | 8.18 MB | Adobe PDF |
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.
