Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101809

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 王勝德 | zh_TW |
| dc.contributor.advisor | Sheng-De Wang | en |
| dc.contributor.author | 陳信新 | zh_TW |
| dc.contributor.author | Hsin-Hsin Chen | en |
| dc.date.accessioned | 2026-03-04T16:44:26Z | - |
| dc.date.available | 2026-03-05 | - |
| dc.date.copyright | 2026-03-04 | - |
| dc.date.issued | 2026 | - |
| dc.date.submitted | 2026-01-30 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101809 | - |
| dc.description.abstract | 在深度神經網路中,即使對輸入資料施加人眼難以察覺的對抗式擾動,也可能導致模型產生顯著的預測錯誤,進而影響深度神經網路的可靠性。雖然對抗式訓練被視為一種有效的防禦方法,但其以分類器為中心的 min–max 最佳化方式,往往伴隨高昂的計算成本與有限的可擴展性,且在不同分類器與攻擊設定下的泛化能力受限。對抗式淨化方法透過在分類前對輸入樣本進行前處理,將模型穩健性與分類器訓練過程加以分離,因而成為另一種防禦途徑。在各類淨化方法中,擴散模型因具備良好的訓練動態以及逐步去雜訊的特性而受到關注。然而,多數既有的擴散式淨化方法僅在高斯雜訊假設下進行模型訓練,導致訓練階段與推論階段之間存在差異,因為對抗式擾動具有結構性且非高斯的特性,卻僅於推論時出現。本論文提出 ANDP(Adversarial Noise–Driven Diffusion Purification),一種基於擴散模型的對抗式淨化框架,在擴散模型的加噪過程及去雜訊目標中明確納入對抗式擾動。與修改分類器或進行 min–max 最佳化不同,ANDP 在擴散訓練過程中,透過受控的雜訊混合方式,將高斯雜訊與針對分類流程所產生的對抗式擾動結合,使去雜訊目標分佈能和推論階段所遭遇的擾動型態相對齊。在 CIFAR-10 資料集上的實驗結果顯示,於結合 Expectation over Transformation(EOT)的 AutoAttack 評估下,ANDP 能穩定提升多個百分點的穩健準確率,並達到小幅但穩定的標準準確率提升。此外,在僅使用 10% 的 ImageNet 訓練資料之實驗中,結果顯示所提出的方法可延伸應用至小型資料集以外的設定。整體而言,本研究顯示在擴散模型的加噪過程中,明確將對抗式擾動納入建模,可提供一種具原則性且具彈性的對抗式淨化設計方式。ANDP 作為一種實用且與分類器無關的淨化方法,進一步說明了對抗式擾動在擴散式淨化過程中的處理方式。 | zh_TW |
| dc.description.abstract | Adversarial perturbations pose a fundamental challenge to the reliability of deep neural networks, where even imperceptible input modifications can induce significant prediction errors. Although adversarial training is an effective defense, its reliance on classifier-centric min–max optimization leads to high computational cost, limited scalability, and restricted generalization across classifiers and attack settings. Adversarial purification has emerged as an alternative paradigm that decouples robustness from classifier training by preprocessing adversarial inputs prior to classification. Among purification-based approaches, diffusion models are particularly appealing due to their well-behaved training dynamics and iterative denoising behavior. However, most existing diffusion-based purification methods rely on diffusion models trained exclusively under Gaussian-only corruption, creating a training–inference mismatch: adversarial perturbations are structured and non-Gaussian, yet are encountered only at inference time. In this thesis, we propose ANDP (Adversarial Noise–Driven Diffusion Purification), a diffusion-based adversarial purification framework that explicitly incorporates adversarial perturbations into the diffusion corruption and denoising-target formulation. Rather than modifying the classifier or optimizing a min–max objective, ANDP introduces controlled mixing between Gaussian noise and classifier-induced adversarial perturbations during diffusion training, aligning the denoising target distribution with adversarial perturbations encountered at inference time. Extensive experiments on CIFAR-10 demonstrate that ANDP consistently improves purified robust accuracy by multiple percentage points under strong AutoAttack evaluation with Expectation over Transformation (EOT), while also achieving small but consistent gains in standard accuracy. Additional experiments on ImageNet using 10% of the training data indicate that the proposed approach generalizes beyond small-scale benchmarks. Overall, this work shows that explicitly modeling adversarial perturbations within the diffusion corruption process provides a principled and flexible approach to adversarial purification. ANDP offers a practical, classifier-agnostic alternative to adversarial training for purification-based defenses, while clarifying the role of adversarial exposure in diffusion-based defense frameworks. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-03-04T16:44:26Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2026-03-04T16:44:26Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements i; 摘要 ii; Abstract iv; Contents vi; List of Figures ix; List of Tables x; Chapter 1 Introduction 1; 1.1 Adversarial Robustness: Challenges and Limitations 1; 1.2 Adversarial Purification as an Alternative Paradigm 2; 1.3 Diffusion Models for Adversarial Purification 3; 1.4 Training–Inference Mismatch in Diffusion Purification 3; 1.5 Adversarial Noise–Driven Diffusion Purification 4; 1.6 Contributions and Organization 5; Chapter 2 Related Work 7; 2.1 Evolution of Adversarial Attacks and Robustness Evaluation 7; 2.2 Adversarial Training 8; 2.3 Adversarial Purification 9; 2.4 Diffusion Models and Diffusion-Based Purification 9; 2.5 Stochastic and Noise-Based Defenses 10; 2.6 Positioning of ANDP 10; Chapter 3 Method 12; 3.1 Notation and Problem Formulation 13; 3.2 Preliminaries: Denoising Diffusion Probabilistic Models 14; 3.2.1 Architectural Details 16; 3.3 ANDP: Adversarial Noise–Driven Diffusion Purification 19; 3.3.1 Purification–Classification Pipeline and Adversarial Setting 20; 3.3.2 Low-Timestep Truncated Reverse Diffusion 21; 3.3.3 Adversarial Noise Mixing with Time-Dependent Scheduling 23; 3.4 Relation to Adversarial Training 25; 3.5 Algorithm: ANDP Training Procedure 26; 3.6 Summary 26; Chapter 4 Experiment 27; 4.1 Experimental Setup 27; 4.1.1 Datasets 27; 4.1.2 Classifiers and Motivation 28; 4.1.3 Diffusion Model, Training Configuration, and Hardware 28; 4.1.4 Adversarial Attacks and Evaluation Protocol 29; 4.2 Baseline Performance of Noise Injection Strategies 30; 4.3 Adversarial Noise Mixing Strategies 31; 4.4 Controlling Adversarial Exposure 33; 4.4.1 Exposure Timing 34; 4.4.2 Time-Dependent Mixing Schedules and Strength 35; 4.5 Cross-Classifier and Cross-Dataset Generalization 38; 4.6 Summary 41; Chapter 5 Conclusion 42; 5.1 Summary of Contributions 42; 5.2 Limitations and Future Work 44; Appendix 46; References 50 | - |
| dc.language.iso | en | - |
| dc.subject | 對抗攻擊穩健性 | - |
| dc.subject | 對抗式淨化 | - |
| dc.subject | 擴散模型 | - |
| dc.subject | 擴散式防禦 | - |
| dc.subject | 與分類器無關的防禦 | - |
| dc.subject | 具對抗穩健性的生成模型 | - |
| dc.subject | Adversarial Robustness | - |
| dc.subject | Adversarial Purification | - |
| dc.subject | Diffusion Models | - |
| dc.subject | Diffusion-Based Defense | - |
| dc.subject | Classifier-Agnostic Defense | - |
| dc.subject | Adversarially Robust Generative Models | - |
| dc.title | ANDP:對抗式擾動驅動之擴散式對抗淨化框架 | zh_TW |
| dc.title | ANDP: Adversarial Noise–Driven Diffusion Purification | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 114-1 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 雷欽隆;于天立;莊永裕 | zh_TW |
| dc.contributor.oralexamcommittee | Chin-Laung Lei;Tian-Li Yu;Yung-Yu Zhuang | en |
| dc.subject.keyword | 對抗攻擊穩健性,對抗式淨化,擴散模型,擴散式防禦,與分類器無關的防禦,具對抗穩健性的生成模型 | zh_TW |
| dc.subject.keyword | Adversarial Robustness,Adversarial Purification,Diffusion Models,Diffusion-Based Defense,Classifier-Agnostic Defense,Adversarially Robust Generative Models | en |
| dc.relation.page | 53 | - |
| dc.identifier.doi | 10.6342/NTU202600213 | - |
| dc.rights.note | 未授權 | - |
| dc.date.accepted | 2026-02-02 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 電機工程學系 | - |
| dc.date.embargo-lift | N/A | - |
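
The abstracts above describe ANDP's mechanism only at a high level. As an illustrative reading of this record (not the thesis's actual equations), the mixing idea can be sketched against the standard DDPM forward corruption; the mixing schedule λ_t and the perturbation symbol δ_adv below are assumptions of this sketch, not notation taken from the thesis.

```latex
% Standard DDPM forward corruption (Ho et al., 2020):
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

% Hypothetical ANDP-style mixed corruption: the pure Gaussian term is
% replaced by a time-scheduled mix of Gaussian noise and a
% classifier-induced adversarial perturbation \delta_{\mathrm{adv}}:
\tilde{x}_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}
\left[ (1-\lambda_t)\,\epsilon + \lambda_t\,\delta_{\mathrm{adv}} \right],
\qquad \lambda_t \in [0, 1]
```

Under this reading, λ_t = 0 recovers ordinary Gaussian-only diffusion training, while λ_t > 0 exposes the denoiser during training to the structured, non-Gaussian perturbations it will face at purification time, matching the abstract's description of aligning the denoising-target distribution with inference-time perturbations.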
| Appears in Collections: | 電機工程學系 |
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-114-1.pdf (Restricted Access) | 8.18 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.