NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98480
Full metadata record
dc.contributor.advisor (zh_TW): 周承復
dc.contributor.advisor (en): Cheng-Fu Chou
dc.contributor.author (zh_TW): 翁如萱
dc.contributor.author (en): Ju-Hsuan Weng
dc.date.accessioned: 2025-08-14T16:16:45Z
dc.date.available: 2025-08-15
dc.date.copyright: 2025-08-14
dc.date.issued: 2025
dc.date.submitted: 2025-07-30
dc.identifier.citation[1] T. Bai, J. Luo, J. Zhao, B. Wen, and Q. Wang. Recent advances in adversarial training for adversarial robustness. arXiv preprint arXiv:2102.01356, 2021.
[2] S. Basu, N. Zhao, V. I. Morariu, S. Feizi, and V. Manjunatha. Localizing and editing knowledge in text-to-image generative models. In The Twelfth International Conference on Learning Representations, 2023.
[3] M. Blaszczyk, G. McGovern, and K. Stanley. Artificial intelligence impacts on copyright law. RAND (Nov. 20, 2024), https://www. rand. org/pubs/perspectives/PEA3243-1. html, 2024.
[4] T. Brooks, A. Holynski, and A. A. Efros. Instructpix2pix: Learning to follow image editing instructions. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 18392–18402, 2023.
[5] Z.-Y. Chin, C.-M. Jiang, C.-C. Huang, P.-Y. Chen, and W.-C. Chiu. Prompting4debugging: Red-teaming text-to-image diffusion models by finding problematic prompts. arXiv preprint arXiv:2309.06135, 2023.
[6] J. Chung, S. Hyun, and J.-P. Heo. Style injection in diffusion: A training-free approach for adapting large-scale diffusion models for style transfer. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8795– 8805, 2024.
[7] C. Fan, J. Liu, Y. Zhang, E. Wong, D. Wei, and S. Liu. Salun: Empowering machine unlearning via gradient-based weight saliency in both image classification and generation. arXiv preprint arXiv:2310.12508, 2023.
[8] R. Gal, Y. Alaluf, Y. Atzmon, O. Patashnik, A. H. Bermano, G. Chechik, and D. CohenOr. An image is worth one word: Personalizing text-to-image generation using textual inversion. arXiv preprint arXiv:2208.01618, 2022.
[9] R. Gandikota, J. Materzynska, J. Fiotto-Kaufman, and D. Bau. Erasing concepts from diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2426–2436, 2023.
[10] R. Gandikota, H. Orgad, Y. Belinkov, J. Materzyńska, and D. Bau. Unified concept editing in diffusion models. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 5111–5120, 2024.
[11] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
[12] M. M. Grynbaum and R. Mac. The times sues openai and microsoft over ai use of copyrighted work. The New York Times, 27, 2023.
[13] L. Herijgers. Stable diffusion 3 is a step back for ai images of humans, 2024. Accessed: 2025-05-29.
[14] A. Hertz, R. Mokady, J. Tenenbaum, K. Aberman, Y. Pritch, and D. Cohen-Or. Prompt-to-prompt image editing with cross attention control. arXiv preprint arXiv:2208.01626, 2022.
[15] J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.
[16] J. Ho and T. Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022.
[17] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022.
[18] C.-P. Huang, K.-P. Chang, C.-T. Tsai, Y.-H. Lai, F.-E. Yang, and Y.-C. F. Wang. Receler: Reliable concept erasing of text-to-image diffusion models via lightweight erasers. In European Conference on Computer Vision, pages 360–376. Springer, 2024.
[19] Y. Huang, J. Huang, Y. Liu, M. Yan, J. Lv, J. Liu, W. Xiong, H. Zhang, L. Cao, and S. Chen. Diffusion model-based image editing: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
[20] T. Hunter. Ai porn is easy to make now. for women, that’s a nightmare. The Washington Post, pages NA–NA, 2023.
[21] D. P. Kingma, M. Welling, et al. Auto-encoding variational bayes, 2013.
[22] A. Krizhevsky, G. Hinton, et al. Learning multiple layers of features from tiny images. 2009.
[23] N. Kumari, B. Zhang, S.-Y. Wang, E. Shechtman, R. Zhang, and J.-Y. Zhu. Ablating concepts in textto-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 22691–22702, 2023.
[24] N. Kumari, B. Zhang, R. Zhang, E. Shechtman, and J.-Y. Zhu. Multi-concept customization of text-to-image diffusion. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1931–1941, 2023.
[25] S. Liu, Z. Zeng, T. Ren, F. Li, H. Zhang, J. Yang, Q. Jiang, C. Li, J. Yang, H. Su, et al. Grounding dino: Marrying dino with grounded pre-training for open-set object detection. In European Conference on Computer Vision, pages 38–55. Springer, 2024.
[26] S. Lu, Z. Wang, L. Li, Y. Liu, and A. W.-K. Kong. Mace: Mass concept erasure in diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6430–6440, 2024.
[27] R. Mokady, A. Hertz, K. Aberman, Y. Pritch, and D. Cohen-Or. Null-text inversion for editing real images using guided diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6038– 6047, 2023.
[28] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2574–2582, 2016.
[29] A. Nichol, P. Dhariwal, A. Ramesh, P. Shyam, P. Mishkin, B. McGrew, I. Sutskever, and M. Chen. Glide: Towards photorealistic image generation and editing with textguided diffusion models. arXiv preprint arXiv:2112.10741, 2021.
[30] M. Pham, K. O. Marshall, N. Cohen, G. Mittal, and C. Hegde. Circumventing concept erasure methods for text-to-image generative models. arXiv preprint arXiv:2308.01508, 2023.
[31] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen. Hierarchical textconditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 1(2):3, 2022.
[32] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022.
[33] O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18, pages 234–241. Springer, 2015.
[34] N. Ruiz, Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 22500–22510, 2023.
[35] C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. L. Denton, K. Ghasemipour, R. Gontijo Lopes, B. Karagol Ayan, T. Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding. Advancesin neural information processing systems, 35:36479–36494, 2022.
[36] J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International conference on machine learning, pages 2256–2265. pmlr, 2015.
[37] J. Song, C. Meng, and S. Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020.
[38] Y. Tewel, R. Gal, G. Chechik, and Y. Atzmon. Key-locked rank one editing for textto­image personalization. In ACM SIGGRAPH 2023 conference proceedings, pages 1–11, 2023.
[39] V. T. Truong, L. B. Dang, and L. B. Le. Attacks and defenses for generative diffusion models: A comprehensive survey. ACM Computing Surveys, 57(8):1–44, 2025.
[40] Y.-L. Tsai, C.-Y. Hsu, C. Xie, C.-H. Lin, J.-Y. Chen, B. Li, P.-Y. Chen, C.-M. Yu, and C.-Y. Huang. Ring-a-bell! how reliable are concept removal methods for diffusion models? arXiv preprint arXiv:2310.10012, 2023.
[41] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.
[42] C. Xie, M. Tan, B. Gong, A. Yuille, and Q. V. Le. Smooth adversarial training. arXiv preprint arXiv:2006.14536, 2020.
[43] Y. Yang, R. Gao, X. Wang, T.-Y. Ho, N. Xu, and Q. Xu. Mma-diffusion: Multimodal attack on diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7737–7746, 2024.
[44] G. Zhang, K. Wang, X. Xu, Z. Wang, and H. Shi. Forget-me-not: Learning to forget in text-to-image diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1755–1764, 2024.
[45] Y. Zhang, X. Chen, J. Jia, Y. Zhang, C. Fan, J. Liu, M. Hong, K. Ding, and S. Liu. Defensive unlearning with adversarial training for robust concept erasure in diffusion models. Advances in Neural Information Processing Systems, 37:36748–36776, 2024.
[46] Y. Zhang, N. Huang, F. Tang, H. Huang, C. Ma, W. Dong, and C. Xu. Inversion-based style transfer with diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10146–10156, 2023.
[47] Y. Zhang, J. Jia, X. Chen, A. Chen, Y. Zhang, J. Liu, K. Ding, and S. Liu. To generate or not? safety-driven unlearned diffusion models are still easy to generate unsafe images... for now. In European Conference on Computer Vision, pages 385–403. Springer, 2024.
[48] Y. Zhao, T. Pang, C. Du, X. Yang, C. Li, N.-M. M. Cheung, and M. Lin. On evaluating adversarial robustness of large vision-language models. Advances in Neural Information Processing Systems, 36:54111–54138, 2023.
[49] A. Zou, Z. Wang, N. Carlini, M. Nasr, J. Z. Kolter, and M. Fredrikson. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043, 2023.
-
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98480
dc.description.abstract (zh_TW): 文字生圖的擴散模型因其卓越的圖像生成品質而廣受關注,但也引發諸多爭議,例如生成侵犯著作權、暴力、色情等內容。為了解決這個問題,「概念抹除」技術因應而生,旨在防止模型輸出包含特定概念的圖片。

典型的概念抹除流程,通常是使用者先提供欲移除概念的文字描述,接著調整模型權重。這類基於文字描述的方法,在面對文字輸入時能有效抑制特定概念的產生,然而當輸入模態非文字時,抹除效果很可能失靈。

本文首先設計一套多元評估架構,以全面分析現有概念抹除技術在不同輸入模態中的穩健性。再來我們進一步提出輕量級的後處理模組作為提升穩健性的策略,該模組無須重新訓練原始模型,既可補足原有方法的不足,又能保留其優勢;我們的方法也具備良好的擴展性,可應用於圖片中特定物體的移除或置換。
dc.description.abstract (en): Text-to-image diffusion models have attracted wide attention for their exceptional image generation quality. However, they have also raised concerns about the creation of copyright-infringing, violent, or pornographic content. To address these issues, concept-erasure techniques have been developed to suppress undesired concepts in model outputs.

The typical workflow involves describing the target concept in text and then fine-tuning the model weights. While these text-based methods are effective for textual inputs, they often fail on non-textual ones.

In this thesis, we first design a multimodal evaluation framework to assess the robustness of existing concept-erasure techniques across different input modalities. We then propose a lightweight post-processing module that improves robustness without retraining the original model, complementing existing methods while preserving their strengths. Moreover, the module is highly extensible, enabling the targeted removal or replacement of specific objects in images.
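The table of contents below names "Concept Localization" and "Masked Latent Disruption" as the two stages of this post-processing module. As a rough illustration of the general idea only, here is a minimal sketch, assuming the module injects noise into the diffusion latent solely where a detector has localized the unwanted concept; the function name, tensor shapes, and the sigma parameter are all hypothetical and are not the author's implementation.

```python
import torch

def masked_latent_disruption(latent, mask, sigma=0.5, generator=None):
    """Hypothetical sketch: perturb a diffusion latent only inside the
    region where a localizer has found the unwanted concept.

    latent: (B, C, H, W) latent at some intermediate denoising step
    mask:   (B, 1, H, W) binary mask from a concept localizer,
            1 where the target concept was detected
    sigma:  assumed disruption strength (illustrative, not from the thesis)
    """
    noise = torch.randn(latent.shape, generator=generator, dtype=latent.dtype)
    # Confine the disruption to the masked region so unrelated image
    # content encoded elsewhere in the latent is left intact.
    return latent + sigma * mask * noise

# Toy usage with random tensors standing in for a real latent and mask.
latent = torch.randn(1, 4, 64, 64)                # SD-style latent shape
mask = (torch.rand(1, 1, 64, 64) > 0.8).float()   # stand-in detector output
disrupted = masked_latent_disruption(latent, mask)
```

Because the perturbation is confined to the mask, the rest of the generation is untouched, which is consistent with the abstract's claim of complementing existing methods while preserving their strengths.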
dc.description.provenance (en): Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-14T16:16:45Z. No. of bitstreams: 0
dc.description.provenance (en): Made available in DSpace on 2025-08-14T16:16:45Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee
Acknowledgements
摘要 (Chinese Abstract)
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
2.1 Concept-erasure Methods in Diffusion Models
2.2 Adversarial Attacks on Concept-erasure Methods
2.3 Textual Inversion
Chapter 3 Preliminaries
3.1 Diffusion Models
3.1.1 Fundamentals
3.1.2 Deterministic Sampling via DDIM
3.1.3 Classifier-Free Guidance
3.1.4 Latent Diffusion Models
3.2 Adversarial Attack and Training
Chapter 4 Methodology
4.1 Problem Formulation
4.2 Baseline Evaluation in the Text Space
4.3 Evaluation Framework in the Hybrid Space
4.3.1 Background
4.3.2 Configurations
4.4 Evaluation Framework in the Latent Space
4.4.1 Background
4.4.2 DDIM Inversion and Sampling
4.4.3 Configurations
4.5 Post-processing for Robustness Enhancement
4.5.1 Overview
4.5.2 Concept Localization
4.5.3 Masked Latent Disruption
Chapter 5 Experiments
5.1 Experimental Setup
5.1.1 Models
5.1.2 Hyperparameters
5.1.3 Dataset
5.2 Evaluation Protocol
5.3 Baseline Evaluation Results in the Text Space
5.4 Evaluation Results in the Hybrid Space
5.5 Evaluation Results in the Latent Space
5.5.1 Quantitative Analysis
5.5.2 Qualitative Analysis
5.6 Effectiveness of the Post-processing Mechanism
5.6.1 Quantitative Analysis
5.6.2 Qualitative Analysis
Chapter 6 Conclusion
References
Appendix A — More Experimental Results
A.1 Effect of Post-processing on Per-class Reproduction Rates
Appendix B — Analysis of Hyperparameters in Post-processing
B.1 Sensitivity of Intervention Step and Mask Threshold
dc.language.iso: en
dc.subject (zh_TW): 人工智慧安全
dc.subject (zh_TW): 擴散模型
dc.subject (zh_TW): 概念抹除
dc.subject (zh_TW): 惡意攻擊
dc.subject (en): Diffusion models
dc.subject (en): Concept-erasure
dc.subject (en): AI security
dc.subject (en): Adversarial attacks
dc.title (zh_TW): 擴散模型中概念抹除之多模態輸入空間表現評估與強健性提升策略
dc.title (en): Multimodal Robustness Evaluation and Enhancement for Concept-erasure in Diffusion Models
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee (zh_TW): 陳駿丞;呂政修;李明穗;吳曉光
dc.contributor.oralexamcommittee (en): Jun-Cheng Chen;Jenq-Shiou Leu;Ming-Sui Lee;Hsiao-Kuang Wu
dc.subject.keyword (zh_TW): 人工智慧安全,擴散模型,概念抹除,惡意攻擊
dc.subject.keyword (en): AI security,Diffusion models,Concept-erasure,Adversarial attacks
dc.relation.page: 64
dc.identifier.doi: 10.6342/NTU202501587
dc.rights.note: 未授權 (not authorized for public access)
dc.date.accepted: 2025-07-31
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊工程學系 (Department of Computer Science and Information Engineering)
dc.date.embargo-lift: N/A
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
File: ntu-113-2.pdf | Size: 70.65 MB | Format: Adobe PDF | Access: restricted (未授權公開取用)