Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98480

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 周承復 | zh_TW |
| dc.contributor.advisor | Cheng-Fu Chou | en |
| dc.contributor.author | 翁如萱 | zh_TW |
| dc.contributor.author | Ju-Hsuan Weng | en |
| dc.date.accessioned | 2025-08-14T16:16:45Z | - |
| dc.date.available | 2025-08-15 | - |
| dc.date.copyright | 2025-08-14 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-07-30 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98480 | - |
| dc.description.abstract | 文字生圖的擴散模型因其卓越的圖像生成品質而廣受關注,但也引發諸多爭議,例如生成侵犯著作權、暴力、色情等內容。為了解決這個問題,「概念抹除」技術因應而生,旨在防止模型輸出包含特定概念的圖片。<br>典型的概念抹除流程,通常是使用者先提供欲移除概念的文字描述,接著調整模型權重。這類基於文字描述的方法,在面對文字輸入時能有效抑制特定概念的產生,然而當輸入模態非文字時,抹除效果很可能失靈。本文首先設計一套多元評估架構,以全面分析現有概念抹除技術在不同輸入模態中的穩健性。再來我們進一步提出輕量級的後處理模組作為提升穩健性的策略,該模組無須重新訓練原始模型,既可補足原有方法的不足,又能保留其優勢;我們的方法也具備良好的擴展性,可應用於圖片中特定物體的移除或置換。 | zh_TW |
| dc.description.abstract | Text-to-image diffusion models have attracted attention for their exceptional image generation quality. However, they have also raised concerns about the creation of copyright-infringing, violent, or pornographic content. To address these issues, concept-erasure techniques have been developed to suppress undesired concepts in model outputs.<br>The typical workflow involves describing the target concept in text and then fine-tuning the model weights. While these text-based methods are effective against textual inputs, they often fail on non-textual ones. In this paper, we first design a multimodal evaluation framework to assess the robustness of existing concept-erasure techniques across different input modalities. We then propose a lightweight post-processing module that improves robustness without retraining the original model, complementing existing methods while preserving their strengths. Moreover, the module is highly extensible, enabling the removal or replacement of specific objects in images. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-14T16:16:45Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-08-14T16:16:45Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i<br>Acknowledgements iii<br>摘要 v<br>Abstract vii<br>Contents ix<br>List of Figures xiii<br>List of Tables xix<br>Chapter 1 Introduction 1<br>Chapter 2 Related Work 9<br>2.1 Concept-erasure Methods in Diffusion Models 9<br>2.2 Adversarial Attacks on Concept-erasure Methods 11<br>2.3 Textual Inversion 12<br>Chapter 3 Preliminaries 15<br>3.1 Diffusion Models 15<br>3.1.1 Fundamentals 15<br>3.1.2 Deterministic Sampling via DDIM 16<br>3.1.3 Classifier-Free Guidance 17<br>3.1.4 Latent Diffusion Models 18<br>3.2 Adversarial Attack and Training 18<br>Chapter 4 Methodology 21<br>4.1 Problem Formulation 21<br>4.2 Baseline Evaluation in the Text Space 21<br>4.3 Evaluation Framework in the Hybrid Space 22<br>4.3.1 Background 22<br>4.3.2 Configurations 22<br>4.4 Evaluation Framework in the Latent Space 23<br>4.4.1 Background 23<br>4.4.2 DDIM Inversion and Sampling 24<br>4.4.3 Configurations 24<br>4.5 Post-processing for Robustness Enhancement 27<br>4.5.1 Overview 27<br>4.5.2 Concept Localization 27<br>4.5.3 Masked Latent Disruption 28<br>Chapter 5 Experiments 31<br>5.1 Experimental Setup 31<br>5.1.1 Models 31<br>5.1.2 Hyperparameters 31<br>5.1.3 Dataset 32<br>5.2 Evaluation Protocol 33<br>5.3 Baseline Evaluation Results in the Text Space 34<br>5.4 Evaluation Results in the Hybrid Space 36<br>5.5 Evaluation Results in the Latent Space 39<br>5.5.1 Quantitative Analysis 39<br>5.5.2 Qualitative Analysis 42<br>5.6 Effectiveness of the Post-processing Mechanism 45<br>5.6.1 Quantitative Analysis 45<br>5.6.2 Qualitative Analysis 47<br>Chapter 6 Conclusion 49<br>References 51<br>Appendix A — More Experimental Results 59<br>A.1 Effect of Post-processing on Per-class Reproduction Rates 59<br>Appendix B — Analysis of Hyperparameters in Post-processing 63<br>B.1 Sensitivity of Intervention Step and Mask Threshold 63 | - |
| dc.language.iso | en | - |
| dc.subject | 人工智慧安全 | zh_TW |
| dc.subject | 擴散模型 | zh_TW |
| dc.subject | 概念抹除 | zh_TW |
| dc.subject | 惡意攻擊 | zh_TW |
| dc.subject | Diffusion models | en |
| dc.subject | Concept-erasure | en |
| dc.subject | AI security | en |
| dc.subject | Adversarial attacks | en |
| dc.title | 擴散模型中概念抹除之多模態輸入空間表現評估與強健性提升策略 | zh_TW |
| dc.title | Multimodal Robustness Evaluation and Enhancement for Concept-erasure in Diffusion Models | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-2 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 陳駿丞;呂政修;李明穗;吳曉光 | zh_TW |
| dc.contributor.oralexamcommittee | Jun-Cheng Chen;Jenq-Shiou Leu;Ming-Sui Lee;Hsiao-Kuang Wu | en |
| dc.subject.keyword | 人工智慧安全, 擴散模型, 概念抹除, 惡意攻擊 | zh_TW |
| dc.subject.keyword | AI security, Diffusion models, Concept-erasure, Adversarial attacks | en |
| dc.relation.page | 64 | - |
| dc.identifier.doi | 10.6342/NTU202501587 | - |
| dc.rights.note | Not authorized | - |
| dc.date.accepted | 2025-07-31 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊工程學系 | - |
| dc.date.embargo-lift | N/A | - |
| Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-113-2.pdf (not authorized for public access) | 70.65 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
