Multimodal Robustness Evaluation and Enhancement for Concept-erasure in Diffusion Models

翁如萱; Ju-Hsuan Weng

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98480

Title:	擴散模型中概念抹除之多模態輸入空間表現評估與強健性提升策略 Multimodal Robustness Evaluation and Enhancement for Concept-erasure in Diffusion Models
Authors:	翁如萱 Ju-Hsuan Weng
Advisor:	周承復 Cheng-Fu Chou
Keyword:	AI security, Diffusion models, Concept-erasure, Adversarial attacks
Publication Year :	2025
Degree:	Master's
Abstract:	Text-to-image diffusion models have attracted wide attention for their exceptional image generation quality, but they have also raised concerns about the creation of copyright-infringing, violent, or pornographic content. To address these issues, concept-erasure techniques have been developed to prevent a model from producing images that contain specific concepts. In a typical workflow, the user describes the concept to be removed in text, and the model weights are then adjusted accordingly. These text-based methods effectively suppress the target concept for textual inputs, but erasure is likely to fail when the input modality is not text. In this work, we first design a multimodal evaluation framework to analyze the robustness of existing concept-erasure techniques across different input modalities. We then propose a lightweight post-processing module that improves robustness without retraining the original model, compensating for the weaknesses of existing methods while preserving their strengths. The method is also extensible and can be applied to removing or replacing specific objects in images.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98480
DOI:	10.6342/NTU202501587
Fulltext Rights:	Not authorized
Embargo Lift Date:	N/A
Appears in Collections:	Department of Computer Science and Information Engineering

Files in This Item:

File	Size	Format
ntu-113-2.pdf Restricted Access	70.65 MB	Adobe PDF
