Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98480
| Title: | 擴散模型中概念抹除之多模態輸入空間表現評估與強健性提升策略 Multimodal Robustness Evaluation and Enhancement for Concept-erasure in Diffusion Models |
| Author: | 翁如萱 Ju-Hsuan Weng |
| Advisor: | 周承復 Cheng-Fu Chou |
| Keywords: | AI security, Diffusion models, Concept-erasure, Adversarial attacks |
| Publication year: | 2025 |
| Degree: | Master's |
| Abstract: | Text-to-image diffusion models have attracted wide attention for their exceptional image quality, but they have also raised concerns about generating copyright-infringing, violent, or pornographic content. Concept-erasure techniques were developed in response, aiming to prevent a model from producing images that contain specified concepts. In a typical workflow, the user provides a text description of the concept to remove, and the model weights are then fine-tuned. These text-based methods suppress the target concept effectively for textual inputs, but the erasure is likely to fail when the input modality is not text. In this thesis, we first design a multimodal evaluation framework to assess the robustness of existing concept-erasure techniques across input modalities. We then propose a lightweight post-processing module that improves robustness without retraining the original model, compensating for the weaknesses of existing methods while preserving their strengths. The module is also extensible and can be applied to remove or replace specific objects in images. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98480 |
| DOI: | 10.6342/NTU202501587 |
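| Full-text authorization: | Not authorized |
| Electronic full-text release date: | N/A |
| Appears in collections: | Department of Computer Science and Information Engineering |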
Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-2.pdf (not authorized for public access) | 70.65 MB | Adobe PDF | |
All items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.
