Multimodal Robustness Evaluation and Enhancement for Concept-erasure in Diffusion Models

翁如萱 (Ju-Hsuan Weng)

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98480

Title:	Multimodal Robustness Evaluation and Enhancement for Concept-erasure in Diffusion Models
Author:	翁如萱 (Ju-Hsuan Weng)
Advisor:	周承復 (Cheng-Fu Chou)
Keywords:	AI security, Diffusion models, Concept-erasure, Adversarial attacks
Publication Year:	2025
Degree:	Master's
Abstract:	Text-to-image diffusion models have attracted wide attention for their exceptional image generation quality. However, they have also raised concerns about the creation of copyright-infringing, violent, or pornographic content. To address these issues, concept-erasure techniques have been developed to prevent a model from producing images that contain specific concepts. In the typical workflow, the user first provides a text description of the concept to be removed, and the model weights are then adjusted accordingly. While these text-based methods effectively suppress the target concept for textual inputs, the erasure is likely to fail when the input modality is not text. In this thesis, we first design a multimodal evaluation framework to comprehensively assess the robustness of existing concept-erasure techniques across different input modalities. We then propose a lightweight post-processing module that improves robustness without retraining the original model, compensating for the weaknesses of existing methods while preserving their strengths. The method is also extensible and can be applied to remove or replace specific objects in images.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98480
DOI:	10.6342/NTU202501587
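Full-text authorization:	Not authorized
Electronic full-text release date:	N/A
Appears in Collections:	Department of Computer Science and Information Engineering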

Files in this item:

File	Size	Format
ntu-113-2.pdf (not authorized for public access)	70.65 MB	Adobe PDF

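Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.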