Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98519
| Title: | MMIDBench: Multimodal Interpretable Deepfake Detection Benchmark for Multimodal Large Language Models |
| Author: | 黃康洋 Kang-Yang Huang |
| Advisor: | 鄭文皇 Wen-Huang Cheng |
| Keywords: | Multimodal Large Language Models, Interpretability, Deepfake Detection, Deepfake Generation, Image Editing, Voice Synthesis, Diffusion Models, Generative Artificial Intelligence |
| Publication Year: | 2025 |
| Degree: | Master |
| Abstract: | Generative artificial intelligence has revolutionized how multimedia content is created by utilizing diverse input conditions. However, as these models become more advanced, detecting AI-generated content, particularly DeepFakes, has grown increasingly challenging. Rising concerns over DeepFakes have heightened interest in detection methods, specifically the effectiveness of multimodal large language models (MLLMs) in identifying them. MLLMs not only improve the transparency of DeepFake detection by providing explanations for their decisions, but the process of distinguishing authentic from synthetic content also serves as a robust test of their perceptual and reasoning skills. To address these challenges, we introduce MMIDBench, a comprehensive multimodal benchmark meticulously crafted to assess the capabilities of MLLMs. MMIDBench features a variety of state-of-the-art DeepFake generative models spanning images, videos, and audio, encompassing 6 distinct DeepFake tasks. The benchmark comprises 10k questions, including binary, multiple-choice, and open-ended formats, enabling an in-depth assessment of MLLMs. We evaluated 5 proprietary MLLMs with MMIDBench, revealing both their strengths and current limitations in DeepFake detection. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98519 |
| DOI: | 10.6342/NTU202503386 |
| Full-Text License: | Authorized (open access worldwide) |
| Electronic Full-Text Release Date: | 2025-08-15 |
| Appears in Collections: | Graduate Institute of Networking and Multimedia |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-2.pdf | 12.54 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
