NTU Theses and Dissertations Repository > College of Electrical Engineering and Computer Science (電機資訊學院) > Graduate Institute of Networking and Multimedia (資訊網路與多媒體研究所)
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98519
Title: MMIDBench: Multimodal Interpretable Deepfake Detection Benchmark for Multimodal Large Language Models (多模態大型語言模型之可解釋深度偽造檢測基準)
Authors: Kang-Yang Huang (黃康洋)
Advisor: Wen-Huang Cheng (鄭文皇)
Keywords: Multimodal Large Language Models, Interpretability, Deepfake Detection, Deepfake Generation, Image Editing, Voice Synthesis, Diffusion Models, Generative Artificial Intelligence
Publication Year: 2025
Degree: Master's (碩士)
Abstract: Generative artificial intelligence has revolutionized how multimedia content is created from diverse input conditions. However, as these models grow more capable, detecting AI-generated content, particularly DeepFakes, has become increasingly difficult. Rising concern over DeepFakes has focused research attention on detection methods, and in particular on how effectively multimodal large language models (MLLMs) can identify them. MLLMs not only improve the transparency of DeepFake detection by providing explanations for their decisions; the task of distinguishing authentic from synthetic content also serves as a rigorous test of their perceptual and reasoning abilities.
To address these challenges, we introduce MMIDBench, a comprehensive multimodal benchmark meticulously crafted to assess the capabilities of MLLMs. MMIDBench covers a variety of state-of-the-art DeepFake generative models spanning images, video, and audio, and encompasses six distinct DeepFake tasks. The benchmark comprises 10k questions in binary, multiple-choice, and open-ended formats, enabling an in-depth assessment of MLLMs. Using MMIDBench, we evaluated five proprietary MLLMs, revealing both their strengths and their current limitations in DeepFake detection.
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98519
DOI: 10.6342/NTU202503386
Fulltext Rights: Licensed for worldwide open access (同意授權(全球公開))
Embargo Lift Date: 2025-08-15
Appears in Collections: Graduate Institute of Networking and Multimedia (資訊網路與多媒體研究所)

Files in This Item:
File: ntu-113-2.pdf | Size: 12.54 MB | Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
