Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94349
| Title: | Multi-Level Content-Aware Audio-Visual Temporal Deepfake Localization (MC-TDL) |
| Authors: | 鄭世朋 Shih-Peng Cheng |
| Advisor: | 張智星 Jyh-Shing Jang |
| Keyword: | Audio-visual temporal deepfake localization, deepfake detection, boundary detection, attention mechanism, deep learning |
| Publication Year: | 2024 |
| Degree: | Master's |
| Abstract: | For the audio-visual temporal deepfake localization task, previous methods have not yielded satisfactory results, especially on the newly released content-driven partial deepfake dataset AV-Deepfake1M. Methods for this task must simultaneously utilize and integrate information from both video and audio, and accurately localize deepfake segments that constitute only a small proportion of the entire video. In this work, we investigate which components are effective for addressing this task. Using a subset sampled from AV-Deepfake1M, which allows testing under limited computational resources, we conduct studies on the loss function, the boundary detection module, and cross-modality fusion methods. Without the need for pre-trained encoders, we propose an architecture that outperforms current state-of-the-art audio-visual temporal deepfake localization methods. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94349 |
| DOI: | 10.6342/NTU202402060 |
| Fulltext Rights: | Access authorized (restricted to campus network) |
| Appears in Collections: | Graduate Institute of Networking and Multimedia |
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-112-2.pdf (access limited to NTU IP range) | 9.07 MB | Adobe PDF |
