Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94349
| Title: | Multi-Level Content-Aware Audio-Visual Temporal Deepfake Localization (MC-TDL) |
| Author: | 鄭世朋 Shih-Peng Cheng |
| Advisor: | 張智星 Jyh-Shing Jang |
| Keywords: | Audio-visual temporal deepfake localization, deepfake detection, boundary detection, attention mechanism, deep learning |
| Publication Year: | 2024 |
| Degree: | Master's |
| Abstract: | For the audio-visual temporal deepfake localization task, previous methods have not yielded satisfactory results, especially on the newly released content-driven partial deepfake dataset AV-Deepfake1M. A successful method must simultaneously exploit and integrate information from both the video and audio streams, and accurately localize deepfake segments that constitute only a small proportion of the entire video. In this work, we investigate which components are effective for this task. Testing on a subset sampled from AV-Deepfake1M, which allows experimentation under limited computational resources, we conduct studies on the loss function, the boundary detection module, and cross-modality fusion methods. Without relying on pre-trained encoders, we propose an architecture that outperforms current state-of-the-art multi-modal temporal localization methods. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94349 |
| DOI: | 10.6342/NTU202402060 |
| Full-Text Access: | Authorized (restricted to campus access) |
| Appears in Collections: | Graduate Institute of Networking and Multimedia |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-112-2.pdf (restricted to NTU campus IP addresses; use the off-campus VPN service from outside campus) | 9.07 MB | Adobe PDF | |
Unless their copyright terms state otherwise, all items in this repository are protected by copyright, with all rights reserved.
