Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85069
Title: Improved Face Forgery Detection with Self-supervised Audio-Visual Consistency-based Pretraining (使用影音一致性的自監督式學習增強臉部偽造辨識)
Author: Chang-Sung Sung (宋昶松)
Advisor: Chu-Song Chen (陳祝嵩)
Co-advisor: Jun-Cheng Chen (陳駿丞)
Keywords: Self-supervised Learning, Audio-Visual, Face Forgery Detection
Year of publication: 2022
Degree: Master's
Abstract: Recently, due to the growth and abuse of deepfakes, problems that threaten society have emerged, such as maliciously fabricated videos that damage reputations or spread false information. Although recent forgery detection methods achieve reasonable results on seen forgeries, and some can reach a certain level of accuracy on unseen forgeries by using prior knowledge or pretraining, these methods may be limited by common video compression or by the high cost of the annotations required for pretraining. In this thesis, we propose AVM-FFD, a framework for detecting forgeries that maintains good detection capability on unseen forgery methods. AVM-FFD focuses on judging the consistency between audio and face, using the relationship between their features as a cue to determine whether a forgery has occurred. First, two spatio-temporal feature extraction networks, one for video and one for audio, are pretrained on an audio-visual matching (AVM) task to build rich representations of the correspondence between audio and visual information. A temporal classifier network then uses the representations extracted by these networks to decide whether a video has been manipulated. To avoid overfitting to manipulation-specific artifacts, we freeze the feature extraction networks and train only the final classifier on forged data. Experiments on unseen forgery categories and unseen datasets show that our approach is effective and achieves state-of-the-art performance.
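Below is a minimal, hypothetical PyTorch-style sketch of the training setup described in the abstract: two pretrained spatio-temporal encoders (video and audio) are frozen, their per-frame features are concatenated, and only a lightweight temporal classifier is trained on forged data. The class names, placeholder encoders, feature dimensions, and `train_step` signature are illustrative assumptions, not the thesis implementation.

```python
# Minimal sketch (assumptions, not the thesis code): frozen audio/visual
# encoders pretrained on an audio-visual matching task, plus a temporal
# classifier trained on forged data only.
import torch
import torch.nn as nn

class TemporalClassifier(nn.Module):
    """Hypothetical temporal head: GRU over per-frame AV features -> real/fake logit."""
    def __init__(self, feat_dim=1024, hidden=256):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, 1)

    def forward(self, av_feats):             # av_feats: (B, T, feat_dim)
        _, h = self.gru(av_feats)             # h: (1, B, hidden)
        return self.fc(h[-1])                 # (B, 1) logit

def freeze(module):
    """Lock pretrained encoder parameters so only the classifier is updated."""
    for p in module.parameters():
        p.requires_grad = False
    module.eval()

# Placeholders standing in for the pretrained spatio-temporal extractors;
# assumed to output (B, T, 512) feature sequences per modality.
video_encoder = nn.GRU(3 * 112 * 112, 512, batch_first=True)
audio_encoder = nn.GRU(80, 512, batch_first=True)
freeze(video_encoder)
freeze(audio_encoder)

classifier = TemporalClassifier(feat_dim=1024)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()

def train_step(frames, mel, labels):
    """frames: (B, T, 3*112*112) flattened face crops; mel: (B, T, 80); labels: (B,)."""
    with torch.no_grad():                     # encoders stay frozen
        v_feat, _ = video_encoder(frames)     # (B, T, 512)
        a_feat, _ = audio_encoder(mel)        # (B, T, 512)
    av = torch.cat([v_feat, a_feat], dim=-1)  # (B, T, 1024) joint AV features
    logits = classifier(av).squeeze(-1)
    loss = criterion(logits, labels.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The design choice the sketch illustrates is that freezing the audio-visual encoders forces the classifier to rely on the learned audio-visual correspondence rather than on low-level artifacts specific to any single manipulation method.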
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85069
DOI: 10.6342/NTU202202353
Full-text license: Authorized (access restricted to campus)
Date of electronic full-text availability: 2022-08-30
Appears in collections: Data Science Degree Program
Files in this item:
File | Size | Format | |
---|---|---|---
U0001-1208202218040400.pdf (restricted to NTU campus IPs; use the VPN service for off-campus access) | 1.69 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.