Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92854
Title: 朝向未來音頻深偽檢測:重新評估與基於韻律的檢測方法
Toward Future Audio Deepfake Detection: A Reevaluation and Novel Prosody-Based Approach
Author: 石子仙 (Tsu-Hsien Shih)
Advisor: 陳銘憲 (Ming-Syan Chen)
Keywords: deepfake detection, anti-spoofing detection, audio deepfake detection
Publication Year: 2024
Degree: Master's
Abstract: The rise of voice conversion (VC), i.e., audio deepfakes, poses serious societal risks. While many audio deepfake detection methods have been developed, current methods focus primarily on identifying artifacts in deepfake samples. As deepfake technology advances, two questions arise: can these methods detect future deepfakes that may contain fewer artifacts, and can the models learn features not tied to deepfake imperfections?

To address these concerns, we introduce the Balanced Environment Audio-Deepfake Reevaluation (BEAR) protocol, which creates a balanced setting with similar artifacts or noise in both genuine and deepfake samples. Using BEAR as the evaluation setting, we observe a significant performance drop across all detectors tested, indicating that current detection models rely heavily on artifacts and struggle to identify deepfakes in the "balanced" environment.
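
To make the balanced-environment idea concrete, here is a minimal sketch of how such an evaluation set could be assembled: genuine and deepfake clips are passed through the same perturbation, so that noise or artifact level no longer separates the classes. The abstract does not specify BEAR's actual perturbations, so the fixed-SNR white-noise mixing and the helper names (add_noise, build_balanced_eval_set) below are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def add_noise(wave: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Mix white Gaussian noise into a waveform at a target SNR (in dB)."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise

def build_balanced_eval_set(genuine, deepfake, snr_db=10.0, seed=0):
    """Perturb genuine (label 0) and deepfake (label 1) clips identically,
    so that noise/artifact cues carry no class information."""
    rng = np.random.default_rng(seed)
    labeled = [(w, 0) for w in genuine] + [(w, 1) for w in deepfake]
    return [(add_noise(w, snr_db, rng), y) for w, y in labeled]
```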

To address the challenges posed by the BEAR protocol, we propose a novel method, Prosody-based Artifact-Independent Detection (ProsoAI). This approach lets models concentrate on a speaker's prosodic characteristics, reducing their reliance on artifacts. By incorporating an appropriate loss function, our method achieves promising performance in the white-BEAR scenario and transfers robustly to the gray-BEAR scenario. Shifting from artifact detection to prosody preservation, our method marks a pioneering step in the field of audio deepfake detection.
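
As a rough illustration of what "prosody-based" features might look like, the sketch below summarizes a clip by its pitch (F0) and energy contour statistics using librosa. This is a generic prosody descriptor assumed for illustration; ProsoAI's actual architecture and loss function are not detailed in this abstract.

```python
import numpy as np
import librosa

def prosody_descriptor(wave: np.ndarray, sr: int) -> np.ndarray:
    """Summarize a clip's prosody as pitch- and energy-contour statistics."""
    # F0 contour via probabilistic YIN; unvoiced frames come back as NaN.
    f0, _, _ = librosa.pyin(
        wave,
        fmin=librosa.note_to_hz("C2"),
        fmax=librosa.note_to_hz("C7"),
        sr=sr,
    )
    f0 = f0[~np.isnan(f0)]  # keep voiced frames only
    rms = librosa.feature.rms(y=wave)[0]  # frame-level energy contour

    def stats(x: np.ndarray) -> list:
        return [float(np.mean(x)), float(np.std(x))] if x.size else [0.0, 0.0]

    # 4-dim descriptor: [F0 mean, F0 std, RMS mean, RMS std].
    return np.array(stats(f0) + stats(rms))
```

A detector could then be trained on such descriptors, or on richer contour embeddings, with a contrastive or margin-based objective; that is one plausible reading of the "appropriate loss function" mentioned above, not a detail taken from the thesis.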

Additionally, we use BEAR directly as the training environment. We observe that while existing detection methods struggle to generalize across varying noise levels, ProsoAI exhibits impressive generalizability. This highlights the limitations of existing models, particularly their sensitivity to noise and their inability to learn more robust features. As deepfake technology continues to evolve, these findings underscore the need for more adaptable and robust detection methods.
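
A BEAR-style training loop would then perturb every training clip, genuine or fake, at a randomly drawn noise level, so that noise intensity cannot serve as a shortcut feature. Again, this is a sketch under assumed details; the SNR range and white-noise model are not taken from the thesis.

```python
import numpy as np

def bear_train_batch(waves, snr_range=(0.0, 20.0), rng=None):
    """Re-noise each training clip at a random SNR so that noise level
    is uninformative about real vs. fake."""
    rng = rng or np.random.default_rng()
    noisy = []
    for wave in waves:
        snr_db = rng.uniform(*snr_range)  # sample a noise level per clip
        p_noise = np.mean(wave ** 2) / (10 ** (snr_db / 10))
        noisy.append(wave + rng.normal(0.0, np.sqrt(p_noise), wave.shape))
    return noisy
```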

Despite the limitations of our current dataset and the room for further improvement in our detection method, we believe our study offers valuable insights for developing more robust detection methods. Our work aims to strengthen the robustness and adaptability of audio deepfake detection, equipping it to meet the challenges posed by evolving deepfake technologies.
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92854
DOI: 10.6342/NTU202401106
Full-Text License: Authorized (restricted to on-campus access)
Appears in Collections: Department of Electrical Engineering

Files in This Item:
File: ntu-112-2.pdf (1.41 MB, Adobe PDF)
Access: restricted to NTU campus IP addresses (off-campus users should connect via the VPN service)