Domain Generalization for Codec-based Deepfake Detection under Resynthesis-to-Generation Shift

吳雲行; Yun-Shing Wu

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101806

Title:	Domain Generalization for Codec-based Deepfake Detection under Resynthesis-to-Generation Shift (重建至生成偏移之編碼式偽造語音偵測跨域泛化)
Author:	Yun-Shing Wu (吳雲行)
Advisor:	Shang-Tse Chen (陳尚澤)
Co-advisor:	Jyh-Shing Jang (張智星)
Keywords:	audio anti-spoofing, neural audio codec, domain generalization, codec-based deepfake detection, post-trained model, feature space perturbation
Publication year:	2026
Degree:	Master's
Abstract:	Speech synthesis built on neural audio codecs has advanced rapidly in recent years, making synthetic speech far more realistic and deepfake speech detection correspondingly harder. To improve generalization to data generated by unseen codecs, this study uses the CoRS subset of the CodecFake+ dataset, which contains a large amount of codec-resynthesized speech, as a proxy training set, and initializes training from a post-trained model, which is better suited to this task than conventional self-supervised models. It then systematically analyzes three methods for mitigating the overfitting caused by cross-dataset domain shift: contrastive learning to strengthen feature discriminability, parameter-efficient fine-tuning (PEFT) to control the model's adaptation capacity, and a proposed domain-shift-aware fine-tuning (DSFT) module that uses feature distribution uncertainty to simulate potential unseen-domain perturbations. Experimental results show that the proposed methods improve detection performance under unseen codec conditions on codec-based speech generation (CoSG) data and yield a deepfake speech detection system that significantly outperforms previous methods on the CodecFake benchmark.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101806
DOI:	10.6342/NTU202600598
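Full-text authorization:	Authorized (worldwide public access)
Electronic full-text release date:	2027-02-25
Appears in collections:	Graduate Institute of Networking and Multimedia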

Files in this item:

File	Size	Format
ntu-114-1.pdf (available online after 2027-02-25)	3.5 MB	Adobe PDF

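Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.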