Singing Style Transfer Using Cycle-Consistent Boundary Equilibrium Generative Adversarial Networks

Cheng-Wei Wu; 巫承威

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/77780

Title:	Singing Style Transfer Using Cycle-Consistent Boundary Equilibrium Generative Adversarial Networks
Author:	Cheng-Wei Wu (巫承威)
Advisor:	Jyh-Shing Roger Jang (張智星)
Co-advisor:	Yi-Hsuan Yang (楊奕軒)
Keywords:	Generative Adversarial Networks, Singing Style Transfer
Publication Year:	2018
Degree:	Master's
Abstract:	This thesis focuses on singing style transfer: converting the singing style of a given singer into that of a target singer without paired training data, while producing natural, high-resolution vocals. As the model backbone we adopt generative adversarial networks (GANs), which are known for synthesizing high-resolution output. To stabilize training, we integrate the training method of Boundary Equilibrium Generative Adversarial Networks (BEGAN) into Cycle-Consistent Generative Adversarial Networks (CycleGAN). In the model architecture, we add symmetric skip-connections so that the transferred voice sounds closer to a natural voice while retaining high resolution. To account for temporal information, we add Gated Recurrent Units (GRUs) before the output layer of the network; this not only improves the accuracy of the relative pitch but also enhances the overall quality of the output singing voice. To validate the proposed training method and model architecture, we evaluate each of them separately and conduct a subjective listening survey that collects Mean Opinion Scores (MOS) on an inside test. The survey results indicate that the proposed training method and model architecture clearly transfer the singing style between different singers and generate realistic, high-resolution singing voice.
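The abstract says that the BEGAN training method is brought into CycleGAN to stabilize training, but this record does not give the loss functions. As a minimal sketch, assuming the standard BEGAN formulation (an autoencoder discriminator whose adversarial signal is the reconstruction error L(v) = |v - D(v)|, plus a balance variable k_t updated by proportional control with diversity ratio gamma and gain lambda_k) combined with a CycleGAN-style L1 cycle-consistency term, the per-step loss bookkeeping for one translation direction might look like the code below. The names `BeganCycleState`, `began_cycle_step`, `l1_mean`, the weight `lambda_cyc`, and all default values are hypothetical illustrations, not the thesis's implementation; real training code would compute these scalars from network outputs (e.g. spectrogram frames) and backpropagate through them.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


def l1_mean(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute error between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


@dataclass
class BeganCycleState:
    k: float = 0.0            # BEGAN balance term k_t, kept in [0, 1]; k_0 = 0
    gamma: float = 0.5        # diversity ratio gamma (assumed value)
    lambda_k: float = 0.001   # proportional gain for the k_t update (assumed value)
    lambda_cyc: float = 10.0  # weight of the cycle-consistency term (hypothetical)


def began_cycle_step(
    state: BeganCycleState,
    recon_real: float,
    recon_fake: float,
    cycle_loss: float,
) -> Tuple[float, float, float]:
    """Compute losses for one direction (e.g. singer A -> singer B) and update k_t.

    recon_real: L(x)    = |x - D(x)| for a real target-singer sample
    recon_fake: L(G(a)) = |G(a) - D(G(a))| for a transferred sample
    cycle_loss: |F(G(a)) - a|, the cycle-consistency reconstruction error
    Returns (discriminator loss, generator loss, convergence measure M_global).
    """
    # Discriminator: reconstruct real samples well, fake samples poorly (scaled by k_t).
    loss_d = recon_real - state.k * recon_fake
    # Generator: make transferred samples easy to reconstruct, plus the cycle term.
    loss_g = recon_fake + state.lambda_cyc * cycle_loss
    # Proportional control drives L(G(a)) toward gamma * L(x); k_t is clipped to [0, 1].
    balance = state.gamma * recon_real - recon_fake
    state.k = min(max(state.k + state.lambda_k * balance, 0.0), 1.0)
    # BEGAN convergence measure: M_global = L(x) + |gamma * L(x) - L(G(a))|.
    m_global = recon_real + abs(balance)
    return loss_d, loss_g, m_global


if __name__ == "__main__":
    state = BeganCycleState()
    real = l1_mean([1.0, 2.0], [0.5, 2.5])  # toy L(x)
    fake = l1_mean([1.0, 1.0], [0.0, 2.0])  # toy L(G(a))
    print(began_cycle_step(state, real, fake, cycle_loss=0.1))  # (0.5, 2.0, 1.25)
    print(state.k)  # 0.0, because the negative balance is clipped at zero
```

Because the BEGAN discriminator is an autoencoder, both adversarial terms are reconstruction errors, so the sketch only needs scalar losses. In a full CycleGAN-style setup, the same bookkeeping would presumably run for the reverse direction (B -> A) with its own discriminator and its own k_t.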
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/77780
DOI:	10.6342/NTU201801003
Full-text license:	Paid license
Full-text public release date:	2023-07-23
Appears in Collections:	Graduate Institute of Networking and Multimedia

Files in This Item:

File	Size	Format
ntu-107-R05944004-1.pdf (not authorized for public access)	14.33 MB	Adobe PDF

