Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/52009
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 呂育道(Yuh-Dauh Lyuu) | |
dc.contributor.author | Da-Yi Wu | en |
dc.contributor.author | 吳達懿 | zh_TW |
dc.date.accessioned | 2021-06-15T14:02:48Z | - |
dc.date.available | 2020-08-21 | |
dc.date.copyright | 2020-08-21 | |
dc.date.issued | 2020 | |
dc.date.submitted | 2020-08-17 | |
dc.identifier.citation | [1] J. Song, P. Kalluri, A. Grover, S. Zhao, and S. Ermon, “Learning controllable fair representations,” in Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, April 16–18, 2019, Naha, Japan, pp. 2164–2173. [2] F. Villavicencio and J. Bonada, “Applying voice conversion to concatenative singing-voice synthesis,” in 11th Annual Conference of the International Speech Communication Association, September 26–30, 2010, Makuhari, Japan, pp. 2162–2165. [3] E. Nachmani and L. Wolf, “Unsupervised singing voice conversion,” in 20th Annual Conference of the International Speech Communication Association, September 15–19, 2019, Graz, Austria, pp. 2583–2587. [4] S. H. Mohammadi and A. Kain, “Voice conversion using deep neural networks with speaker-independent pre-training,” in Spoken Language Technology Workshop, December 7–10, 2014, Lake Tahoe, pp. 19–23. [5] M. Sahidullah and G. Saha, “Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition,” Speech Commun., vol. 54, no. 4, pp. 543–565, 2012. [6] K. Saino, H. Zen, Y. Nankaku, A. Lee, and K. Tokuda, “An HMM-based singing voice synthesis system,” in Ninth International Conference on Spoken Language Processing, September 17–21, 2006, Pittsburgh. [7] E. Helander, T. Virtanen, J. Nurminen, and M. Gabbouj, “Voice conversion using partial least squares regression,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 5, pp. 912–921, 2010. [8] Q. Ma, J. T. Wang, D. E. Shasha, and C. H. Wu, “DNA sequence classification via an expectation maximization algorithm and neural networks: a case study,” IEEE Trans. Syst. Man Cybern. Part C, vol. 31, no. 4, pp. 468–475, 2001. [9] K. Qian, Y. Zhang, S. Chang, X. Yang, and M. Hasegawa-Johnson, “AutoVC: Zero-shot voice style transfer with only autoencoder loss,” in Proceedings of the 36th International Conference on Machine Learning, June 9–15, 2019, Long Beach, CA, pp. 5210–5219. [10] J. Chou and H. Lee, “One-shot voice conversion by separating speaker and content representations with instance normalization,” in 20th Annual Conference of the International Speech Communication Association, September 15–19, 2019, Graz, Austria, pp. 664–668. [11] X. Huang and S. J. Belongie, “Arbitrary style transfer in real-time with adaptive instance normalization,” in International Conference on Computer Vision, October 22–29, 2017, Venice, pp. 1510–1519. [12] J. Chorowski, R. J. Weiss, S. Bengio, and A. van den Oord, “Unsupervised speech representation learning using wavenet autoencoders,” IEEE/ACM Trans. Audio Speech Lang. Process., vol. 27, no. 12, pp. 2041–2053, 2019. [13] D. Wu and H. Lee, “One-shot voice conversion by vector quantization,” in International Conference on Acoustics, Speech and Signal Processing, May 4–8, 2020, Barcelona, pp. 7734–7738. [14] A. van den Oord, O. Vinyals, and K. Kavukcuoglu, “Neural discrete representation learning,” in Annual Conference on Neural Information Processing Systems, December 4–9, 2017, Long Beach, CA, pp. 6306–6315. [15] K. Kumar, R. Kumar, T. de Boissiere, L. Gestin, W. Z. Teoh, J. Sotelo, A. de Brebisson, Y. Bengio, and A. C. Courville, “MelGAN: Generative adversarial networks for conditional waveform synthesis,” in Annual Conference on Neural Information Processing Systems, December 8–14, 2019, Vancouver, pp. 14881–14892. [16] J. Yamagishi, C. Veaux, and K. MacDonald, “CSTR VCTK corpus: English multi-speaker corpus for CSTR voice cloning toolkit,” 2019. Available: https://datashare.is.ed.ac.uk/handle/10283/2651. [17] A. Kolesnikov, X. Zhai, and L. Beyer, “Revisiting self-supervised visual representation learning,” in Conference on Computer Vision and Pattern Recognition, June 16–20, 2019, Long Beach, CA, pp. 1920–1929. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/52009 | - |
dc.description.abstract | 本論文提出了一套基於向量量化的深度語者轉換模型。 與此同時,本文對此方法與其他現存的方法做了多種客觀、主觀的評估。結果顯示,本文所提出的方法在流利度以及語者相似度上都比現存的方法優秀。 | zh_TW |
dc.description.abstract | This thesis proposes a vector quantization-based voice conversion approach. Objective and subjective evaluations show that the proposed method outperforms other existing approaches in both audio naturalness and speaker similarity. | en |
dc.description.provenance | Made available in DSpace on 2021-06-15T14:02:48Z (GMT). No. of bitstreams: 1 U0001-0708202016441000.pdf: 2284138 bytes, checksum: 2f4e7670112161763516e0a00558ed2d (MD5) Previous issue date: 2020 | en |
dc.description.tableofcontents | 誌謝 i 中文摘要 ii 英文摘要 iii 一、導論 1 1.1 研究動機 1 1.2 語音轉換 1 1.3 研究方向 3 二、背景知識 4 2.1 波形 4 2.2 頻譜 4 2.3 梅式頻譜 4 三、文獻回顧 6 3.1 監督式學習 6 3.1.1 高斯混合模型 6 3.1.2 回歸模型 7 3.2 非監督式學習 8 3.2.1 維度瓶頸 9 3.2.2 實例規範化 9 四、使用向量量化的語音轉換 10 4.1 向量量化 10 4.2 向量量化語音轉換模型 10 4.2.1 編碼 10 4.2.2 解碼 11 4.2.3 訓練 11 4.3 實作 12 4.4 訓練資料集 13 4.5 客觀評估 13 4.5.1 內容特徵 13 4.5.2 語者特徵 14 4.6 主觀評估 15 五、結論與未來展望 18 參考目錄 19 | |
dc.language.iso | zh-TW | |
dc.title | 使用向量量化技術來達成語音轉換 | zh_TW |
dc.title | Using Vector Quantization To Achieve Voice Conversion | en |
dc.type | Thesis | |
dc.date.schoolyear | 108-2 | |
dc.description.degree | 碩士 | |
dc.contributor.coadvisor | 李宏毅(Hung-yi Lee) | |
dc.contributor.oralexamcommittee | 楊奕軒(Yi-Hsuan Yang),曹昱(Yu Tsao) | |
dc.subject.keyword | 語音轉換,向量量化,深度學習, | zh_TW |
dc.subject.keyword | voice conversion,vector quantization,deep learning, | en |
dc.relation.page | 21 | |
dc.identifier.doi | 10.6342/NTU202002653 | |
dc.rights.note | 有償授權 | |
dc.date.accepted | 2020-08-18 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
Appears in Collections: | Department of Computer Science and Information Engineering (資訊工程學系)
Files in This Item:
File | Size | Format | |
---|---|---|---|
U0001-0708202016441000.pdf Currently not authorized for public access | 2.23 MB | Adobe PDF |
Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.