Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/52009
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 呂育道(Yuh-Dauh Lyuu) | |
dc.contributor.author | Da-Yi Wu | en |
dc.contributor.author | 吳達懿 | zh_TW |
dc.date.accessioned | 2021-06-15T14:02:48Z | - |
dc.date.available | 2020-08-21 | |
dc.date.copyright | 2020-08-21 | |
dc.date.issued | 2020 | |
dc.date.submitted | 2020-08-17 | |
dc.identifier.citation | [1] J. Song, P. Kalluri, A. Grover, S. Zhao, and S. Ermon, “Learning controllable fair representations,” in Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, April 16–18, 2019, Naha, Japan, pp. 2164–2173. [2] F. Villavicencio and J. Bonada, “Applying voice conversion to concatenative singing-voice synthesis,” in 11th Annual Conference of the International Speech Communication Association, September 26–30, 2010, Makuhari, Japan, pp. 2162–2165. [3] E. Nachmani and L. Wolf, “Unsupervised singing voice conversion,” in 20th Annual Conference of the International Speech Communication Association, September 15–19, 2019, Graz, Austria, pp. 2583–2587. [4] S. H. Mohammadi and A. Kain, “Voice conversion using deep neural networks with speaker-independent pre-training,” in Spoken Language Technology Workshop, December 7–10, 2014, Lake Tahoe, pp. 19–23. [5] M. Sahidullah and G. Saha, “Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition,” Speech Commun., vol. 54, no. 4, pp. 543–565, 2012. [6] K. Saino, H. Zen, Y. Nankaku, A. Lee, and K. Tokuda, “An HMM-based singing voice synthesis system,” in Ninth International Conference on Spoken Language Processing, September 17–21, 2006, Pittsburgh. [7] E. Helander, T. Virtanen, J. Nurminen, and M. Gabbouj, “Voice conversion using partial least squares regression,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 5, pp. 912–921, 2010. [8] Q. Ma, J. T. Wang, D. E. Shasha, and C. H. Wu, “DNA sequence classification via an expectation maximization algorithm and neural networks: a case study,” IEEE Trans. Syst. Man Cybern. Part C, vol. 31, no. 4, pp. 468–475, 2001. [9] K. Qian, Y. Zhang, S. Chang, X. Yang, and M. Hasegawa-Johnson, “AutoVC: Zero-shot voice style transfer with only autoencoder loss,” in Proceedings of the 36th International Conference on Machine Learning, June 9–15, 2019, Long Beach, CA, pp. 5210–5219. [10] J. Chou and H. Lee, “One-shot voice conversion by separating speaker and content representations with instance normalization,” in 20th Annual Conference of the International Speech Communication Association, September 15–19, 2019, Graz, Austria, pp. 664–668. [11] X. Huang and S. J. Belongie, “Arbitrary style transfer in real-time with adaptive instance normalization,” in International Conference on Computer Vision, October 22–29, 2017, Venice, pp. 1510–1519. [12] J. Chorowski, R. J. Weiss, S. Bengio, and A. van den Oord, “Unsupervised speech representation learning using wavenet autoencoders,” IEEE/ACM Trans. Audio Speech Lang. Process., vol. 27, no. 12, pp. 2041–2053, 2019. [13] D. Wu and H. Lee, “One-shot voice conversion by vector quantization,” in International Conference on Acoustics, Speech and Signal Processing, May 4–8, 2020, Barcelona, pp. 7734–7738. [14] A. van den Oord, O. Vinyals, and K. Kavukcuoglu, “Neural discrete representation learning,” in Annual Conference on Neural Information Processing Systems, December 4–9, 2017, Long Beach, CA, pp. 6306–6315. [15] K. Kumar, R. Kumar, T. de Boissiere, L. Gestin, W. Z. Teoh, J. Sotelo, A. de Brebisson, Y. Bengio, and A. C. Courville, “MelGAN: Generative adversarial networks for conditional waveform synthesis,” in Annual Conference on Neural Information Processing Systems, December 8–14, 2019, Vancouver, pp. 14881–14892. [16] J. Yamagishi, C. Veaux, and K. MacDonald, “CSTR VCTK corpus: English multi-speaker corpus for CSTR voice cloning toolkit,” 2019. Available: https://datashare.is.ed.ac.uk/handle/10283/2651. [17] A. Kolesnikov, X. Zhai, and L. Beyer, “Revisiting self-supervised visual representation learning,” in Conference on Computer Vision and Pattern Recognition, June 16–20, 2019, Long Beach, CA, pp. 1920–1929. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/52009 | - |
dc.description.abstract | 本論文提出了一套基於向量量化的深度語者轉換模型。 與此同時,本文對此方法與其他現存的方法做了多種客觀、主觀的評估。結果顯示,本文所提出的方法在流利度以及語者相似度上都比現存的方法優秀。 | zh_TW |
dc.description.abstract | This thesis proposes a vector quantization-based voice conversion approach. Objective and subjective evaluations show that the proposed method outperforms other existing approaches in both audio naturalness and speaker similarity. | en |
dc.description.provenance | Made available in DSpace on 2021-06-15T14:02:48Z (GMT). No. of bitstreams: 1 U0001-0708202016441000.pdf: 2284138 bytes, checksum: 2f4e7670112161763516e0a00558ed2d (MD5) Previous issue date: 2020 | en |
dc.description.tableofcontents | 誌謝 i 中文摘要 ii 英文摘要 iii 一、導論 1 1.1 研究動機 1 1.2 語音轉換 1 1.3 研究方向 3 二、背景知識 4 2.1 波形 4 2.2 頻譜 4 2.3 梅式頻譜 4 三、文獻回顧 6 3.1 監督式學習 6 3.1.1 高斯混合模型 6 3.1.2 回歸模型 7 3.2 非監督式學習 8 3.2.1 維度瓶頸 9 3.2.2 實例規範化 9 四、使用向量量化的語音轉換 10 4.1 向量量化 10 4.2 向量量化語音轉換模型 10 4.2.1 編碼 10 4.2.2 解碼 11 4.2.3 訓練 11 4.3 實作 12 4.4 訓練資料集 13 4.5 客觀評估 13 4.5.1 內容特徵 13 4.5.2 語者特徵 14 4.6 主觀評估 15 五、結論與未來展望 18 參考目錄 19 | |
dc.language.iso | zh-TW | |
dc.title | 使用向量量化技術來達成語音轉換 | zh_TW |
dc.title | Using Vector Quantization To Achieve Voice Conversion | en |
dc.type | Thesis | |
dc.date.schoolyear | 108-2 | |
dc.description.degree | 碩士 | |
dc.contributor.coadvisor | 李宏毅(Hung-yi Lee) | |
dc.contributor.oralexamcommittee | 楊奕軒(Yi-Hsuan Yang),曹昱(Yu Tsao) | |
dc.subject.keyword | 語音轉換,向量量化,深度學習, | zh_TW |
dc.subject.keyword | voice conversion,vector quantization,deep learning, | en |
dc.relation.page | 21 | |
dc.identifier.doi | 10.6342/NTU202002653 | |
dc.rights.note | 有償授權 | |
dc.date.accepted | 2020-08-18 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
Appears in Collections: | Department of Computer Science and Information Engineering (資訊工程學系)
Files in This Item:
File | Size | Format | |
---|---|---|---|
U0001-0708202016441000.pdf Currently not authorized for public access | 2.23 MB | Adobe PDF |
Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.