Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/52009
Full Metadata Record
dc.contributor.advisor: 呂育道 (Yuh-Dauh Lyuu)
dc.contributor.author: Da-Yi Wu [en]
dc.contributor.author: 吳達懿 [zh_TW]
dc.date.accessioned: 2021-06-15T14:02:48Z
dc.date.available: 2020-08-21
dc.date.copyright: 2020-08-21
dc.date.issued: 2020
dc.date.submitted: 2020-08-17
dc.identifier.citation:
[1] J. Song, P. Kalluri, A. Grover, S. Zhao, and S. Ermon, “Learning controllable fair representations,” in Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, April 16–18, 2019, Naha, Japan, pp. 2164–2173.
[2] F. Villavicencio and J. Bonada, “Applying voice conversion to concatenative singing-voice synthesis,” in 11th Annual Conference of the International Speech Communication Association, September 26–30, 2010, Makuhari, Japan, pp. 2162–2165.
[3] E. Nachmani and L. Wolf, “Unsupervised singing voice conversion,” in 20th Annual Conference of the International Speech Communication Association, September 15–19, 2019, Graz, Austria, pp. 2583–2587.
[4] S. H. Mohammadi and A. Kain, “Voice conversion using deep neural networks with speaker-independent pre-training,” in Spoken Language Technology Workshop, December 7–10, 2014, Lake Tahoe, pp. 19–23.
[5] M. Sahidullah and G. Saha, “Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition,” Speech Commun., vol. 54, no. 4, pp. 543–565, 2012.
[6] K. Saino, H. Zen, Y. Nankaku, A. Lee, and K. Tokuda, “An HMM-based singing voice synthesis system,” in Ninth International Conference on Spoken Language Processing, September 17–21, 2006, Pittsburgh.
[7] E. Helander, T. Virtanen, J. Nurminen, and M. Gabbouj, “Voice conversion using partial least squares regression,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 5, pp. 912–921, 2010.
[8] Q. Ma, J. T. Wang, D. E. Shasha, and C. H. Wu, “DNA sequence classification via an expectation maximization algorithm and neural networks: a case study,” IEEE Trans. Syst. Man Cybern. Part C, vol. 31, no. 4, pp. 468–475, 2001.
[9] K. Qian, Y. Zhang, S. Chang, X. Yang, and M. Hasegawa-Johnson, “AutoVC: Zero-shot voice style transfer with only autoencoder loss,” in Proceedings of the 36th International Conference on Machine Learning, June 9–15, 2019, Long Beach, CA, pp. 5210–5219.
[10] J. Chou and H. Lee, “One-shot voice conversion by separating speaker and content representations with instance normalization,” in 20th Annual Conference of the International Speech Communication Association, September 15–19, 2019, Graz, Austria, pp. 664–668.
[11] X. Huang and S. J. Belongie, “Arbitrary style transfer in real-time with adaptive instance normalization,” in International Conference on Computer Vision, October 22–29, 2017, Venice, pp. 1510–1519.
[12] J. Chorowski, R. J. Weiss, S. Bengio, and A. van den Oord, “Unsupervised speech representation learning using WaveNet autoencoders,” IEEE/ACM Trans. Audio Speech Lang. Process., vol. 27, no. 12, pp. 2041–2053, 2019.
[13] D. Wu and H. Lee, “One-shot voice conversion by vector quantization,” in International Conference on Acoustics, Speech and Signal Processing, May 4–8, 2020, Barcelona, pp. 7734–7738.
[14] A. van den Oord, O. Vinyals, and K. Kavukcuoglu, “Neural discrete representation learning,” in Annual Conference on Neural Information Processing Systems, December 4–9, 2017, Long Beach, CA, pp. 6306–6315.
[15] K. Kumar, R. Kumar, T. de Boissiere, L. Gestin, W. Z. Teoh, J. Sotelo, A. de Brebisson, Y. Bengio, and A. C. Courville, “MelGAN: Generative adversarial networks for conditional waveform synthesis,” in Annual Conference on Neural Information Processing Systems, December 8–14, 2019, Vancouver, pp. 14881–14892.
[16] J. Yamagishi, C. Veaux, and K. MacDonald, “CSTR VCTK corpus: English multi-speaker corpus for CSTR voice cloning toolkit,” 2019. Available: https://datashare.is.ed.ac.uk/handle/10283/2651.
[17] A. Kolesnikov, X. Zhai, and L. Beyer, “Revisiting self-supervised visual representation learning,” in Conference on Computer Vision and Pattern Recognition, June 16–20, 2019, Long Beach, CA, pp. 1920–1929.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/52009
dc.description.abstract: This thesis proposes a deep voice conversion model based on vector quantization. It also compares this method against other existing methods with a variety of objective and subjective evaluations. The results show that the proposed method outperforms existing methods in both fluency and speaker similarity. [zh_TW]
dc.description.abstract: This thesis proposes a vector-quantization-based voice conversion approach. The objective and subjective evaluations show that the proposed method performs better than other existing approaches in both audio naturalness and speaker similarity. [en]
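As a rough illustration of the vector quantization step at the heart of the abstract's approach, the sketch below maps content vectors to their nearest codebook entries. The function name, codebook, and input values are illustrative assumptions, not taken from the thesis.

```python
def quantize(vectors, codebook):
    """Replace each content vector with its nearest codebook entry (squared L2)."""
    quantized, indices = [], []
    for v in vectors:
        # Squared Euclidean distance from v to every codebook entry.
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        k = dists.index(min(dists))
        indices.append(k)
        quantized.append(codebook[k])
    return quantized, indices

# Two 2-D codes and two toy "content" vectors (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 1.0]]
z = [[0.1, -0.2], [0.9, 1.2]]
q, idx = quantize(z, codebook)   # idx == [0, 1]
```

In a VQ model of this kind, the discrete lookup discards fine-grained speaker detail in the encoder output, which is one way such systems separate content from speaker identity.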
dc.description.provenance: Made available in DSpace on 2021-06-15T14:02:48Z (GMT). No. of bitstreams: 1. U0001-0708202016441000.pdf: 2284138 bytes, checksum: 2f4e7670112161763516e0a00558ed2d (MD5). Previous issue date: 2020. [en]
dc.description.tableofcontents:
Acknowledgements i
Chinese Abstract ii
English Abstract iii
1 Introduction 1
1.1 Motivation 1
1.2 Voice Conversion 1
1.3 Research Direction 3
2 Background 4
2.1 Waveform 4
2.2 Spectrogram 4
2.3 Mel Spectrogram 4
3 Literature Review 6
3.1 Supervised Learning 6
3.1.1 Gaussian Mixture Models 6
3.1.2 Regression Models 7
3.2 Unsupervised Learning 8
3.2.1 Dimensionality Bottleneck 9
3.2.2 Instance Normalization 9
4 Voice Conversion Using Vector Quantization 10
4.1 Vector Quantization 10
4.2 The Vector-Quantization Voice Conversion Model 10
4.2.1 Encoding 10
4.2.2 Decoding 11
4.2.3 Training 11
4.3 Implementation 12
4.4 Training Dataset 13
4.5 Objective Evaluation 13
4.5.1 Content Features 13
4.5.2 Speaker Features 14
4.6 Subjective Evaluation 15
5 Conclusion and Future Work 18
References 19
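The background pipeline outlined in Chapter 2 (waveform to spectrogram to mel spectrogram) can be sketched as a framed magnitude DFT. This is an illustrative toy with made-up frame and hop sizes, not the thesis's front end; a real system would also apply a window function and a mel filterbank.

```python
import math, cmath

def spectrogram(wave, frame_len=8, hop=4):
    """Naive magnitude spectrogram: slice the waveform into frames, DFT each one."""
    frames = [wave[i:i + frame_len] for i in range(0, len(wave) - frame_len + 1, hop)]
    spec = []
    for f in frames:
        mags = []
        for k in range(frame_len // 2 + 1):  # keep only non-negative frequencies
            s = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, x in enumerate(f))
            mags.append(abs(s))
        spec.append(mags)
    return spec

# A sinusoid at exactly one cycle per frame concentrates energy in bin 1.
wave = [math.sin(2 * math.pi * n / 8) for n in range(16)]
S = spectrogram(wave)
```

Each row of `S` is one time step; each column is a frequency bin. A mel spectrogram would further pool these bins through triangular filters spaced on the mel scale.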
dc.language.iso: zh-TW
dc.subject: 深度學習 (deep learning) [zh_TW]
dc.subject: 語音轉換 (voice conversion) [zh_TW]
dc.subject: 向量量化 (vector quantization) [zh_TW]
dc.subject: voice conversion [en]
dc.subject: deep learning [en]
dc.subject: vector quantization [en]
dc.title: 使用向量量化技術來達成語音轉換 (Using Vector Quantization To Achieve Voice Conversion) [zh_TW]
dc.title: Using Vector Quantization To Achieve Voice Conversion [en]
dc.type: Thesis
dc.date.schoolyear: 108-2
dc.description.degree: 碩士 (Master's)
dc.contributor.coadvisor: 李宏毅 (Hung-yi Lee)
dc.contributor.oralexamcommittee: 楊奕軒 (Yi-Hsuan Yang), 曹昱 (Yu Tsao)
dc.subject.keyword: 語音轉換, 向量量化, 深度學習 (voice conversion, vector quantization, deep learning) [zh_TW]
dc.subject.keyword: voice conversion, vector quantization, deep learning [en]
dc.relation.page: 21
dc.identifier.doi: 10.6342/NTU202002653
dc.rights.note: 有償授權 (licensed for a fee)
dc.date.accepted: 2020-08-18
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) [zh_TW]
Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
File | Size | Format
U0001-0708202016441000.pdf (not authorized for public access) | 2.23 MB | Adobe PDF


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
