Implement Mandarin Speech Conversion on Mixed Excitation Linear Prediction (MELP) CODEC (中文語音轉換在混合激發線性預測語音編碼器上之實現)

Chun-Jung Hsiao; 蕭鈞榮

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/34747

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	賴飛羆 (Fei-Pei Lai)
dc.contributor.author	Chun-Jung Hsiao	en
dc.contributor.author	蕭鈞榮	zh_TW
dc.date.accessioned	2021-06-13T06:34:12Z	-
dc.date.available	2006-01-26
dc.date.copyright	2006-01-26
dc.date.issued	2006
dc.date.submitted	2006-01-20
dc.identifier.citation	【1】M. Abe, S. Nakamura, K. Shikano, and H. Kuwabara. “Voice conversion through vector quantization”. J. Acoust. Soc. Jpn. (E), Vol. 11, No. 2, pp. 71–76, 1990. 【2】M. Mashimo, T. Toda, H. Kawanami, K. Shikano, and N. Campbell. “Cross-language voice conversion evaluation using bilingual databases”. IPSJ Journal, Vol. 43, No. 7, pp. 2177–2185, 2002. 【3】T. Toda. “High-quality and flexible speech synthesis with segment selection and voice conversion”. Ph.D. Thesis, Graduate School of Information Science, Nara Institute of Science and Technology, 2003. 【4】Ki Seung Lee, Dae Hee Yun, and Il Whan Cha. “A New Voice Transformation Method based on Both Linear and Nonlinear Prediction Analysis”. The International Conference on Spoken Language Processing, pp. 1401–1404, Philadelphia, USA, October 1996. 【5】M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi. “Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR”. Proc. ICASSP, pp. 805–808, Salt Lake City, USA, May 2001. 【6】H. Kawanami, Y. Iwami, T. Toda, H. Saruwatari, and K. Shikano. “GMM-based Voice Conversion Applied to Emotional Speech Synthesis”. Proc. European Conference on Speech Communication and Technology (EUROSPEECH 2003), pp. 2401–2404, Geneva, Switzerland, Sep. 2003. 【7】D. G. Childers. “Glottal source modeling for voice conversion”. Speech Communication, Vol. 16, pp. 127–138, 1995. 【8】M. A. Kohler. “A Comparison of the New 2400 BPS MELP Federal Standard with other Standard Coders”. Proc. of the Int. Conf. Acoust., Speech and Signal Processing, 1997. 【9】John Puterbaugh. “Voice Conversion”. http://silvertone.princeton.edu/~john/voiceconversion.htm 【10】林明灶. “中文音節辨認之研究—混合模型法”. 大同大學電機工程研究所博士論文, 2003. 【11】徐志文. “國語關鍵詞擷取與發音確認之研究”. 國立台灣大學碩士論文, 2000. 【12】L. M. Arslan and D. Talkin. “Voice conversion by codebook mapping of line spectral frequencies and excitation spectrum”. Proc. EUROSPEECH, Vol. 3, pp. 1347–1350, 1997. 【13】L. M. Supplee, R. P. Cohn, J. S. Collura, and A. V. McCree. “MELP: The New Federal Standard at 2400 bps”. IEEE International Conference on Acoustics, Speech, and Signal Processing, Munich, Germany, 1997. 【14】M.-T. Lin, C.-K. Lee, and C.-Y. Lin. “Consonant/vowel segmentation for Mandarin syllable recognition”. Comp. Speech and Lang., Vol. 13, pp. 207–222, 1999. 【15】O. Salor and M. Demirekler. “Speech conversion using MELP speech coding algorithm”. Proc. IEEE 12th Signal Processing and Communications Applications Conference, pp. 268–271, 28–30 April 2004. 【16】楊東敏. “基於線性預測編碼及音框週期同步之高品質語音變換技術”. 碩士論文, 國立中央大學, 2003. 【17】江佩芳. “混合激發線性預測語音編碼之研究”. 國立成功大學碩士論文, 2001.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/34747	-
dc.description.abstract	本篇論文主要的研究方向是將語音變換方法架構在2.4kbps低位元率的混合激發線性預測(Mixed Excitation Linear Prediction)語音編碼器上，以便實際應用在即時通訊之中，增添娛樂性質甚至保密功能。經由大量語料統計發現，在相同語者說話語音的相同音節發音當中，使用ＭＥＬＰ編碼器分析而得的四階線頻譜（Line Spectrum Frequency）參數，其第一階及第二階參數在向量索引(index)的分布上具有多數聚集的特性。本論文提出以音節為基礎的對照方式，建造一來源語者與目標語者的口腔頻譜特徵對照表，以改善因為選錯音節而造成不連續語音的情形；另外線性調整兩語者的基頻週期，改變語者語音的原始激發訊號(Residual Signal)；經由模擬實驗結果證實，來源語者確實可以改變成目標語者的效果，而合成語音的品質也令人滿意。	zh_TW
dc.description.abstract	This work reuses the parameters of the 2.4 kbps Mixed Excitation Linear Prediction (MELP) voice coder to implement speech conversion from a source speaker to a specified target speaker. Analyzing speech with the MELP algorithm, we found statistically that, for the same phoneme uttered by the same speaker, the first- and second-stage indexes of the MELP four-stage vector-quantized Line Spectral Frequencies (LSF) tend to cluster around certain index values. We propose a method based on Mandarin syllables that builds a mapping table of these indexes between the spectral features of the source and target speakers. To avoid the discontinuous speech caused by syllable mismatches, we propose a new segmentation technique based on feature-vector frames. The pitch periods of the residual signal are also modified using a linear relationship. Simulation results show that the source speaker's voice can be converted to the target speaker's, and that the quality of the synthesized speech is good.	en
dc.description.provenance	Made available in DSpace on 2021-06-13T06:34:12Z (GMT). No. of bitstreams: 1 ntu-95-P92921005-1.pdf: 982140 bytes, checksum: 7f01b9be2063ef5d42a713b21d34b895 (MD5) Previous issue date: 2006	en
dc.description.tableofcontents	Chinese Abstract; Abstract; Acknowledgements; Contents; List of Figures; List of Tables; Chapter 1 INTRODUCTION: 1.1. Motive; 1.2. Background and Related Works; 1.3. Research Methodology; 1.4. Organization; Chapter 2 RELATED TECHNOLOGIES OVERVIEW: 2.1. Fundamental Knowledge of Speech; 2.1.1. Vocal System; 2.1.2. Characteristic of Mandarin Speech; 2.2. Speech Conversion Introduction; 2.3. MELP Speech Coding Basics; 2.3.1. Encoder; 2.3.2. Decoder; Chapter 3 RESEARCH METHODOLOGY: 3.1. Source-Filter Model; 3.1.1. Vocal Tract Filter; 3.1.2. Excitation; 3.2. Method of Spectral Mapping; 3.2.1. Mandarin Syllable; 3.2.2. Dynamic Time Warping; 3.3. Syllable Segments; 3.3.1. Methodology; Chapter 4 SIMULATION AND RESULTS: 4.1. Simulation; 4.1.1. Input Speech Data; 4.1.2. Model Training; 4.1.3. Modified Speech; 4.2. Results; 4.2.1. Mono-syllables; 4.2.2. Continuous Sentences; Chapter 5 CONCLUSIONS; REFERENCES
dc.language.iso	en
dc.subject	語音轉換	zh_TW
dc.subject	混合激發線性預測	zh_TW
dc.subject	國語音節	zh_TW
dc.subject	MELP	en
dc.subject	Mandarin syllable	en
dc.subject	Speech Conversion	en
dc.title	中文語音轉換在混合激發線性預測語音編碼器上之實現	zh_TW
dc.title	Implement Mandarin Speech Conversion on Mixed Excitation Linear Prediction (MELP) CODEC	en
dc.type	Thesis
dc.date.schoolyear	94-1
dc.description.degree	Master
dc.contributor.oralexamcommittee	鄭士康 (Shyh-Kang Jeng), 陳柏誠 (Po-Cheng Chen)
dc.subject.keyword	語音轉換, 混合激發線性預測, 國語音節	zh_TW
dc.subject.keyword	MELP, Speech Conversion, Mandarin syllable	en
dc.relation.page	46
dc.rights.note	Paid authorization
dc.date.accepted	2006-01-23
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	電機工程學研究所	zh_TW
Appears in Collections:	Department of Electrical Engineering

Files in This Item:

File	Size	Format
ntu-95-1.pdf (not authorized for public access)	959.12 kB	Adobe PDF

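Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.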