  1. NTU Theses and Dissertations Repository
  2. College of Electrical Engineering and Computer Science
  3. Department of Electrical Engineering
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/34747
Title: 中文語音轉換在混合激發線性預測語音編碼器上之實現
Implement Mandarin Speech Conversion on Mixed Excitation Linear Prediction (MELP) CODEC
Authors: Chun-Jung Hsiao
蕭鈞榮
Advisor: Fei-Pei Lai (賴飛羆)
Keyword: Speech Conversion, Mixed Excitation Linear Prediction (MELP), Mandarin syllable
Publication Year: 2006
Degree: Master's
Abstract: This thesis builds speech conversion on top of the 2.4 kbps Mixed Excitation Linear Prediction (MELP) voice coder, reusing the coder's own parameters to convert speech from a source speaker to a specified target speaker. The goal is practical use in real-time communication, where conversion can add entertainment value or even a degree of privacy.
Statistics gathered from a large speech corpus show that, when the same speaker utters the same syllable, the first- and second-stage indices of the MELP four-stage vector-quantized Line Spectral Frequency (LSF) parameters cluster around a few index values. Based on this observation, we propose a Mandarin-syllable-based method that builds a mapping table of these indices between the vocal-tract spectral features of the source and target speakers. To reduce the discontinuous speech caused by syllable mismatches, we propose a new segmentation technique based on feature-vector frames. The pitch periods of the two speakers are also adjusted linearly, which modifies the excitation (residual) signal. Simulation results confirm that the source speaker's voice can be converted to resemble the target speaker's, and that the quality of the synthesized speech is satisfactory.
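As a rough illustration of the two steps the abstract describes, the Python sketch below builds a per-syllable mapping table from first- and second-stage LSF index pairs and rescales a pitch period linearly. Everything in it is an assumption made for illustration, not the thesis's implementation: the function names, the choice of the target speaker's most frequent index pair as the replacement for each syllable, and the use of the ratio of mean pitch periods as the linear factor are all hypothetical.

```python
from collections import Counter


def build_mapping_table(source_frames, target_frames):
    """Build a syllable -> target index pair table (hypothetical scheme).

    Both arguments map a syllable label to a list of
    (stage1_index, stage2_index) pairs observed for that speaker.
    Only syllables observed for BOTH speakers get an entry.
    """
    table = {}
    for syllable, src_pairs in source_frames.items():
        tgt_pairs = target_frames.get(syllable)
        if not src_pairs or not tgt_pairs:
            continue
        # Assumption: the most frequent pair represents the cluster
        # the abstract reports for the target speaker.
        table[syllable] = Counter(tgt_pairs).most_common(1)[0][0]
    return table


def convert_frame(syllable, index_pair, table):
    """Replace a frame's index pair with the target speaker's pair.

    Frames whose syllable is not in the table pass through unchanged.
    """
    return table.get(syllable, index_pair)


def scale_pitch_period(period, source_mean, target_mean):
    """Linearly rescale a pitch period by the ratio of mean periods.

    Assumption: a pure scaling with no offset term.
    """
    return period * target_mean / source_mean


# Toy data, invented for this example only.
source = {"ma": [(3, 10), (3, 10), (4, 11)]}
target = {"ma": [(7, 20), (7, 20), (8, 21)], "ba": [(1, 1)]}

table = build_mapping_table(source, target)
print(table)
print(convert_frame("ma", (4, 11), table))
print(scale_pitch_period(80.0, 80.0, 40.0))
```

A dictionary keyed by syllable keeps the lookup trivial, but it presumes that every frame already carries a syllable label. How frames are segmented and labelled is the part the thesis addresses and this sketch leaves out.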
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/34747
Fulltext Rights: Paid license
Appears in Collections: Department of Electrical Engineering

Files in This Item:
File: ntu-95-1.pdf (Restricted Access)
Size: 959.12 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
