Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/48026
Full metadata record
DC field: value [language]
dc.contributor.advisor: 李琳山
dc.contributor.author: Shang-Wen Li [en]
dc.contributor.author: 李尚文 [zh_TW]
dc.date.accessioned: 2021-06-15T06:44:40Z
dc.date.available: 2014-07-07
dc.date.copyright: 2011-07-07
dc.date.issued: 2011
dc.date.submitted: 2011-06-29
dc.identifier.citation:
[1] Defense Advanced Research Projects Agency, http://www.darpa.mil.
[2] H. Hermansky and S. Sharma, "Temporal patterns (TRAPS) in ASR of noisy speech," in ICASSP, 1999.
[3] H. Hermansky and N. Morgan, "RASTA processing of speech," IEEE Trans. Speech and Audio Processing, pp. 578–589, 1994.
[4] H. Hermansky and P. Fousek, "Multi-resolution RASTA filtering for tandem-based ASR," in Interspeech, 2005.
[5] H. Hermansky, D. Ellis, and S. Sharma, "Tandem connectionist feature extraction for conventional HMM systems," in ICASSP, 2000.
[6] D. A. Depireux, J. Z. Simon, D. J. Klein, and S. A. Shamma, "Spectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex," J. Neurophysiology, vol. 85, pp. 1220–1234, 2001.
[7] S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of reverberant speech using frequency domain linear prediction," IEEE Signal Processing Letters, vol. 15, pp. 681–684, 2008.
[8] S. Ganapathy, S. Thomas, and H. Hermansky, "Robust spectro-temporal features based on autoregressive models of Hilbert envelopes," in ICASSP, 2010.
[9] X. Domont, M. Heckmann, F. Joublin, and C. Goerick, "Hierarchical spectro-temporal features for robust speech recognition," in ICASSP, 2008.
[10] M. Kleinschmidt and D. Gelbart, "Improving word accuracy with Gabor feature extraction," in ICSLP, 2002.
[11] S. Zhao and N. Morgan, "Multi-stream spectro-temporal features for robust speech recognition," in Interspeech, 2008.
[12] B. Meyer and B. Kollmeier, "Complementarity of MFCC, PLP and Gabor features in the presence of speech-intrinsic variabilities," in Interspeech, 2009.
[13] S. Zhao, S. Ravuri, and N. Morgan, "Multi-stream to many-stream: Using spectro-temporal features for ASR," in ICASSP, 2009.
[14] S. Thomas, N. Mesgarani, and H. Hermansky, "A multistream multiresolution framework for phoneme recognition," in Interspeech, 2010.
[15] S. Ravuri and N. Morgan, "Using spectro-temporal features to improve AFE feature extraction for ASR," in Interspeech, 2010.
[16] L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993.
[17] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, 1999.
[18] S.-Y. Chang and L.-S. Lee, "Data-driven clustered hierarchical tandem system for LVCSR," in Interspeech, 2008.
[19] 張碩尹, "Mandarin large-vocabulary speech recognition with tandem clustered hierarchical multilayer perceptron acoustic models," M.S. thesis, Graduate Institute of Communication Engineering, National Taiwan University, 2009.
[20] International Computer Science Institute (ICSI), http://www.icsi.berkeley.edu/Speech/qn.html.
[21] Cambridge University Engineering Dept. (CUED), Machine Intelligence Laboratory, "HTK," http://htk.eng.cam.ac.uk/.
[22] H. Misra, H. Bourlard, and V. Tyagi, "New entropy based combination rules in HMM/ANN multi-stream ASR," in ICASSP, 2003.
[23] SRI Speech Technology and Research Laboratory, "SRILM," http://www.speech.sri.com/projects/srilm/.
[24] 潘奕誠, "One-pass and word-graph-based search algorithms for large-vocabulary continuous Mandarin speech recognition," M.S. thesis, Graduate Institute of Computer Science and Information Engineering, National Taiwan University, 2002.
[25] S. Katz, "Estimation of probabilities from sparse data for the language model component of a speech recognizer," IEEE Trans. Acoustics, Speech and Signal Processing, vol. 35, pp. 400–401, 1987.
[26] M.-Y. Hwang, G. Peng, W. Wang, A. Faria, A. Heidel, and M. Ostendorf, "Building a highly accurate Mandarin speech recognizer," in ASRU, 2007.
[27] ESPS Version 5.0 Program Manual, http://www.speech.kth.se/software/.
[28] A. V. Nefian, L.-H. Liang, X.-B. Pi, X.-X. Liu, C. Mao, and K. Murphy, "A coupled HMM for audio-visual speech recognition," in ICASSP, 2002.
[29] P.-S. Huang, X.-D. Zhuang, and M. Hasegawa-Johnson, "Improving acoustic event detection using generalizable visual features and multi-modality modeling," in ICASSP, 2011.
[30] J. Frankel, D. Wang, and S. King, "Growing bottleneck features for tandem ASR," in Interspeech, 2008.
[31] O. Vinyals and S. Ravuri, "Comparing multilayer perceptron to deep belief network tandem features for robust ASR," in ICASSP, 2011.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/48026
dc.description.abstract: In conventional speech recognition, Mel-frequency cepstral coefficient (MFCC) features are extracted from the speech signal, and statistical models trained on these features are used for recognition. MFCC features, however, have some inherent limitations; for example, the information they capture is confined to a short time span. In recent years, many studies have obtained richer features, and thereby better recognition performance, by extracting longer-term information from the signal or by capturing temporal, spectral, and spectro-temporal modulations.
In this thesis, Gabor filters are used to extract features rich in spectro-temporal information. A multilayer perceptron learns how these features vary across phonemes and outputs phoneme posterior probability vectors, and a tandem system integrates the resulting Gabor posteriors with MFCC posteriors, which is found to improve recognition accuracy. A clustered hierarchical multilayer perceptron is further used to estimate more precise posteriors for easily confused phonemes, improving performance again. Finally, pitch features are added to the feature set and tonal variation is modeled in the acoustic units; the resulting system achieves a significant improvement in accuracy on Mandarin large-vocabulary broadcast news recognition. [zh_TW]
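For reference, the spectro-temporal Gabor filters mentioned in the abstract are commonly written in the following form; this is a sketch of the standard complex Gabor prototype with a Gaussian envelope, and the exact parameterization used in the thesis may differ. With time-frame index $n$, frequency-channel index $k$, center $(n_0, k_0)$, modulation frequencies $\omega_n, \omega_k$, and envelope widths $\sigma_n, \sigma_k$:

$$ g(n,k) = \exp\!\left(-\frac{(n-n_0)^2}{2\sigma_n^2} - \frac{(k-k_0)^2}{2\sigma_k^2}\right)\, \exp\!\bigl(i\,[\omega_n (n-n_0) + \omega_k (k-k_0)]\bigr) $$

Each Gabor feature is then obtained by correlating such a filter with the log-mel spectrogram and taking the real part or magnitude as the feature value.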
dc.description.abstract: In conventional speech recognition, MFCC features are used to extract speech information from the waveform, and statistical models are trained on these features for decoding. However, MFCC features retain only information within a short time span. Much recent research has focused on extracting long-term information from the speech signal, or variations in spectral, temporal, or spectro-temporal modulation frequency, and these studies achieve significant performance improvements.
Here, we utilize Gabor filters to extract Gabor features, which are rich in spectro-temporal information. An MLP is trained to learn the variation of Gabor features among different phonemes; the outputs of the MLP are Gabor posteriors. We use a tandem system to integrate the Gabor and MFCC posteriors and achieve better performance in our speech recognition system. Furthermore, we estimate posteriors more accurately with a clustered hierarchical MLP, which emphasizes the classification of error-prone phoneme pairs, and thus obtain even better recognition performance. Finally, we add pitch features during MLP training and adopt tonal acoustic units. With these modifications, we significantly improve performance on Mandarin large-vocabulary broadcast news recognition. [en]
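To make the tandem integration described above concrete, here is a minimal Python sketch of how two frame-level posterior streams could be merged into tandem features for a GMM-HMM back end. The names and choices (tandem_features, n_components, simple posterior averaging) are illustrative assumptions, not the thesis's actual pipeline.

import numpy as np
from sklearn.decomposition import PCA

def tandem_features(post_gabor, post_mfcc, mfcc, n_components=25, eps=1e-10):
    # post_gabor, post_mfcc: (n_frames, n_phones) phoneme posteriors from two MLPs
    # mfcc: (n_frames, n_mfcc) spectral features used by the GMM-HMM back end
    # 1. Average the two posterior streams (one simple combination rule;
    #    entropy-weighted combination is another option).
    post = 0.5 * (post_gabor + post_mfcc)
    # 2. Log compression makes the highly skewed posteriors more Gaussian-like,
    #    which suits a GMM-HMM acoustic model.
    log_post = np.log(post + eps)
    # 3. Decorrelate and reduce dimensionality with PCA (standing in for the
    #    Karhunen-Loeve transform commonly used in tandem systems).
    reduced = PCA(n_components=n_components).fit_transform(log_post)
    # 4. Append the decorrelated posterior features to the original MFCCs.
    return np.hstack([mfcc, reduced])

# Toy usage with random data standing in for real posteriors and features.
T, P, D = 200, 70, 39
feats = tandem_features(np.random.dirichlet(np.ones(P), T),
                        np.random.dirichlet(np.ones(P), T),
                        np.random.randn(T, D))
print(feats.shape)  # (200, 64)

Log compression and decorrelation of posteriors before appending them to the spectral features are the usual tandem steps; the thesis additionally replaces the single MLP with a clustered hierarchical MLP and augments the MLP input with pitch features.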
dc.description.provenance: Made available in DSpace on 2021-06-15T06:44:40Z (GMT). No. of bitstreams: 1. ntu-100-R98942035-1.pdf: 17213022 bytes, checksum: 3f732e46a75682dbc308aea40b5fa4af (MD5). Previous issue date: 2011. [en]
dc.description.tableofcontents:
Abstract (in Chinese)
Chapter 1: Introduction
1.1 Motivation
1.2 Principles of speech recognition
1.3 Feature extraction
1.3.1 Mel-frequency cepstral coefficient (MFCC) features
1.3.2 Temporal features
1.3.3 Spectro-temporal features
1.4 Acoustic models
1.5 Language models
1.6 Approach and results
1.7 Thesis organization
Chapter 2: Background
2.1 Gabor features
2.2 Multilayer perceptrons
2.3 Clustered hierarchical multilayer perceptrons
2.3.1 Phoneme distance
2.3.2 Hierarchical clustering
2.3.3 Clustered hierarchical multilayer perceptrons
2.4 Tandem systems
Chapter 3: Integration of Gabor features and Mel-frequency cepstral coefficients
3.1 Speech corpus and model settings
3.1.1 Corpus
3.1.2 Training and recognition tools
3.1.3 Acoustic model settings
3.2 Posterior probabilities
3.2.1 MFCC posteriors
3.2.2 Gabor posteriors
3.3 Tandem system integrating the posteriors
3.4 Experimental results
3.5 Complementarity analysis
3.6 Chapter summary
Chapter 4: Integration of clustered hierarchical multilayer perceptrons with different features
4.1 Speech corpus and model settings
4.1.1 Corpus
4.1.2 Training and recognition tools
4.1.3 Acoustic model settings
4.1.4 Lexicon and language model settings
4.2 Posterior probabilities
4.2.1 Posteriors from multilayer perceptrons
4.2.2 Posteriors from clustered hierarchical multilayer perceptrons
4.3 Experimental results
4.4 System analysis
4.4.1 Frame accuracy comparison of the different posteriors
4.4.2 Mean and variance normalization of posteriors
4.5 Chapter summary
Chapter 5: Integration of tonal features and tonal acoustic units
5.1 Speech corpus and model settings
5.1.1 Tones and phoneme set
5.1.2 Corpus and baseline experimental settings
5.1.3 Lexicon and language model settings
5.2 Pitch features
5.3 Posterior probabilities
5.3.1 Gabor and MFCC posteriors
5.3.2 Posteriors enhanced with pitch features
5.4 Tandem system integrating the posteriors
5.5 Experimental results
5.6 Analysis of different features and acoustic units
5.7 Chapter summary
Chapter 6: Conclusion and future work
6.1 Conclusion
6.2 Future work
References
dc.language.iso: zh-TW
dc.title: 用串接式系統整合加伯與基頻特徵之國語語音辨識 [zh_TW]
dc.title: Integrating Gabor and Pitch Features in Tandem Systems for Mandarin Speech Recognition [en]
dc.type: Thesis
dc.date.schoolyear: 99-2
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 陳信宏, 鄭秋豫, 王小川, 簡仁宗
dc.subject.keyword: speech recognition, feature extraction, tandem system [zh_TW]
dc.subject.keyword: speech recognition, feature extraction, Tandem system [en]
dc.relation.page: 58
dc.rights.note: 有償授權 (authorized for a fee)
dc.date.accepted: 2011-06-29
dc.contributor.author-college: College of Electrical Engineering and Computer Science [zh_TW]
dc.contributor.author-dept: Graduate Institute of Communication Engineering [zh_TW]
Appears in collections: Graduate Institute of Communication Engineering

Files in this item:
File: ntu-100-1.pdf (currently not authorized for public access)
Size: 16.81 MB
Format: Adobe PDF

