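Large Vocabulary Mandarin Speech Recognition Based on Tandem System with Clustered Hierarchical Multi-layer Perceptron (串接群聚階層式多層感知器聲學模型之中文大字彙語音辨識)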

Shuo-Yiin Chang; 張碩尹

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/43000

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	李琳山(Lin-Shan Lee)
dc.contributor.author	Shuo-Yiin Chang	en
dc.contributor.author	張碩尹	zh_TW
dc.date.accessioned	2021-06-15T01:32:14Z	-
dc.date.available	2009-07-24
dc.date.copyright	2009-07-24
dc.date.issued	2009
dc.date.submitted	2009-07-20
dc.identifier.citation	【1】 Defense Advanced Research Projects Agency, http://www.darpa.mil/
【2】 National Institute of Standards and Technology, http://www.nist.gov/index.html
【3】 X. Huang, A. Acero, H.-W. Hon, “Spoken Language Processing,” Pearson Education Taiwan Ltd., pp. 424-426, 2005.
【4】 H. Hermansky, B. Hanson, and H. Wakita, “Perceptually based linear predictive analysis of speech,” Apr 1985, vol. 10, pp. 509–512.
【5】 L. Rabiner, B. H. Juang, “Fundamentals of Speech Recognition,” Prentice Hall, 1993.
【6】 Hynek Hermansky, Daniel P. W. Ellis, and Sangita Sharma, “Tandem connectionist feature extraction for conventional HMM systems,” ICASSP, 2000.
【7】 H. Bourlard and N. Morgan, “Connectionist speech recognition: A hybrid approach,” Kluwer Academic Publishers, Boston, USA, 1994.
【8】 Simon Haykin, “Neural Networks: A Comprehensive Foundation,” USA, 1999.
【9】 ICSI Speech FAQ, International Computer Science Institute (ICSI), http://www.icsi.berkeley.edu/speech/faq/nn-train.html
【10】 Richard O. Duda, Peter E. Hart, and David G. Stork, “Pattern Classification,” Canada, 2001.
【11】 Q. Zhu, B. Chen, N. Morgan, and A. Stolcke, “On using MLP features in LVCSR,” ICSLP 2004.
【12】 B. Chen, Q. Zhu, N. Morgan, “Learning long term temporal features in LVCSR using neural networks,” ICSLP 2004.
【13】 International Computer Science Institute (ICSI), http://www.icsi.berkeley.edu/Speech/qn.html
【14】 Cambridge University Engineering Dept. (CUED), Machine Intelligence Laboratory, “HTK,” http://htk.eng.cam.ac.uk/
【15】 SRI Speech Technology and Research Laboratory, “SRILM,” http://www.speech.sri.com/projects/srilm/
【16】 潘奕誠, “One-Pass and Word-Graph-Based Search Algorithms for Large Vocabulary Mandarin Continuous Speech Recognition” (大字彙中文連續語音辨認之一段式及以詞圖為基礎之搜尋演算法), Master's thesis, Graduate Institute of Computer Science and Information Engineering, National Taiwan University, 2002.
【17】 X. Huang, A. Acero, H.-W. Hon, “Spoken Language Processing,” Pearson Education Taiwan Ltd., pp. 424-426, 2005.
【18】 S. Furui, “Cepstral analysis technique for automatic speaker verification,” IEEE Trans. Acoustics, Speech and Signal Processing, Vol. 29, No. 2, pp. 254-272, 1981.
【19】 S. M. Katz, “Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer,” IEEE Trans. Acoustics, Speech and Signal Processing, Vol. 35, No. 3, pp. 400-401, 1987.
【20】 S.-Y. Chang and L.-S. Lee, “Data-driven Clustered Hierarchical Tandem System for LVCSR,” Interspeech 2008.
【21】 S.-Y. Chang and L.-S. Lee, “Improved Clustered Hierarchical Tandem System with Bottom-Up Processing,” ICASSP 2009.
【22】 Guillermo Aradilla, Jithendra Vepa, and Hervé Bourlard, “An Acoustic Model Based on Kullback-Leibler Divergence for Posterior Features,” ICASSP 2007.
【23】 Guillermo Aradilla, Hervé Bourlard, and Mathew Magimai Doss, “Using KL-based Acoustic Models in a Large Vocabulary Recognition Task,” Interspeech 2008.
【24】 R. Veldhuis, “The Centroid of the Symmetrical Kullback-Leibler Distance,” IEEE Signal Processing Letters, vol. 9, pp. 96–99, 2002.
【25】 F. Valente and H. Hermansky, “Hierarchical and parallel processing of modulation spectrum for ASR application,” ICASSP 2008.
【26】 H. Hermansky and P. Fousek, “Multi-resolution RASTA filtering for tandem-based ASR,” Interspeech 2005.
【27】 Sherry Y. Zhao and Nelson Morgan, “Multi-Stream Spectro-Temporal Features for Robust Speech Recognition,” Interspeech 2008.
【28】 E. Fosler-Lussier and J. Morris, “Crandem systems: Conditional random field acoustic models for hidden Markov models,” ICASSP 2008.
【29】 R. Teunen and M. Akamine, “Speech recognition using soft decision trees,” Interspeech 2008.
【30】 Deepu Vijayasenan, Fabio Valente, Hervé Bourlard, “Integration of TDOA Features in Information Bottleneck Framework for Fast Speaker Diarization.”
【31】 Dong Wang, Javier Tejedor, Joe Frankel, Simon King, and Jose Colás, “Posterior-Based Confidence Measures for Spoken Term Detection.”
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/43000	-
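dc.description.abstract	Among conventional acoustic models, the continuous-density hidden Markov model (CDHMM) is the most widely used. However, the CDHMM has shortcomings that it cannot overcome on its own, and in recent years many studies have tried to improve it through different training methods or by combining it with other machine learning techniques. These approaches have gained recognition and attention in the new generation of speech recognition technology, and many of them have been put into practice in international evaluations. This thesis studies the use of multi-layer perceptrons (MLPs) to assist acoustic-model recognition. We propose a hierarchical MLP built by clustering phonemes. In a typical tandem model, a single MLP learns a general phoneme classification and has difficulty separating confusable phonemes. This thesis decomposes the general phoneme classification problem into a set of targeted hierarchical classifications, handling the complex problem by divide and conquer, and examines how the hierarchical MLP performs under different clustering structures. A bottom-up training method is then used to further improve the hierarchical MLP. Finally, the above methods serve as the first recognition pass, whose results are rescored with a hybrid HMM/MLP model and a KL-HMM model. All of these methods are shown to yield clear improvements in recognition accuracy on large-vocabulary Mandarin broadcast news recognition.	zh_TW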
dc.description.provenance	Made available in DSpace on 2021-06-15T01:32:14Z (GMT). No. of bitstreams: 1 ntu-98-R96942045-1.pdf: 2955239 bytes, checksum: 1bab5de9fafbbb794df33d290d44c371 (MD5) Previous issue date: 2009	en
dc.description.tableofcontents	Front matter: Oral Examination Committee Certification; Acknowledgements; Chinese Abstract; Contents; List of Figures; List of Tables
Chapter 1 Introduction: 1.1 Research Motivation; 1.2 Principles of Statistical Speech Recognition; 1.3 Acoustic Model; 1.4 Language Model; 1.5 Characteristics of Conventional Acoustic Models; 1.6 Research Approach of This Thesis; 1.7 Research Results of This Thesis
Chapter 2 Tandem Acoustic Models: 2.1 Comparison of Discriminative and Generative Models; 2.2 Neural Network Classifiers; 2.3 Tandem Models; 2.4 Tandem Models with Long-Term Features; 2.5 Combination of Posterior Probability Features; 2.6 Experimental Speech Corpus and Model Configuration; 2.6.1 Experimental Corpus; 2.6.2 Training and Recognition System Tools; 2.6.3 Front-End Processing; 2.6.4 Acoustic Model Configuration; 2.6.5 Lexicon and Language Model Configuration; 2.7 Baseline Experimental Results; 2.7.1 MLP Training and Classification Experiments; 2.7.2 Large Vocabulary Recognition Experiments with the Tandem Model
Chapter 3 Clustered Hierarchical Tandem Model: 3.1 Phoneme Distance; 3.2 Hierarchical Clustering; 3.3 Clustered Hierarchical Tandem Model; 3.3.1 High-Level Perceptron; 3.3.2 Terminal Perceptrons; 3.3.3 Integration of the High-Level and Terminal Perceptrons; 3.4 Experimental Results; 3.5 Analysis of Clustering Results; 3.6 Chapter Summary
Chapter 4 Improvements to the Clustered Hierarchical Tandem Model: 4.1 Drawbacks of Cluster-Based Training; 4.2 Bottom-Up Processing for the Clustered Hierarchical Tandem Model; 4.3 Experimental Results; 4.3.1 Baseline Experiments; 4.3.2 Experimental Results
Chapter 5 Hybrid HMM/MLP Model: 5.1 Hybrid HMM/MLP Model; 5.1.1 Architecture of the Hybrid HMM/MLP Model; 5.1.2 Training of the Hybrid HMM/MLP Model; 5.2 KL-HMM Model; 5.3 Comparison of Tandem and Hybrid Models; 5.4 Experimental Results; 5.5 Chapter Summary
Chapter 6 Conclusions and Future Work: 6.1 Conclusions; 6.2 Future Work
References
dc.language.iso	zh-TW
dc.subject	大字彙語音辨識 (large-vocabulary speech recognition)	zh_TW
dc.subject	聲學模型 (acoustic model)	zh_TW
dc.subject	多層感知器 (multi-layer perceptron)	zh_TW
dc.subject	Multi-layer Perceptron	en
dc.subject	LVCSR	en
dc.subject	Acoustic Model	en
dc.title	串接群聚階層式多層感知器聲學模型之中文大字彙語音辨識	zh_TW
dc.title	Large Vocabulary Mandarin Speech Recognition Based on Tandem System with Clustered Hierarchical Multi-layer Perceptron	en
dc.type	Thesis
dc.date.schoolyear	97-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	王小川(Hsiao-Chuan Wang),陳信宏(Sin-Horng Chen),鄭秋豫(Chiu-yu Tseng),簡仁宗(Jen-Tzung Chien)
dc.subject.keyword	大字彙語音辨識 (large-vocabulary speech recognition), 聲學模型 (acoustic model), 多層感知器 (multi-layer perceptron)	zh_TW
dc.subject.keyword	LVCSR, Acoustic Model, Multi-layer Perceptron	en
dc.relation.page	68
dc.rights.note	Paid license
dc.date.accepted	2009-07-20
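dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Communication Engineering	zh_TW
Appears in Collections:	Graduate Institute of Communication Engineering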

Files in this item:

File	Size	Format
ntu-98-1.pdf (not authorized for public access)	2.89 MB	Adobe PDF


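Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.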