Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/9719
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 李琳山 | |
dc.contributor.author | Yung-Jen Cheng | en |
dc.contributor.author | 程永任 | zh_TW |
dc.date.accessioned | 2021-05-20T20:37:25Z | - |
dc.date.available | 2008-08-05 | |
dc.date.available | 2021-05-20T20:37:25Z | - |
dc.date.copyright | 2008-08-05 | |
dc.date.issued | 2008 | |
dc.date.submitted | 2008-07-29 | |
dc.identifier.citation | 【1】 L. Bahl, P. Brown, P. de Souza, R. Mercer, “Maximum Mutual Information Estimation of Hidden Markov Model Parameters for Speech Recognition,” Proc. ICASSP, 1986.
【2】 B.-H. Juang, W. Chou, C.-H. Lee, “Minimum Classification Error Rate Methods for Speech Recognition,” IEEE Trans. Speech and Audio Processing, 1997.
【3】 D. Povey, P. C. Woodland, “Minimum Phone Error and I-Smoothing for Improved Discriminative Training,” Proc. ICASSP, 2002.
【4】 J. Zheng, A. Stolcke, “Improved Discriminative Training Using Phone Lattices,” Interspeech, 2005.
【5】 J. Du, P. Liu, F. K. Soong, J.-L. Zhou, R.-H. Wang, “Minimum Divergence Based Discriminative Training,” Interspeech, 2006.
【6】 L. R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proc. IEEE, Vol. 77, No. 2, pp. 257-285, 1989.
【7】 L. R. Bahl, F. Jelinek, R. L. Mercer, “A Maximum Likelihood Approach to Continuous Speech Recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. PAMI-5, No. 2, pp. 179-190, 1983.
【8】 V. Goel, S. Kumar, W. Byrne, “Segmental Minimum Bayes-Risk Decoding for Automatic Speech Recognition,” IEEE Trans. Speech and Audio Processing, 2004.
【9】 L. Mangu, E. Brill, A. Stolcke, “Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks,” Computer Speech and Language, 2000.
【10】 P. F. Brown, “The Acoustic-Modeling Problem in Automatic Speech Recognition,” Ph.D. Dissertation, Carnegie Mellon University, Pittsburgh, 1987.
【11】 P. S. Gopalakrishnan, D. Kanevsky, A. Nádas, D. Nahamoo, “An Inequality for Rational Functions with Applications to Some Statistical Estimation Problems,” IEEE Trans. Information Theory, Vol. 37, pp. 107-113, 1991.
【12】 Y. Normandin, “Hidden Markov Models, Maximum Mutual Information Estimation, and the Speech Recognition Problem,” Ph.D. Dissertation, McGill University, Montreal, 1991.
【13】 V. Valtchev, J. J. Odell, P. C. Woodland, S. J. Young, “MMIE Training of Large Vocabulary Recognition Systems,” Speech Communication, Vol. 22, No. 4, pp. 303-314, 1997.
【14】 P. C. Woodland, D. Povey, “Large Scale Discriminative Training of Hidden Markov Models for Speech Recognition,” Computer Speech and Language, Vol. 16, pp. 25-47, 2002.
【15】 Wikipedia, “Levenshtein distance,” http://en.wikipedia.org/wiki/Levenshtein_distance
【16】 J. Kaiser, B. Horvat, Z. Kacic, “Overall Risk Criterion Estimation of Hidden Markov Model Parameters,” Speech Communication, Vol. 38, pp. 383-398, 2002.
【17】 B.-H. Juang, S. Katagiri, “Discriminative Learning for Minimum Error Classification,” IEEE Trans. Signal Processing, Vol. 40, No. 12, pp. 3043-3054, 1992.
【18】 W. Chou, C.-H. Lee, B.-H. Juang, “Minimum Error Rate Training Based on N-Best String Models,” Proc. ICASSP, 1993.
【19】 L. K. Saul, M. G. Rahim, “Maximum Likelihood and Minimum Classification Error Factor Analysis for Automatic Speech Recognition,” IEEE Trans. Speech and Audio Processing, Vol. 8, No. 2, pp. 115-125, 2000.
【20】 R. Schlüter, “Investigations on Discriminative Training Criteria,” Ph.D. Dissertation, RWTH Aachen University of Technology, 2000.
【21】 H.-M. Wang, B. Chen, J.-W. Kuo, S.-S. Cheng, “MATBN: A Mandarin Chinese Broadcast News Corpus,” International Journal of Computational Linguistics and Chinese Language Processing, 2005.
【22】 Cambridge University Engineering Dept. (CUED), Machine Intelligence Laboratory, “HTK,” http://htk.eng.cam.ac.uk/
【23】 SRI Speech Technology and Research Laboratory, “SRILM,” http://www.speech.sri.com/projects/srilm/
【24】 潘奕誠,『大字彙中文連續語音辨認之一段式及以詞圖為基礎之搜尋演算法』,碩士論文,國立台灣大學資訊工程研究所,2002.
【25】 X. Huang, A. Acero, H.-W. Hon, “Spoken Language Processing,” Pearson Education Taiwan Ltd., pp. 424-426, 2005.
【26】 S. Furui, “Cepstral Analysis Technique for Automatic Speaker Verification,” IEEE Trans. Acoustics, Speech and Signal Processing, Vol. 29, No. 2, pp. 254-272, 1981.
【27】 S. M. Katz, “Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer,” IEEE Trans. Acoustics, Speech and Signal Processing, Vol. 35, No. 3, pp. 400-401, 1987.
【28】 郭人瑋,『最小化音素錯誤鑑別式聲學模型學習於中文大詞彙連續語音辨識之初步研究』,碩士論文,國立台灣師範大學資訊工程研究所,2005.
【29】 陳佳妤,『最小化音素錯誤模型及特徵訓練法於中文大詞彙辨識上之初步研究』,碩士論文,國立台灣大學電信工程研究所,2006.
【30】 D. Povey, “Discriminative Training for Large Vocabulary Speech Recognition,” Ph.D. Dissertation, Peterhouse, University of Cambridge, 2004.
【31】 X. L. Aubert, “An Overview of Decoding Techniques for Large Vocabulary Continuous Speech Recognition,” Computer Speech and Language, 2002.
【32】 D. Povey, B. Kingsbury, “Evaluation of Proposed Modifications to MPE for Large Scale Discriminative Training,” Proc. ICASSP, 2007.
【33】 S.-H. Liu, F.-H. Chu, S.-H. Lin, B. Chen, “Investigating Data Selection for Minimum Phone Error Training of Acoustic Models,” Proc. ICME, 2007.
【34】 M. Gibson, T. Hain, “Hypothesis Spaces for Minimum Bayes Risk Training in Large Vocabulary Speech Recognition,” Interspeech, 2006.
【35】 Wikipedia, “Multivariate normal distribution,” http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Kullback-Leibler_divergence
【36】 蔡明怡,『國語語音之發音變異分析及提昇辨識效能之發音模型』,博士論文,國立台灣大學電信工程研究所,2006.
【37】 X. Li, H. Jiang, C. Liu, “Large Margin HMMs for Speech Recognition,” Proc. ICASSP, 2005.
【38】 Wikipedia, “Support vector machine,” http://en.wikipedia.org/wiki/Support_vector_machine
【39】 DTREG, Software for Predictive Modeling and Forecasting, “SVM - Support Vector Machines,” http://www.dtreg.com/svm.htm
【40】 S.-H. Liu, F.-H. Chu, S.-H. Lin, H.-S. Lee, B. Chen, “Training Data Selection for Improving Discriminative Training of Acoustic Models,” Proc. ASRU, 2007.
【41】 朱芳輝,『資料選取方法於鑑別式聲學模型訓練之研究』,碩士論文,國立台灣師範大學資訊工程研究所,2008.
【42】 H. Jiang, X. Li, C. Liu, “Large Margin Hidden Markov Models for Speech Recognition,” IEEE Trans. Audio, Speech, and Language Processing, Vol. 14, No. 5, pp. 1584-1595, 2006.
【43】 F. Sha, L. K. Saul, “Comparison of Large Margin Training to Other Discriminative Methods for Phonetic Recognition by Hidden Markov Models,” Proc. ICASSP, 2007.
【44】 J. Li, M. Yuan, C.-H. Lee, “Soft Margin Estimation of Hidden Markov Model Parameters,” Interspeech, 2006.
【45】 D. Povey, D. Kanevsky, B. Kingsbury, B. Ramabhadran, G. Saon, K. Visweswariah, “Boosted MMI for Model and Feature-Space Discriminative Training,” Proc. ICASSP, 2008. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/9719 | - |
dc.description.abstract | 傳統的語音模型訓練以最大相似度(Maximum Likelihood, ML)來訓練聲學模型,雖然可以使正確的轉寫在訓練語料中有最大的事後機率,卻無法保證錯誤的聲學特徵(feature)不會產生更大的事後機率。鑑別式訓練(discriminative training)同時將可能的辨識結果與正確轉寫納入訓練,設法避免不正確的聲學特徵產生高於正確轉寫的事後機率。
本論文以最小音素錯誤訓練法(Minimum Phone Error, MPE)及其改進方法為主軸,詳細介紹鑑別式訓練法的背景知識、理論基礎以及實驗結果。本論文可分為五個部份:
第一部份為鑑別式訓練的基礎理論:從貝氏風險(Bayes Risk)開始,介紹目前廣泛研究的若干種模型訓練法,包括最大相似度估測法、最大相互資訊(Maximum Mutual Information, MMI)估測法、全面風險法則估測(Overall Risk Criterion Estimation, ORCE)、最小分類錯誤(Minimum Classification Error, MCE)訓練法以及最小音素錯誤(Minimum Phone Error, MPE)訓練法;這些訓練法的目標函數都可以視為貝氏風險的延伸。
第二部份為本論文的實驗架構:包括師大的新聞語料庫;實驗的前端處理方式,即梅爾倒頻譜係數(Mel-Frequency Cepstrum Coefficient, MFCC);初始聲學模型的訓練,由HTK以最大相似度估測法訓練而成;詞典及語言模型的建立,以中央通訊社收集的文字語料由SRILM訓練而成;以及語音辨識工具,即台大語音實驗室的TTK。基礎實驗為初始聲學模型的辨識結果。
第三部份為最小音素錯誤訓練法:先介紹目標函數最佳化的理論推導過程,求得模型參數的更新公式;再介紹更新公式中各項統計值在實作上的計算方法,包含正確度的定義,以及詞弧正確度和詞圖期望正確度的算法。實驗結果顯示,最小音素錯誤訓練法有約2.4%字正確率的進步。
第四部份介紹最小音素錯誤訓練法的改進方法:包括最小音素音框錯誤(Minimum Phone Frame Error, MPFE)訓練法、狀態層級最小貝氏風險(physical state level Minimum Bayes Risk, sMBR)訓練法和最小歧異度(Minimum Divergence, MD)訓練法,這些方法的主要差異在於目標函數中正確度的定義。實驗結果顯示,包括最小音素錯誤訓練法在內的四種方法之中,除了最小歧異度訓練法之外的三種方法都可以在詞正確率以及字正確率上進步,其中又以最小音素錯誤訓練法在字正確率的表現最好,而詞正確率則是以最小音素音框錯誤訓練法表現最好。此外,本論文也對目標函數中正確度的定義做了更進一步的改進:在正確度中加入了錯誤處罰以及音素長度正規化。實驗結果顯示,這個改進版本的正確度會使字正確率進步,而詞正確率退步。
第五部份介紹基於詞弧期望正確度的資料選取方法:目標是篩選出較具鑑別力的詞弧納入訓練,並在最小音素錯誤訓練法和最小音素音框錯誤訓練法的其中一種修改版本上進行實驗。實驗結果顯示,資料選取對正確率並沒有很大的影響,不過可以加快訓練的收斂速度。 | zh_TW |
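The objective the abstract describes, the lattice-based expected phone accuracy maximized by MPE, is commonly written as follows (a sketch of the standard formulation following Povey; the notation here is generic, not taken from the thesis):

```latex
% MPE objective: expected phone accuracy over all hypotheses s of
% each training utterance O_r, with reference transcription s_r,
% acoustic scaling factor kappa, and language model prior P(s).
F_{\mathrm{MPE}}(\lambda)
  = \sum_{r=1}^{R}
    \frac{\sum_{s} p_{\lambda}(O_r \mid s)^{\kappa}\, P(s)\, A(s, s_r)}
         {\sum_{s'} p_{\lambda}(O_r \mid s')^{\kappa}\, P(s')}
```

Here $A(s, s_r)$ is the raw phone accuracy of hypothesis $s$ against the reference $s_r$; the MPFE, sMBR and MD variants discussed later differ only in how this accuracy term is defined.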
dc.description.abstract | Traditional acoustic model training is based on Maximum Likelihood (ML). This method maximizes the posterior probability of the correct transcription on the training corpus, but cannot guarantee that incorrect hypotheses do not obtain a larger posterior probability. Discriminative training takes both the possible recognition hypotheses and the correct transcription into account during training, and tries to prevent incorrect hypotheses from obtaining a larger posterior probability than the correct one.
This thesis focuses on Minimum Phone Error (MPE) training and its modified versions, and presents the background, theory, and experimental results of discriminative training in detail. The thesis can be divided into five parts. The first part covers the background and theory of discriminative training. It starts with Bayes risk and then introduces several widely studied model training methods, including Maximum Likelihood, Maximum Mutual Information (MMI) Estimation, Overall Risk Criterion Estimation (ORCE), Minimum Classification Error (MCE) training, and Minimum Phone Error (MPE) training; the objective functions of these methods can all be viewed as extensions of Bayes risk. The second part describes the experimental framework of this thesis, including the broadcast news corpus from National Taiwan Normal University; the front-end processing of the corpus, based on Mel-Frequency Cepstrum Coefficients (MFCC); the initial acoustic models, trained with HTK using Maximum Likelihood estimation; the lexicon and language model, trained with SRILM on text collected by the Central News Agency; and the speech recognition decoder, TTK, from the speech lab of National Taiwan University. The baseline experiment is the recognition result obtained by decoding with the initial acoustic models. The third part presents Minimum Phone Error training. First, the theory and the optimization of the objective function are introduced to derive the model parameter update formulas. Then the computation of the statistics required by the update formulas is described, including the definition of accuracy, word arc accuracy, and the expected accuracy of the word graph. In the experiments, Minimum Phone Error training yields an improvement of about 2.4% in character accuracy.
The fourth part introduces modifications of Minimum Phone Error training, including Minimum Phone Frame Error (MPFE) training, physical state level Minimum Bayes Risk (sMBR) training, and Minimum Divergence (MD) training. The main difference among these methods is the definition of accuracy in the objective function. The experimental results for the four methods, MPE included, show that all of them improve word and character accuracy except Minimum Divergence; MPE achieves the best character accuracy, and MPFE the best word accuracy. In addition, this thesis further refines the definition of accuracy in the objective functions by adding an error penalty and phone length normalization. The experimental results show that these modifications improve character accuracy but degrade word accuracy. The fifth part introduces data selection based on the expected accuracy of word arcs; the goal is to select the more discriminative word arcs for training. Experiments are conducted on MPE and on one modified version of MPFE. The results show that data selection has little effect on accuracy, but speeds up the convergence of training. | en |
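The word arc accuracy summarized above is typically computed with Povey's time-overlap approximation rather than a full alignment against the reference. A minimal Python sketch under that assumption; the tuple layout and function names here are illustrative, not the thesis's actual implementation:

```python
def overlap_ratio(arc, ref):
    """Fraction of the reference phone's duration overlapped by the arc.

    Both arguments are (label, start_time, end_time) tuples; durations
    are assumed positive.
    """
    start = max(arc[1], ref[1])
    end = min(arc[2], ref[2])
    return max(0.0, end - start) / (ref[2] - ref[1])

def arc_accuracy(arc, reference):
    """Povey-style approximate accuracy of one lattice arc.

    For each reference phone z overlapped by the arc with proportion e:
    score -1 + 2e if the labels match, -1 + e otherwise; the arc's
    accuracy is the maximum score over all reference phones.
    """
    best = -1.0  # an arc overlapping nothing contributes -1
    for ref in reference:
        e = overlap_ratio(arc, ref)
        score = -1.0 + (2.0 * e if arc[0] == ref[0] else e)
        best = max(best, score)
    return best
```

For example, an arc that exactly covers a matching reference phone scores 1.0, a fully overlapping but mislabeled arc scores 0.0, and partial overlaps fall in between; summing these per-arc scores weighted by arc posteriors gives the expected accuracy of the word graph.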
dc.description.provenance | Made available in DSpace on 2021-05-20T20:37:25Z (GMT). No. of bitstreams: 1 ntu-97-R95942038-1.pdf: 1320443 bytes, checksum: 1f054f6bff119cfaaa54781a1a1f9a06 (MD5) Previous issue date: 2008 | en |
dc.description.tableofcontents | Chinese Abstract i
Table of Contents iii
List of Figures vii
List of Tables xi
Chapter 1 Introduction 1
1.1 Research Motivation 1
1.2 Statistical Speech Recognition 2
1.2.1 Acoustic Models 2
1.2.2 Language Models 4
1.3 Research Topics and Main Results 4
1.4 Thesis Organization 5
Chapter 2 Background 6
2.1 Discriminative Training Criteria 6
2.2 Bayes Risk 7
2.3 Maximum Likelihood (ML) 8
2.4 Maximum Mutual Information (MMI) 10
2.5 Overall Risk Criterion Estimation (ORCE) 11
2.6 Minimum Classification Error (MCE) 13
2.7 Minimum Phone Error (MPE) 14
2.8 Derivation of the Objective Functions of the Training Criteria 15
2.9 Chapter Summary 15
Chapter 3 Experimental Framework and Corpus 17
3.1 Experimental Corpus 17
3.2 Training and Recognition System 17
3.2.1 Front-End Processing 18
3.2.2 Acoustic Model Settings 18
3.2.3 Lexicon Construction and Language Model Settings 19
3.2.4 Speech Recognition Tools 19
3.3 Baseline Experiments 20
3.4 Chapter Summary 22
Chapter 4 Minimum Phone Error Training 23
4.1 Objective Function 23
4.1.1 Optimization of the Objective Function 23
4.1.2 Differentiation of the Objective Function 24
4.1.3 Acoustic Model Parameter Updates 27
4.1.4 I-Smoothing 30
4.2 Implementation 31
4.2.1 Word Graphs 31
4.2.2 Word Arc Accuracy 32
4.2.3 Expected Word Graph Accuracy 34
4.2.4 Word Graph Forward-Backward Algorithm 36
4.3 Experimental Results 38
4.4 Chapter Summary 39
Chapter 5 Discriminative Training Methods Improved from Minimum Phone Error 48
5.1 Minimum Phone Frame Error Training 48
5.1.1 Objective Function 48
5.1.2 Word Arc Accuracy with Error Penalty and Phone Length Normalization 51
5.1.3 Experimental Results 52
5.2 State-Level Minimum Bayes Risk Training 66
5.2.1 Objective Function 66
5.2.2 Word Arc Accuracy with Error Penalty 68
5.2.3 Word Arc Accuracy with Error Penalty and Phone Length Normalization 69
5.2.4 Experimental Results 71
5.3 Minimum Divergence Training 81
5.3.1 Objective Function 81
5.3.2 Experimental Results 82
5.4 Chapter Summary 84
Chapter 6 Data Selection for Minimum Phone Error and Minimum Phone Frame Error 86
6.1 Data Selection Based on Expected Word Arc Accuracy 86
6.2 Experimental Results 88
6.3 Summary of All Experiments 100
Chapter 7 Conclusions and Future Work 105
7.1 Summary 105
7.2 Future Work 106
Appendix A Right-Context-Dependent INITIAL/FINAL Models 107
Appendix B Auxiliary Functions 111
B.1 Strong-Sense Auxiliary Functions 112
B.2 Weak-Sense Auxiliary Functions 115
References 116 | |
dc.language.iso | zh-TW | |
dc.title | 最小音素錯誤訓練法及其改進方法在國語大字彙辨識上之評估與分析 | zh_TW |
dc.title | Evaluation and Analysis of Minimum Phone Error Training and Its Modified Versions for Large Vocabulary Mandarin Speech Recognition | en |
dc.type | Thesis | |
dc.date.schoolyear | 96-2 | |
dc.description.degree | 碩士 (Master) | |
dc.contributor.oralexamcommittee | 鄭秋豫,王小川,陳信宏,簡仁宗 | |
dc.subject.keyword | 國語大字彙語音辨識,鑑別式訓練,最小音素錯誤訓練,最小音素音框錯誤,最小貝氏風險,最小歧異度 | zh_TW |
dc.subject.keyword | large vocabulary Mandarin speech recognition, discriminative training, minimum phone error, minimum phone frame error, minimum Bayes risk, minimum divergence | en |
dc.relation.page | 120 | |
dc.rights.note | Authorization granted (open access worldwide) | |
dc.date.accepted | 2008-07-29 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 電信工程學研究所 | zh_TW |
Appears in collections: | 電信工程學研究所 |
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-97-1.pdf | 1.29 MB | Adobe PDF | View/Open |
Except where otherwise indicated, all items in this repository are protected by copyright, with all rights reserved.