Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/33530
Full metadata record
| DC field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 李琳山(Lin-Shan Lee) | |
| dc.contributor.author | I-Fan Chen | en |
| dc.contributor.author | 陳羿帆 | zh_TW |
| dc.date.accessioned | 2021-06-13T04:45:40Z | - |
| dc.date.available | 2006-07-31 | |
| dc.date.copyright | 2006-07-31 | |
| dc.date.issued | 2006 | |
| dc.date.submitted | 2006-07-17 | |
| dc.identifier.citation | [1] M. J. F. Gales, B. Jia, X. Liu, K. C. Sim, P. C. Woodland, and K. Yu, “Development of the CU-HTK 2004 Mandarin Conversational Telephone Speech Transcription System,” in Proc. ICASSP, 2005.
[2] X. Huang, A. Acero, and H. W. Hon, “Spoken Language Processing: A Guide to Theory, Algorithm, and System Development,” Prentice Hall PTR, 2001.
[3] D. Y. Kim, H. Y. Chan, G. Evermann, M. J. F. Gales, D. Mrva, K. C. Sim, and P. C. Woodland, “Development of the CU-HTK 2004 Broadcast News Transcription Systems,” in Proc. ICASSP, 2005.
[4] N. Kumar, “Investigation of Silicon-Auditory Models and Generalization of Linear Discriminant Analysis for Improved Speech Recognition,” Ph.D. thesis, Johns Hopkins University, Baltimore, 1997.
[5] N. Kumar and A. G. Andreou, “Heteroscedastic Discriminant Analysis and Reduced Rank HMMs for Improved Speech Recognition,” Speech Communication, vol. 26, no. 4, pp. 283-297, Dec. 1998.
[6] M. J. F. Gales, “Semi-tied Covariance Matrices for Hidden Markov Models,” IEEE Trans. Speech and Audio Processing, vol. 7, no. 3, pp. 272-281, 1999.
[7] R. A. Gopinath, “Maximum Likelihood Modeling with Gaussian Distributions,” in Proc. ICASSP, Seattle, 1998.
[8] G. Saon, M. Padmanabhan, R. Gopinath, and S. Chen, “Maximum Likelihood Discriminant Feature Spaces,” in Proc. ICASSP, 2000.
[9] National Institute of Standards and Technology, http://www.nist.gov/
[10] J. Fiscus, “A Post-Processing System to Yield Reduced Word Error Rates: Recognizer Output Voting Error Reduction (ROVER),” in Proc. IEEE Automatic Speech Recognition and Understanding Workshop, pp. 347-352, 1997.
[11] X. L. Aubert, “An Overview of Decoding Techniques for Large Vocabulary Continuous Speech Recognition,” Computer Speech and Language, vol. 16, Jan. 2002.
[12] S. Ortmanns, H. Ney, and X. Aubert, “A Word Graph Algorithm for Large Vocabulary Continuous Speech Recognition,” Computer Speech and Language, vol. 11, no. 1, pp. 43-72, Jan. 1997.
[13] H. M. Wang, B. Chen, J. W. Kuo, and S. S. Cheng, “MATBN: A Mandarin Chinese Broadcast News Corpus,” International Journal of Computational Linguistics & Chinese Language Processing, June 2005.
[14] 郭人瑋, “最小化音素錯誤鑑別式聲學模型學習於中文大詞彙連續語音辨識之初步研究” [A Preliminary Study of Minimum Phone Error Discriminative Acoustic Model Training for Mandarin Large Vocabulary Continuous Speech Recognition], Master's thesis, Graduate Institute of Computer Science and Information Engineering, National Taiwan Normal University.
[15] D. Povey, “Discriminative Training for Large Vocabulary Speech Recognition,” Ph.D. thesis, Cambridge University Engineering Department, 2003.
[16] V. Goel and W. Byrne, “Minimum Bayes-Risk Methods in Automatic Speech Recognition,” in Pattern Recognition in Speech and Language Processing, CRC Press, ch. 2, pp. 51-80, 2003.
[17] F. Wessel, R. Schlüter, and H. Ney, “Explicit Word Error Minimization Using Word Hypothesis Posterior Probabilities,” in Proc. ICASSP, pp. 33-36, 2001.
[18] J. T. Chien, C. H. Huang, K. Shinoda, and S. Furui, “Towards Optimal Bayes Decision for Speech Recognition,” in Proc. ICASSP, 2006.
[19] L. Mangu, E. Brill, and A. Stolcke, “Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks,” Computer Speech and Language, vol. 14, pp. 373-400, Oct. 2000.
[20] A. Stolcke, H. Bratt, J. Butzberger, H. Franco, V. Gadde, M. Plauché, C. Richey, E. Shriberg, K. Sönmez, F. Weng, and J. Zheng, “The SRI March 2000 Hub-5 Conversational Speech Transcription System.”
[21] G. Evermann and P. Woodland, “Posterior Probability Decoding, Confidence Estimation and System Combination,” in Proc. NIST and NSA Speech Transcription Workshop, College Park, MD, 2000.
[22] F. Wessel, R. Schlüter, and H. Ney, “Using Posterior Word Probabilities for Improved Speech Recognition,” in Proc. ICASSP, 2000.
[23] F. Wessel, R. Schlüter, K. Macherey, and H. Ney, “Confidence Measures for Large Vocabulary Continuous Speech Recognition,” IEEE Trans. Speech and Audio Processing, vol. 9, pp. 288-298, Mar. 2001.
[24] A. Sankar, “Bayesian Model Combination (BAYCOM) for Improved Recognition,” in Proc. ICASSP, 2005.
[25] I. F. Chen and L. S. Lee, “A New Framework for System Combination Based on Integrated Hypothesis Space,” in Proc. ICSLP, 2006. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/33530 | - |
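| dc.description.abstract | Speech is one of the most important and convenient means of human communication. With technological progress, products such as mobile phones and personal digital assistants (PDAs) have become ubiquitous, and with the spread of wireless communication and wireless networks, it is widely expected that speech will soon become the primary interface between humans and a new generation of intelligent devices. A sufficiently high recognition accuracy, however, remains a prerequisite for any application. Discriminative decoding and multi-systems combination are two widely used methods that have been shown to improve recognition accuracy effectively.
In this thesis, we carry out a comprehensive study of these two methods. For discriminative decoding, we study the theory of Minimum Bayes Risk decoding (MBR), Segment Minimum Bayes Risk decoding (SMBR), Minimum Time Frame Error decoding (TFE), and Optimal Bayes Classification (OBC), and conduct complete experiments on Mandarin large vocabulary broadcast news recognition. For multi-systems combination, we study the widely used Recognizer Output Voting Error Reduction (ROVER) algorithm with single-best, N-best, and confusion network inputs, again with complete experiments on Mandarin large vocabulary broadcast news. Finally, we propose a multi-systems combination framework based on word graph merging, to which discriminative decoding techniques can be successfully applied, so that the two techniques are tightly integrated. Preliminary experimental results show that, within this integrated framework, discriminative decoding and multi-systems combination complement each other and yield better recognition accuracy. This is because merging the word graphs of multiple systems provides a more comprehensive hypothesis space, which makes the risk estimation in discriminative decoding more stable and accurate, while discriminative decoding in turn enables multi-systems combination to select more accurate recognition results. | zh_TW |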
| dc.description.abstract | Substantial efforts have been made in many areas toward improving the performance of large vocabulary continuous speech recognition (LVCSR). Two important areas toward this goal, among many others, are rescoring over word graphs and the combination of multiple systems.
In this thesis, we present comprehensive studies of these two areas. For rescoring by discriminative decoding, we studied Minimum Bayes Risk decoding (MBR), Segment Minimum Bayes Risk decoding (SMBR) [16], Minimum Time Frame Error decoding (TFE) [17], and Optimal Bayes Classification decoding (OBC) [18], with experiments on a Chinese broadcast news corpus. For combining the outputs of several different systems, we focused on the ROVER technique with N-best input [9][20]. We then propose a new concept, the integrated hypothesis space, for combining LVCSR systems. Unlike conventional system combination approaches such as ROVER, this framework integrates the hypothesis spaces directly, without string alignment. In this way the timing information of all word hypotheses is preserved, and the framework is more flexible in the choice of rescoring approach. Four different rescoring criteria on the integrated hypothesis space were further explored, and experiments on a Chinese broadcast news corpus showed improved performance. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-13T04:45:40Z (GMT). No. of bitstreams: 1 ntu-95-R93942025-1.pdf: 897950 bytes, checksum: 33d3a94b8b29d54b5d1f098a89559333 (MD5) Previous issue date: 2006 | en |
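| dc.description.tableofcontents | Table of Contents; List of Figures; List of Tables
Chapter 1 Introduction: 1.1 Research Motivation; 1.2 Characteristics of Speech Systems and System Combination; 1.3 Research Methods and Main Results of This Thesis; 1.4 Thesis Organization
Chapter 2 Background on Speech Recognition Systems, and the Baseline Experiments and Baseline Systems of This Thesis: 2.1 Speech Recognition System Architecture; 2.1.1 Principles of Statistical Speech Recognition; 2.1.2 Acoustic Models; 2.1.3 Language Models; 2.1.4 Search Algorithms; 2.2 Word Graph; 2.2.1 Definition of the Word Graph in This Thesis; 2.2.2 Viterbi Search Algorithm Applied to Word Graphs; 2.3 Speech Corpus and Baseline Experiments Used in This Thesis; 2.3.1 Baseline System Settings; 2.3.2 Acoustic Features and the Two Systems; 2.3.3 Baseline Experiments; 2.4 Summary
Chapter 3 Methods and Experiments for Single-System Discriminative Speech Recognition: 3.1 Overview; 3.2 Bayes Risk; 3.3 Decoding Rules Based on Bayes Risk; 3.3.1 Minimum Bayes Risk Decoding and Segmental Minimum Bayes Risk Decoding; 3.3.2 Minimum Time Frame Error Decoding; 3.3.3 Optimal Bayes Classification (OBC); 3.4 Experiments and Comparison of Discriminative Decoding Rules; 3.4.1 Setting of α; 3.4.2 Analysis of Insertions and Deletions; 3.5 Conclusion
Chapter 4 Methods and Experiments for Multi-Systems Combination: 4.1 Overview; 4.2 Recognizer Output Voting Error Reduction (ROVER); 4.2.1 Multiple Sequence Dynamic Programming Alignment; 4.2.2 Voting Schemes; 4.3 Confusion Network Combination; 4.3.1 Algorithm for Generating Confusion Networks; 4.3.2 Confusion Network Combination and Best Result Selection; 4.4 Experiments and Comparison; 4.4.1 Tuning Experiments for the ROVER Parameters α and Conf(@); 4.4.2 Comparison of Combined Error Rates Across Voting Schemes; 4.4.3 Improvement of ROVER Combination on Individual Utterances; 4.5 Conclusion
Chapter 5 Methods and Experiments for Discriminative Multi-Systems Combination: 5.1 Overview; 5.2 Word-Graph-Based System Combination; 5.3 Decoding Rules for the Integrated Word Graph of Multiple Systems; 5.3.1 Consensus Decoding; 5.3.2 Minimum Expected Phone Error Decoding; 5.3.3 Minimum Time Frame Error Decoding; 5.3.4 Combination of Multiple Decoding Rules; 5.4 Experiments and Comparison; 5.4.1 Effect of the Time Normalization Parameter α on the Error Rates of Different Combination Decoding Rules; 5.4.2 Comparison of Error Rates of Different Combination Decoding Rules; 5.4.3 Performance Analysis of Combining Multiple Decoding Rules; 5.4.4 Improvement of Different Combination Decoding Rules on Individual Utterances; 5.5 Conclusion
Chapter 6 Conclusion and Future Work: 6.1 Summary; 6.2 Future Work
References | |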
| dc.language.iso | zh-TW | |
| dc.subject | Discriminative Decoding | zh_TW |
| dc.subject | Speech Recognition | zh_TW |
| dc.subject | Word Graph | zh_TW |
| dc.subject | Multi-systems Combination | zh_TW |
| dc.subject | Speech Recognition | en |
| dc.subject | Word Graph | en |
| dc.subject | Multi-systems Combination | en |
| dc.subject | Discriminative Decoding | en |
| dc.title | 鑑別式解碼應用於多重系統結合之中文大詞彙語音辨識 | zh_TW |
| dc.title | Discriminative Decoding on Multi-systems Combination for Improved Large Vocabulary Mandarin Speech Recognition | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 94-2 | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 陳銘憲,王小川,鄭秋豫,陳信宏 | |
| dc.subject.keyword | Multi-systems Combination, Discriminative Decoding, Speech Recognition, Word Graph | zh_TW |
| dc.subject.keyword | Multi-systems Combination, Discriminative Decoding, Speech Recognition, Word Graph | en |
| dc.relation.page | 94 | |
| dc.rights.note | Paid authorization | |
| dc.date.accepted | 2006-07-18 | |
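| dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Communication Engineering | zh_TW |
| Appears in collections: | Graduate Institute of Communication Engineering | |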
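The abstracts above describe discriminative decoding rules such as MBR, which select the hypothesis with the lowest expected loss rather than the single highest-scoring one. The Python sketch below illustrates that idea on a plain N-best list only. It assumes word-level edit distance as the loss and posteriors obtained by a softmax over scaled log scores; the function names, the `scale` parameter, and the toy hypotheses and scores are hypothetical, and this is not the thesis's word-graph implementation.

```python
import math
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, wb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # delete wa
                         cur[j - 1] + 1,            # insert wb
                         prev[j - 1] + (wa != wb))  # substitute or match
        prev = cur
    return prev[-1]


def nbest_posteriors(log_scores: Sequence[float], scale: float = 1.0) -> List[float]:
    """Turn scaled log scores into posteriors over the N-best list (stable softmax).

    `scale` is a hypothetical posterior-scaling factor, an assumption of this sketch.
    """
    scaled = [scale * s for s in log_scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mbr_decode(nbest: Sequence[Tuple[List[str], float]], scale: float = 1.0) -> List[str]:
    """Return the N-best entry with minimum expected word-level edit distance."""
    hyps = [h for h, _ in nbest]
    post = nbest_posteriors([s for _, s in nbest], scale)

    def expected_loss(cand: List[str]) -> float:
        # Bayes risk of `cand`, treating every N-best entry as a possible reference.
        return sum(p * edit_distance(cand, ref) for ref, p in zip(hyps, post))

    return min(hyps, key=expected_loss)


if __name__ == "__main__":
    # Toy N-best list of (word sequence, log score); values are illustrative only.
    nbest = [(["a", "b", "c"], -1.0), (["a", "x", "c"], -1.1), (["a", "x", "d"], -1.2)]
    print(mbr_decode(nbest))  # ['a', 'x', 'c']
```

With these toy scores the top-scoring hypothesis is `a b c`, but MBR selects `a x c` because it is, on average, closest to the whole list. A lattice-based decoder would compute the same kind of expected loss over a word graph instead of an explicit N-best list.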
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-95-1.pdf (not authorized for public access) | 876.9 kB | Adobe PDF |
Unless otherwise specified, all documents in this system are protected by copyright, with all rights reserved.
