NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/34537
Full metadata record (DC field: value, language)
dc.contributor.advisor: 李琳山 (Lin-Shan Lee)
dc.contributor.author: Ming-Yi Tsai (en)
dc.contributor.author: 蔡明怡 (zh_TW)
dc.date.accessioned: 2021-06-13T06:13:54Z
dc.date.available: 2006-02-13
dc.date.copyright: 2006-02-13
dc.date.issued: 2006
dc.date.submitted: 2006-02-06
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/34537
dc.description.abstract: This thesis has two major parts: the first presents an in-depth quantitative analysis of pronunciation variation in Mandarin speech, and the second develops pronunciation variation models to improve recognition performance.
In the first part, pronunciation variation in the speech signal is analyzed with several statistical methods, including the newly proposed acoustic distance and phonemic distance, together with pronunciation entropy and phonological rules. Chapter 3 analyzes the average entropy of pronunciation variation at four linguistic levels (sub-syllabic units, i.e. Initials/Finals, syllables, characters and words) under different speaking rates, word frequencies and contextual conditions, to observe how pronunciation variation occurs. Chapter 4 proposes a new analysis framework that jointly considers the acoustic and phonemic distances; based on this framework, the acoustic confusion (acoustic distance) and the pronunciation confusion (phonemic distance) among Initials/Finals or phonemes are analyzed. To better understand when pronunciation variation occurs, Chapter 5 automatically derives statistical phonological rules from the speech data and analyzes them to deepen the understanding of pronunciation variation in Mandarin. These analyses were performed on large-vocabulary Mandarin corpora, including broadcast news (LDC HUB-4NE) and conversational speech (LDC CALLHOME).
In addition, although allowing some words in the lexicon to have multiple pronunciations does improve recognition accuracy in handling pronunciation variation, the extra pronunciations also increase the confusion among words during recognition and thus limit the achievable improvement. To reduce this confusion, Chapter 6 in the second part proposes a new framework for automatically constructing pronunciation variation models, consisting of three major steps: pronunciation generation, pronunciation ranking and pronunciation pruning. Chapter 7 further proposes new measures of the confusability of the pronunciations in the lexicon; experimental results show that the measured confusability is strongly correlated with recognition accuracy, and that the pronunciation lexicon constructed with this framework effectively reduces both the confusability and the recognition error rate. To minimize the confusability, Chapter 8 proposes a rapid discriminative training framework for training the pronunciation probabilities in the lexicon, in which an integrated model simulating speech production and recognition provides simulated recognition errors, making the training fast and effective. These experiments were carried out on large-vocabulary Mandarin speech recognition using broadcast news (LDC HUB-4NE) and conversational speech (LDC CALLHOME) corpora.
(zh_TW)
dc.description.abstract: This thesis consists of two parts, one on pronunciation variation analysis and the other on pronunciation modeling, both for Mandarin Chinese.
In the first part of the thesis, the pronunciation variation for Mandarin Chinese was extensively analyzed in a quantitative way. Various statistical methods were used for the analysis, including the proposed acoustic and phonemic distances in addition to pronunciation entropy and phonological rules.
The pronunciation entropy was used to analyze the dependency of pronunciation variation at different linguistic levels on various contextual conditions, speaking rates and word frequencies. The proposed framework based on the acoustic/phonemic distances was then used to analyze the acoustic and phonemic confusion among Initials/Finals or phonemes. Furthermore, probabilistic phonological rules were derived automatically from the speech data to analyze the phonological transformations under various contextual conditions.
All these analyses were carried out on planned (LDC HUB-4NE) and spontaneous (LDC CALLHOME) Mandarin Chinese speech corpora.
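For reference, the pronunciation entropy mentioned above is commonly defined as the entropy of the surface-pronunciation distribution of a linguistic unit; the following is a sketch of the standard formulation (the exact estimation details used in the thesis may differ):

    H(u) = - \sum_{i} P(s_i \mid u) \log_2 P(s_i \mid u)

where the s_i are the surface pronunciations observed for unit u (Initial/Final, syllable, character or word) in the phonetically transcribed corpus, and a larger H(u) indicates greater pronunciation variation.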
On the other hand, multiple-pronunciation dictionaries have been found to be useful in pronunciation modeling for speech recognition. However, the extra pronunciation variants added to the dictionary inevitably increase the confusion among different words during recognition, and consequently limit the achievable improvement in recognition performance. The second part of this thesis therefore proposed a three-stage framework for Mandarin Chinese to automatically construct the multiple-pronunciation dictionary while reducing the confusion it may cause. The proposed framework includes pronunciation generation (Stage 1), ranking (Stage 2) and pruning (Stage 3). New measures of confusability for multiple-pronunciation dictionaries were developed and shown to correlate strongly with recognition performance. With the proposed framework, it was shown that the measured confusability can be reduced and the recognition performance improved stage by stage.
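As a rough illustration of how such a three-stage construction could be organized, here is a minimal Python sketch under assumed interfaces; build_dictionary, rank_score and confusability are hypothetical names, and the scores and threshold are placeholders, not the measures developed in the thesis:

    from collections import defaultdict

    def build_dictionary(candidates, rank_score, confusability, max_variants=3):
        """candidates: {word: [generated surface pronunciations]} (output of Stage 1)
        rank_score(word, pron) -> float, higher means a more likely surface form (Stage 2)
        confusability(pron, dictionary) -> float, higher means more confusable (Stage 3)"""
        dictionary = defaultdict(list)
        for word, prons in candidates.items():
            # Stage 2: rank the generated pronunciations of this word
            ranked = sorted(set(prons), key=lambda p: rank_score(word, p), reverse=True)
            # Stage 3: keep only top-ranked variants that do not raise confusability too much
            for pron in ranked[:max_variants]:
                if confusability(pron, dictionary) < 0.5:  # illustrative threshold
                    dictionary[word].append(pron)
            if not dictionary[word]:  # always keep at least the most likely form
                dictionary[word].append(ranked[0])
        return dict(dictionary)

    # toy usage with made-up pronunciations and scores
    candidates = {"今天": ["jin1 tian1", "jin1 tien1"], "知道": ["zhi1 dao4", "zi1 dao4"]}
    lexicon = build_dictionary(
        candidates,
        rank_score=lambda w, p: 1.0 if p == candidates[w][0] else 0.4,
        confusability=lambda p, d: 0.0,
    )
    print(lexicon)

The point of the sketch is only the division of labor: generation supplies candidate variants, ranking orders them per word, and pruning limits the dictionary so that added variants do not make words too confusable with one another.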
To further reduce the possible confusion during recognition, the pronunciation probabilities in the multiple-pronunciation dictionary were then re-estimated within a proposed rapid discriminative training framework that uses simulated recognition errors based on a Speech Production/Recognition Model. Experimental results show that the recognition performance improves over the training iterations.
These findings were verified by a series of experiments performed on planned (LDC HUB-4NE) and spontaneous (LDC CALLHOME) Mandarin Chinese speech corpora.
(en)
dc.description.provenance: Made available in DSpace on 2021-06-13T06:13:54Z (GMT). No. of bitstreams: 1. ntu-95-F87942018-1.pdf: 1816371 bytes, checksum: 7767983f7d28058bd093cf773de494f4 (MD5). Previous issue date: 2006 (en)
dc.description.tableofcontents:
Chinese Abstract i
English Abstract iii
Contents v
List of Tables ix
List of Figures xi
1 Introduction 1
1.1 Pronunciation Variation Analysis and Modeling 1
1.2 Review of the State of the Art 2
1.3 Organization of the Thesis 5
2 Background Review and Corpora Used 7
2.1 Introduction of Pronunciation Variation 7
2.2 Automatic Speech Recognition with a Multiple-Pronunciation Dictionary 8
2.3 Introduction of Mandarin Chinese 9
2.3.1 General Characteristics of Mandarin Chinese 9
2.3.2 Syllable Structure of Mandarin Chinese 10
2.3.3 Sub-word Units of Mandarin Chinese 11
2.4 Corpora Used 13
2.4.1 The Planned Speech 13
2.4.2 The Spontaneous Speech 14
PART I Pronunciation Variation Analysis 15
3 Pronunciation Variation Analysis for Mandarin Chinese at Various Linguistic Levels and Contextual Conditions 17
3.1 Introduction 17
3.2 Analysis Setup 18
3.3 Pronunciation Variation at Different Linguistic Levels and Contextual Conditions 20
3.4 Pronunciation Variation at Different Speaking Rates 22
3.5 Pronunciation Variation at Different Word Frequencies 24
3.6 Conclusion 26
4 Pronunciation Variation Analysis Based on Acoustic and Phonemic Distance Measures 27
4.1 Introduction 27
4.2 Acoustic Distance/Confusion 29
4.3 Phonemic Distance/Confusion 30
4.4 Acoustic/Phonemic Distance Plane for Analysis 30
4.5 Analysis Setup 32
4.6 Analysis on Mandarin Consonants 32
4.7 Analysis on Mandarin Vowels 36
4.8 Acoustic Distance of Retrained Acoustic Models 39
4.9 Conclusion 39
5 Pronunciation Variation Analysis Using Data-driven Phonological Rules 41
5.1 Introduction 41
5.2 Analysis on Mandarin Consonants or Initials 43
5.3 Analysis on Mandarin Vowels and Finals 45
5.4 Conclusion 49
PART II Pronunciation Modeling for ASR 59
6 Pronunciation Modeling with Reduced Confusion Using a Three-stage Framework 61
6.1 Introduction 61
6.2 Stage 1 - Pronunciation Generation 62
6.3 Stage 2 - Pronunciation Ranking 66
6.4 Stage 3 - Pronunciation Pruning 69
6.5 Pronunciation Probabilities 71
6.6 Conclusion 72
7 Experimental Results and Analysis of Pronunciation Modeling Using the Three-stage Framework 73
7.1 Introduction 73
7.2 Speech Corpora and Experimental Setup 74
7.2.1 The Planned Speech 74
7.2.2 The Spontaneous Speech 75
7.2.3 Experimental Setup 76
7.3 Stage 1 - Pronunciation Generation 76
7.4 Confusability Measures for the Dictionaries 79
7.5 Stage 2 - Pronunciation Ranking 82
7.6 Effect of Weighting the Pronunciation Probabilities 85
7.7 Stage 3 - Pronunciation Pruning 88
7.8 Scaling the Pronunciation Probabilities 90
7.9 Interactions among Acoustic, Pronunciation and Language Models 91
7.10 Summary with Further Analysis 94
7.11 Parallel Results for Spontaneous Speech 96
7.12 Conclusion 99
8 Rapid Discriminative Training in Pronunciation Modeling Using Simulated Recognition Errors Based on Speech Production/Recognition Model 101
8.1 Introduction 101
8.2 Speech Production/Recognition Model for Simulating Recognition Errors 104
8.2.1 Speech Production Model 104
8.2.2 Speech Recognition Model 106
8.3 Discriminative Training Based on the Simulated Errors 107
8.4 Rapid Discriminative Training on Pronunciation Model 108
8.5 Experiments and Discussions 110
8.6 Conclusion 113
9 Conclusion 115
Bibliography 117
dc.language.iso: en
dc.subject: 混淆度 (zh_TW)
dc.subject: 發音變異 (zh_TW)
dc.subject: 發音變異模型 (zh_TW)
dc.subject: 聲學距離 (zh_TW)
dc.subject: 音素距離 (zh_TW)
dc.subject: 熵值 (zh_TW)
dc.subject: 音韻規律 (zh_TW)
dc.subject: Entropy (en)
dc.subject: Pronunciation Modeling (en)
dc.subject: Pronunciation Variation (en)
dc.subject: Phonological rules (en)
dc.subject: Confusion (en)
dc.subject: Acoustic distance (en)
dc.subject: Phonemic distance (en)
dc.title: 國語語音之發音變異分析及提昇辨識效能之發音模型 (zh_TW)
dc.title: Pronunciation Variation Analysis and Modeling for Mandarin Chinese for Improved Speech Recognition (en)
dc.type: Thesis
dc.date.schoolyear: 94-1
dc.description.degree: 博士 (doctoral)
dc.contributor.oralexamcommittee: 鄭秋豫, 陳信宏, 王小川, 鄭伯順, 陳信希
dc.subject.keyword: 發音變異, 發音變異模型, 混淆度, 聲學距離, 音素距離, 熵值, 音韻規律 (zh_TW)
dc.subject.keyword: Pronunciation Variation, Pronunciation Modeling, Confusion, Acoustic distance, Phonemic distance, Entropy, Phonological rules (en)
dc.relation.page: 125
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2006-02-07
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) (zh_TW)
dc.contributor.author-dept: 電信工程學研究所 (Graduate Institute of Communication Engineering) (zh_TW)
Appears in collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in this item:
File: ntu-95-1.pdf, 1.77 MB, Adobe PDF, access: 未授權公開取用 (not authorized for public access)


All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
