NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/34537
Full metadata record (DC field: value, language)
dc.contributor.advisor: 李琳山 (Lin-Shan Lee)
dc.contributor.author: Ming-Yi Tsai (en)
dc.contributor.author: 蔡明怡 (zh_TW)
dc.date.accessioned: 2021-06-13T06:13:54Z
dc.date.available: 2006-02-13
dc.date.copyright: 2006-02-13
dc.date.issued: 2006
dc.date.submitted: 2006-02-06
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/34537
dc.description.abstract: This thesis has two major parts: the first presents an in-depth quantitative analysis of pronunciation variation in Mandarin speech, and the second develops pronunciation variation models to improve recognition performance.
In the first part, pronunciation variation in the speech signal is analyzed with several statistical methods, including the newly proposed acoustic distance and phonemic distance, together with pronunciation entropy and phonological rules. Chapter 3 analyzes the average entropy of pronunciation variation at four linguistic levels (sub-syllabic units, i.e. Initials/Finals, syllables, characters and words) under different speaking rates, word frequencies and contextual conditions, to observe how pronunciation variation occurs. Chapter 4 proposes a new analysis framework that jointly considers the acoustic and phonemic distances; based on this framework, the acoustic confusion (acoustic distance) and the pronunciation confusion (phonemic distance) among Initials/Finals or phonemes are analyzed. To better understand when pronunciation variation occurs, Chapter 5 automatically derives statistical phonological rules from the speech data and analyzes them to deepen the understanding of pronunciation variation in Mandarin. These analyses were performed on large-vocabulary Mandarin corpora, including broadcast news (LDC HUB-4NE) and conversational speech (LDC CALLHOME).
In addition, although allowing some words in the lexicon to have multiple pronunciations does improve recognition accuracy in handling pronunciation variation, the extra pronunciations also increase the confusion among words during recognition and thus limit the achievable improvement. To reduce this confusion, Chapter 6 in the second part proposes a new framework for automatically constructing pronunciation variation models, consisting of three major steps: pronunciation generation, pronunciation ranking and pronunciation pruning. Chapter 7 further proposes new measures of the confusability of the pronunciations in the lexicon; experimental results show that the measured confusability is strongly correlated with recognition accuracy, and that the pronunciation lexicon constructed with this framework effectively reduces both the confusability and the recognition error rate. To minimize the confusability, Chapter 8 proposes a rapid discriminative training framework for training the pronunciation probabilities in the lexicon, in which an integrated model simulating speech production and recognition provides simulated recognition errors, making the training fast and effective. These experiments were carried out on large-vocabulary Mandarin speech recognition using broadcast news (LDC HUB-4NE) and conversational speech (LDC CALLHOME) corpora.
(zh_TW)
dc.description.abstract: This thesis consists of two parts, one on pronunciation variation analysis and the other on pronunciation modeling, both for Mandarin Chinese.
In the first part of the thesis, the pronunciation variation for Mandarin Chinese was extensively analyzed in a quantitative way. Various statistical methods were used for the analysis, including the proposed acoustic and phonemic distances in addition to pronunciation entropy and phonological rules.
The pronunciation entropy was used to analyze the dependency of pronunciation variation at different linguistic levels on various contextual conditions, speaking rates and word frequencies. The proposed framework based on the acoustic/phonemic distances was then used to analyze the acoustic and phonemic confusion among Initials/Finals or phonemes. Furthermore, probabilistic phonological rules were derived automatically from the speech data to analyze the phonological transformations under various contextual conditions.
All these analyses were carried out on planned (LDC HUB-4NE) and spontaneous (LDC CALLHOME) Mandarin Chinese speech corpora.
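For reference, the pronunciation entropy mentioned above is commonly defined as the entropy of the surface-pronunciation distribution of a linguistic unit; the following is a sketch of the standard formulation (the exact estimation details used in the thesis may differ):

    H(u) = - \sum_{i} P(s_i \mid u) \log_2 P(s_i \mid u)

where the s_i are the surface pronunciations observed for unit u (Initial/Final, syllable, character or word) in the phonetically transcribed corpus, and a larger H(u) indicates greater pronunciation variation.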
On the other hand, multiple-pronunciation dictionaries have been found to be useful in pronunciation modeling for speech recognition. However, the extra pronunciation variants added to the dictionary inevitably increase the confusion among different words during recognition, and consequently limit the achievable improvement in recognition performance. The second part of this thesis therefore proposed a three-stage framework for Mandarin Chinese to automatically construct the multiple-pronunciation dictionary while reducing the confusion it may cause. The proposed framework includes pronunciation generation (Stage 1), ranking (Stage 2) and pruning (Stage 3). New measures of confusability for multiple-pronunciation dictionaries were developed and shown to correlate strongly with recognition performance. With the proposed framework, it was shown that the measured confusability can be reduced and the recognition performance improved stage by stage.
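As a rough illustration of how such a three-stage construction could be organized, here is a minimal Python sketch under assumed interfaces; build_dictionary, rank_score and confusability are hypothetical names, and the scores and threshold are placeholders, not the measures developed in the thesis:

    from collections import defaultdict

    def build_dictionary(candidates, rank_score, confusability, max_variants=3):
        """candidates: {word: [generated surface pronunciations]} (output of Stage 1)
        rank_score(word, pron) -> float, higher means a more likely surface form (Stage 2)
        confusability(pron, dictionary) -> float, higher means more confusable (Stage 3)"""
        dictionary = defaultdict(list)
        for word, prons in candidates.items():
            # Stage 2: rank the generated pronunciations of this word
            ranked = sorted(set(prons), key=lambda p: rank_score(word, p), reverse=True)
            # Stage 3: keep only top-ranked variants that do not raise confusability too much
            for pron in ranked[:max_variants]:
                if confusability(pron, dictionary) < 0.5:  # illustrative threshold
                    dictionary[word].append(pron)
            if not dictionary[word]:  # always keep at least the most likely form
                dictionary[word].append(ranked[0])
        return dict(dictionary)

    # toy usage with made-up pronunciations and scores
    candidates = {"今天": ["jin1 tian1", "jin1 tien1"], "知道": ["zhi1 dao4", "zi1 dao4"]}
    lexicon = build_dictionary(
        candidates,
        rank_score=lambda w, p: 1.0 if p == candidates[w][0] else 0.4,
        confusability=lambda p, d: 0.0,
    )
    print(lexicon)

The point of the sketch is only the division of labor: generation supplies candidate variants, ranking orders them per word, and pruning limits the dictionary so that added variants do not make words too confusable with one another.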
To further reduce the possible confusion during recognition, the pronunciation probabilities in the multiple-pronunciation dictionary were then re-estimated within a proposed rapid discriminative training framework that uses simulated recognition errors based on a Speech Production/Recognition Model. Experimental results show that the recognition performance improves over the training iterations.
These findings were verified by a series of experiments performed on planned (LDC HUB-4NE) and spontaneous (LDC CALLHOME) Mandarin Chinese speech corpora.
(en)
dc.description.provenance: Made available in DSpace on 2021-06-13T06:13:54Z (GMT). No. of bitstreams: 1. ntu-95-F87942018-1.pdf: 1816371 bytes, checksum: 7767983f7d28058bd093cf773de494f4 (MD5). Previous issue date: 2006 (en)
dc.description.tableofcontents:
Chinese Abstract i
English Abstract iii
Contents v
List of Tables ix
List of Figures xi
1 Introduction 1
1.1 Pronunciation Variation Analysis and Modeling 1
1.2 Review of the State of the Art 2
1.3 Organization of the Thesis 5
2 Background Review and Corpora Used 7
2.1 Introduction of Pronunciation Variation 7
2.2 Automatic Speech Recognition with a Multiple-Pronunciation Dictionary 8
2.3 Introduction of Mandarin Chinese 9
2.3.1 General Characteristics of Mandarin Chinese 9
2.3.2 Syllable Structure of Mandarin Chinese 10
2.3.3 Sub-word Units of Mandarin Chinese 11
2.4 Corpora Used 13
2.4.1 The Planned Speech 13
2.4.2 The Spontaneous Speech 14
PART I Pronunciation Variation Analysis 15
3 Pronunciation Variation Analysis for Mandarin Chinese at Various Linguistic Levels and Contextual Conditions 17
3.1 Introduction 17
3.2 Analysis Setup 18
3.3 Pronunciation Variation at Different Linguistic Levels and Contextual Conditions 20
3.4 Pronunciation Variation at Different Speaking Rates 22
3.5 Pronunciation Variation at Different Word Frequencies 24
3.6 Conclusion 26
4 Pronunciation Variation Analysis Based on Acoustic and Phonemic Distance Measures 27
4.1 Introduction 27
4.2 Acoustic Distance/Confusion 29
4.3 Phonemic Distance/Confusion 30
4.4 Acoustic/Phonemic Distance Plane for Analysis 30
4.5 Analysis Setup 32
4.6 Analysis on Mandarin Consonants 32
4.7 Analysis on Mandarin Vowels 36
4.8 Acoustic Distance of Retrained Acoustic Models 39
4.9 Conclusion 39
5 Pronunciation Variation Analysis Using Data-driven Phonological Rules 41
5.1 Introduction 41
5.2 Analysis on Mandarin Consonants or Initials 43
5.3 Analysis on Mandarin Vowels and Finals 45
5.4 Conclusion 49
PART II Pronunciation Modeling for ASR 59
6 Pronunciation Modeling with Reduced Confusion Using a Three-stage Framework 61
6.1 Introduction 61
6.2 Stage 1 - Pronunciation Generation 62
6.3 Stage 2 - Pronunciation Ranking 66
6.4 Stage 3 - Pronunciation Pruning 69
6.5 Pronunciation Probabilities 71
6.6 Conclusion 72
7 Experimental Results and Analysis of Pronunciation Modeling Using the Three-stage Framework 73
7.1 Introduction 73
7.2 Speech Corpora and Experimental Setup 74
7.2.1 The Planned Speech 74
7.2.2 The Spontaneous Speech 75
7.2.3 Experimental Setup 76
7.3 Stage 1 - Pronunciation Generation 76
7.4 Confusability Measures for the Dictionaries 79
7.5 Stage 2 - Pronunciation Ranking 82
7.6 Effect of Weighting the Pronunciation Probabilities 85
7.7 Stage 3 - Pronunciation Pruning 88
7.8 Scaling the Pronunciation Probabilities 90
7.9 Interactions among Acoustic, Pronunciation and Language Models 91
7.10 Summary with Further Analysis 94
7.11 Parallel Results for Spontaneous Speech 96
7.12 Conclusion 99
8 Rapid Discriminative Training in Pronunciation Modeling Using Simulated Recognition Errors Based on Speech Production/Recognition Model 101
8.1 Introduction 101
8.2 Speech Production/Recognition Model for Simulating Recognition Errors 104
8.2.1 Speech Production Model 104
8.2.2 Speech Recognition Model 106
8.3 Discriminative Training Based on the Simulated Errors 107
8.4 Rapid Discriminative Training on Pronunciation Model 108
8.5 Experiments and Discussions 110
8.6 Conclusion 113
9 Conclusion 115
Bibliography 117
dc.language.iso: en
dc.subject: 混淆度 (zh_TW)
dc.subject: 發音變異 (zh_TW)
dc.subject: 發音變異模型 (zh_TW)
dc.subject: 聲學距離 (zh_TW)
dc.subject: 音素距離 (zh_TW)
dc.subject: 熵值 (zh_TW)
dc.subject: 音韻規律 (zh_TW)
dc.subject: Entropy (en)
dc.subject: Pronunciation Modeling (en)
dc.subject: Pronunciation Variation (en)
dc.subject: Phonological rules (en)
dc.subject: Confusion (en)
dc.subject: Acoustic distance (en)
dc.subject: Phonemic distance (en)
dc.title: 國語語音之發音變異分析及提昇辨識效能之發音模型 (zh_TW)
dc.title: Pronunciation Variation Analysis and Modeling for Mandarin Chinese for Improved Speech Recognition (en)
dc.type: Thesis
dc.date.schoolyear: 94-1
dc.description.degree: 博士 (doctoral)
dc.contributor.oralexamcommittee: 鄭秋豫, 陳信宏, 王小川, 鄭伯順, 陳信希
dc.subject.keyword: 發音變異, 發音變異模型, 混淆度, 聲學距離, 音素距離, 熵值, 音韻規律 (zh_TW)
dc.subject.keyword: Pronunciation Variation, Pronunciation Modeling, Confusion, Acoustic distance, Phonemic distance, Entropy, Phonological rules (en)
dc.relation.page: 125
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2006-02-07
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) (zh_TW)
dc.contributor.author-dept: 電信工程學研究所 (Graduate Institute of Communication Engineering) (zh_TW)
Appears in collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in this item:
File: ntu-95-1.pdf, 1.77 MB, Adobe PDF, access: 未授權公開取用 (not authorized for public access)


All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
