NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/41669
Full metadata record
dc.contributor.advisor: 陳永耀 (Yung-Yaw Chen)
dc.contributor.author: Xuan Huang (en)
dc.contributor.author: 黃璿 (zh_TW)
dc.date.accessioned: 2021-06-15T00:26:56Z
dc.date.available: 2010-09-01
dc.date.copyright: 2009-02-03
dc.date.issued: 2009
dc.date.submitted: 2009-01-22
dc.identifier.citation[1] C. Coker, N. Umeda, and C. Browman, 'Automatic synthesis from ordinary english test,' Audio and Electroacoustics, IEEE Transactions on, vol. 21, pp. 293-298, 1973.
[2] Y. Sagisaka, 'Speech synthesis from text,' Communications Magazine, IEEE, vol. 28, pp. 35-41, 55, 1990.
[3] A. Breen, 'Speech synthesis models: a review,' Electronics & Communication Engineering Journal, vol. 4, pp. 19-31, 1992.
[4] L. R. Rabiner and B. H. Juang, Fundamentals of speech recognition. Englewood Cliffs, N.J.: PTR Prentice Hall, 1993.
[5] M. R. Schroeder, Computer speech : recognition, compression, synthesis : with introductions to hearing and signal analysis and a glossary of speech and computer terms. Berlin ; New York: Springer, 1999.
[6] D. A. Reynolds and R. C. Rose, 'Robust text-independent speaker identification using Gaussian mixture speaker models,' Speech and Audio Processing, IEEE Transactions on, vol. 3, pp. 72-83, 1995.
[7] P. C. Loizou, Speech enhancement : theory and practice. Boca Raton: CRC Press, 2007.
[8] N. A. Campbell, Biology, 3rd ed. Redwood City, Calif.: Benjamin/Cummings, 1993.
[9] G. Fant, Acoustic theory of speech production. s'Gravenhage,: Mouton, 1960.
[10] L. R. Rabiner and R. W. Schafer, Digital processing of speech signals. Englewood Cliffs, N.J.: Prentice-Hall, 1978.
[11] B. Gold and N. Morgan, Speech and audio signal processing : processing and perception of speech and music. New York: John Wiley, 2000.
[12] S. E. Levinson, Mathematical models for speech technology. Chichester, West Sussex, England ; Hoboken, N.J., U.S.A.: John Wiley, 2005.
[13] T. F. Quatieri, Discrete-time speech signal processing : principles and practice. Upper Saddle River, NJ: Prentice Hall PTR, 2002.
[14] J. Benesty, M. M. Sondhi, and Y. Huang, Springer handbook of speech processing. [Berlin ; London]: Springer, 2008.
[15] J. N. Holmes and W. Holmes, Speech synthesis and recognition, 2nd ed. London ; New York: Taylor & Francis, 2001.
[16] J. Flanagan and L. Landgraf, 'Self-oscillating source for vocal-tract synthesizers,' Audio and Electroacoustics, IEEE Transactions on, vol. 16, pp. 57-64, 1968.
[17] A. Paige and V. Zue, 'Calculation of vocal tract length,' Audio and Electroacoustics, IEEE Transactions on, vol. 18, pp. 268-270, 1970.
[18] U. Chong and D. Magill, 'The Residual-Excited Linear Prediction Vocoder with Transmission Rate Below 9.6 kbits/s,' Communications, IEEE Transactions on [legacy, pre - 1988], vol. 23, pp. 1466-1474, 1975.
[19] J. Makhoul, 'Linear prediction: A tutorial review,' Proceedings of the IEEE, vol. 63, pp. 561-580, 1975.
[20] M. Berouti, D. Childers, and A. Paige, 'Glottal area versus glottal volume-velocity,' in Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '77., 1977, pp. 33-36.
[21] H. Wakita, 'Normalization of vowels by vocal-tract length and its application to vowel identification,' Acoustics, Speech, and Signal Processing [see also IEEE Transactions on Signal Processing], IEEE Transactions on, vol. 25, pp. 183-192, 1977.
[22] R. Kirlin, 'A posteriori estimation of vocal tract length,' Acoustics, Speech, and Signal Processing [see also IEEE Transactions on Signal Processing], IEEE Transactions on, vol. 26, pp. 571-574, 1978.
[23] D. Wong, J. Markel, and A. Gray, Jr., 'Least squares glottal inverse filtering from the acoustic speech waveform,' Acoustics, Speech, and Signal Processing [see also IEEE Transactions on Signal Processing], IEEE Transactions on, vol. 27, pp. 350-355, 1979.
[24] H. Strube, 'Comments on 'Least squares glottal inverse filtering from the acoustic speech waveform',' Acoustics, Speech, and Signal Processing [see also IEEE Transactions on Signal Processing], IEEE Transactions on, vol. 28, pp. 343-343, 1980.
[25] J. Deller, Jr., 'Some notes on closed phase glottal inverse filtering,' Acoustics, Speech, and Signal Processing [see also IEEE Transactions on Signal Processing], IEEE Transactions on, vol. 29, pp. 917-919, 1981.
[26] D. Veeneman and S. BeMent, 'Automatic glottal inverse filtering from speech and electroglottographic signals,' Acoustics, Speech, and Signal Processing [see also IEEE Transactions on Signal Processing], IEEE Transactions on, vol. 33, pp. 369-377, 1985.
[27] G. J. Freij and B. M. G. Cheetham, 'Improved sequential linear prediction by selective time-domain coefficient extraction,' Electronics Letters, vol. 22, pp. 470-472, 1986.
[28] P. Milenkovic, 'Glottal inverse filtering by joint estimation of an AR system with a linear input model,' Acoustics, Speech, and Signal Processing [see also IEEE Transactions on Signal Processing], IEEE Transactions on, vol. 34, pp. 28-42, 1986.
[29] B. C. Watson, 'Measures of speech production,' Engineering in Medicine and Biology Magazine, IEEE, vol. 7, pp. 30-33, 1988.
[30] M. H. O'Malley, 'Text-to-speech conversion technology,' Computer, vol. 23, pp. 17-23, 1990.
[31] I. T. Lim and B. G. Lee, 'Lossless pole-zero modeling of speech signals,' Speech and Audio Processing, IEEE Transactions on, vol. 1, pp. 269-276, 1993.
[32] B. W. Wah, T. S. Huang, A. K. Joshi, D. Moldovan, J. Aloimonos, R. K. Bajcsy, D. Ballard, D. DeGroot, K. DeJong, C. R. Dyer, S. E. Fahlman, R. Grishman, L. Hirschman, R. E. Korf, S. E. Levinson, D. P. Miranker, N. H. Morgan, S. Nirenburg, T. Poggio, E. M. Riseman, C. Stanfill, S. J. Stolfo, S. L. Tanimoto, and C. Weems, 'Report on workshop on high performance computing and communications for grand challenge applications: computer vision, speech and natural language processing, and artificial intelligence,' Knowledge and Data Engineering, IEEE Transactions on, vol. 5, pp. 138-154, 1993.
[33] F. Grandori, P. Pinelli, P. Ravazzani, F. Ceriani, G. Miscio, F. Pisano, R. Colombo, S. Insalaco, and G. Tognola, 'Multiparametric analysis of speech production mechanisms,' Engineering in Medicine and Biology Magazine, IEEE, vol. 13, pp. 203-209, 1994.
[34] J. Dang and K. Honda, 'An improved vocal tract model of vowel production implementing piriform resonance and transvelar nasal coupling,' in Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on, 1996, pp. 965-968 vol.2.
[35] B. Yegnanarayana and P. Satyanarayana Murthy, 'Source-system windowing for speech analysis and synthesis,' Speech and Audio Processing, IEEE Transactions on, vol. 4, pp. 133-137, 1996.
[36] T. Claes, I. Dologlou, L. ten Bosch, and D. van Compernolle, 'A novel feature transformation for vocal tract length normalization in automatic speech recognition,' Speech and Audio Processing, IEEE Transactions on, vol. 6, pp. 549-557, 1998.
[37] A. Kain and M. W. Macon, 'Spectral voice conversion for text-to-speech synthesis,' in Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on, 1998, pp. 285-288 vol.1.
[38] B. Yegnanarayana, C. d'Alessandro, and V. Darsinos, 'An iterative algorithm for decomposition of speech signals into periodic and aperiodic components,' Speech and Audio Processing, IEEE Transactions on, vol. 6, pp. 1-11, 1998.
[39] B. Yegnanarayana and R. N. J. Veldhuis, 'Extraction of vocal-tract system characteristics from speech signals,' Speech and Audio Processing, IEEE Transactions on, vol. 6, pp. 313-327, 1998.
[40] L. Welling, S. Kanthak, and H. Ney, 'Improved methods for vocal tract normalization,' in Acoustics, Speech, and Signal Processing, 1999. ICASSP '99. Proceedings., 1999 IEEE International Conference on, 1999, pp. 761-764 vol.2.
[41] J. Bing-Hwang and S. Furui, 'Automatic recognition and understanding of spoken language - a first step toward natural human-machine communication,' Proceedings of the IEEE, vol. 88, pp. 1142-1165, 2000.
[42] P. Gray, M. P. Hollier, and R. E. Massara, 'Non-intrusive speech-quality assessment using vocal-tract models,' Vision, Image and Signal Processing, IEE Proceedings -, vol. 147, pp. 493-501, 2000.
[43] D. O’Shaughnessy, 'Speech Communications: Human and Machine,' 2000.
[44] W. Chou, B. H. Juang, and MyiLibrary., 'Pattern recognition in speech and language processing,' in Electrical engineering and applied signal processing series. Boca Raton ; London: CRC Press, 2003, pp. vi, 394 p.
[45] H. Jiang, F. K. Soong, and C. H. Lee, 'A Dynamic In-Search Data Selection Method With Its Applications to Acoustic Modeling and Utterance Verification,' Speech and Audio Processing, IEEE Transactions on, vol. 13, pp. 945-955, 2005.
[46] F. Qiang and P. Murphy, 'Robust glottal source estimation based on joint source-filter model optimization,' Audio, Speech and Language Processing, IEEE Transactions on [see also Speech and Audio Processing, IEEE Transactions on], vol. 14, pp. 492-501, 2006.
[47] A. Watanabe and T. Sakata, 'Reliable methods for estimating relative vocal tract lengths from formant trajectories of common words,' Audio, Speech and Language Processing, IEEE Transactions on [see also Speech and Audio Processing, IEEE Transactions on], vol. 14, pp. 1193-1204, 2006.
[48] M. Damian, K. Antti, M. Jack, and S. Simon, 'Acoustic Modeling Using the Digital Waveguide Mesh,' Signal Processing Magazine, IEEE, vol. 24, pp. 55-66, 2007.
[49] K. S. Lee, 'Statistical Approach for Voice Personality Transformation,' Audio, Speech, and Language Processing, IEEE Transactions on [see also Speech and Audio Processing, IEEE Transactions on], vol. 15, pp. 641-651, 2007.
[50] A. Mouchtaris, J. Van der Spiegel, P. Mueller, and P. Tsakalides, 'A Spectral Conversion Approach to Single-Channel Speech Enhancement,' Audio, Speech, and Language Processing, IEEE Transactions on [see also Speech and Audio Processing, IEEE Transactions on], vol. 15, pp. 1180-1193, 2007.
[51] J. Mullen, D. M. Howard, and D. T. Murphy, 'Real-Time Dynamic Articulations in the 2-D Waveguide Mesh Vocal Tract Model,' Audio, Speech and Language Processing, IEEE Transactions on [see also Speech and Audio Processing, IEEE Transactions on], vol. 15, pp. 577-585, 2007.
[52] Z. Nengheng, L. Tan, and P. C. Ching, 'Integration of Complementary Acoustic Features for Speaker Recognition,' Signal Processing Letters, IEEE, vol. 14, pp. 181-184, 2007.
[53] T. Toda, A. W. Black, and K. Tokuda, 'Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory,' Audio, Speech, and Language Processing, IEEE Transactions on [see also Speech and Audio Processing, IEEE Transactions on], vol. 15, pp. 2222-2235, 2007.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/41669
dc.description.abstract (translated from Chinese): In the fields related to artificial intelligence, teaching computers to process visual and auditory information automatically, and further to organize the meaning it carries, has been a popular topic in recent years. Among these channels, using the voice to convey feelings or exchange messages is the most direct and rapid, and it is used extensively in daily life. This thesis therefore presents an in-depth analysis and discussion of speech signals.
The topic of this thesis is how to find acoustic features with physical meaning to serve as references for speech recognition. Speech recognition can be discussed in roughly two aspects: the content of the speech and the identification of the speaker. When different speakers utter the same words, there must be some commonality by which the content can be recognized, and at the same time some differences by which the speakers can be told apart. Since the variation that the human vocal organs can produce is limited, speakers can be regarded as differing innately in their vocal organs, and then learning to shape the vocal tract to produce speech. In models built on the vocal tract, the most prominent and widely used acoustic features are the so-called formants, the peaks in the spectral distribution of a sound. This thesis therefore takes formants as its object of study. Because consonants are comparatively indistinct both in the spectrum and to the ear, the focus is on the distribution of vowel formants across different speakers. (zh_TW)
dc.description.abstract: How humans recognize speech remains an unsolved riddle. Because speech is the most direct way to transmit information, and is used constantly in daily life, many researchers try to find ways to process the voice and to comprehend the meanings behind speech. As a result, speech recognition has become a popular topic in signal processing. Among the methods of speech recognition, formant detection is one based on physical structure. This thesis offers a deeper analysis and discussion of how to recognize speech using formants as features.
The topic of this thesis is to find acoustic features with physical meaning. Speech recognition can be discussed in two different aspects: the content of the speech and the identity of the speaker. When different speakers pronounce the same words, there must be certain commonalities that carry the content and certain differences that distinguish the speakers. Because the variation of the articulatory organs is limited, speakers can be regarded as having innate differences between them, which are then shaped by learning to control the vocal tract. Among vocal tract models, the most significant and widely used acoustic feature is the so-called formant, a peak of the frequency spectrum. Accordingly, this thesis focuses on the relationships between formants and vowels. (en)
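The abstract identifies formants, the spectral peaks created by vocal-tract resonances, as the acoustic feature of interest, and the table of contents points to linear prediction (LPC) as the extraction tool. As a minimal illustrative sketch only (not the thesis's own code; the sampling rate, LPC order, and the 700 Hz / 1200 Hz resonances are assumed demo values), formant frequencies can be estimated from the pole angles of an LPC polynomial fitted to a windowed frame:

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients a[0..order] (a[0] = 1) via the autocorrelation
    method and the Levinson-Durbin recursion."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]  # lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                 # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def formants(frame, order, fs):
    """Formant frequencies (Hz) from the LPC poles in the upper half
    of the z-plane (one pole per complex-conjugate pair)."""
    roots = np.roots(lpc(frame, order))
    roots = roots[np.imag(roots) > 1e-2]
    return np.sort(np.angle(roots) * fs / (2.0 * np.pi))

# Demo on a synthetic "vowel": white noise through an all-pole filter
# with two resonances (700 Hz and 1200 Hz are illustrative values).
fs = 8000
def resonator(f, bw):
    r = np.exp(-np.pi * bw / fs)       # pole radius from bandwidth
    th = 2.0 * np.pi * f / fs          # pole angle from frequency
    return np.array([1.0, -2.0 * r * np.cos(th), r * r])

a_true = np.convolve(resonator(700, 80), resonator(1200, 90))
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = np.zeros_like(x)
for n in range(len(x)):                # y[n] = x[n] - sum_k a[k] y[n-k]
    y[n] = x[n] - sum(a_true[k] * y[n - k] for k in range(1, 5) if n >= k)

frame = y[2048:2048 + 1024] * np.hamming(1024)  # one windowed analysis frame
est = formants(frame, order=4, fs=fs)
print(est)                             # roughly 700 and 1200 Hz
```

Real speech would additionally need pre-emphasis, frame-by-frame analysis, and bandwidth-based pruning of spurious poles, as outlined in Chapter 3 of the table of contents.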
dc.description.provenance: Made available in DSpace on 2021-06-15T00:26:56Z (GMT). No. of bitstreams: 1. ntu-98-R95921003-1.pdf: 4442906 bytes, checksum: a0e2c73751feb79b75af7c25f3606f33 (MD5). Previous issue date: 2009. (en)
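The provenance field above records an MD5 checksum for the deposited bitstream. As a small sketch of how such a fixity check can be reproduced (the payload below is a made-up demo string, not the repository's PDF), a file's digest can be recomputed in chunks and compared against the recorded value:

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading 1 MiB chunks so a
    large bitstream never has to fit in memory at once."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo on a temporary file with known contents.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
    path = tmp.name
digest = md5_of_file(path)
os.remove(path)
print(digest)  # 5eb63bbbe01eeed093cb22bb8f5acdc3
```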
dc.description.tableofcontents:
Abstract ....................................................................................................................... I
摘要 (Abstract in Chinese) ......................................................................................... II
Contents.................................................................................................................... III
List of Figures ............................................................................................................ V
List of Tables ............................................................................................................. IX
Chapter 1 Introduction ......................................................................................... 1
1.1 Overview ................................................................................................... 1
1.2 Application ................................................................................................ 3
1.3 Thesis Organization ................................................................................... 5
Chapter 2 Mechanisms and Models of Human Speech ........................................ 6
2.1 Speech Production ..................................................................................... 6
2.1.1 Mechanism ........................................................................................ 6
2.1.2 Model .............................................................................................. 14
2.1.3 Mathematical Model of Vocal Tract .................................................. 17
2.2 Auditory System ...................................................................................... 25
2.2.1 Mechanism in Physiological Organs of Auditory System.................. 25
2.2.2 Psychophysical Measurements ......................................................... 33
Chapter 3 Speech Signal Processing ................................................................... 36
3.1 Motivation ............................................................................................... 36
3.2 Signal Pre-processing .............................................................................. 37
3.2.1 Windowing ...................................................................................... 38
3.2.2 Pre-emphasis .................................................................................... 40
3.3 Pitch Detection ........................................................................................ 40
3.3.1 The Definition of Pitch ..................................................................... 40
3.3.2 Pitch Detection................................................................................. 41
3.4 Formant Detection ................................................................................... 42
3.4.1 The Definition of Formant ............................................................... 43
3.4.2 Formant Detection ........................................................................... 44
3.4.3 Linear Prediction Coefficient Extraction ........................................... 48
Chapter 4 Analysis of the Formants on Vowels .................................................. 56
4.1 Procedure of the Formant Analysis .......................................................... 57
4.2 Analytical Results .................................................................................... 58
4.2.1 Analysis I: Single Data ..................................................................... 58
4.2.2 Analysis II: One Speaker with Different Tones ................................. 64
4.2.3 Analysis III: Multi Speaker with Different Pitch............................... 68
Chapter 5 Vocal Tract Modeling ......................................................................... 70
5.1 Procedure of the Vocal Tract Synthesis..................................................... 70
5.2 Experimental Results ............................................................................... 71
5.2.1 Experiment I: Original Signal = Target Signal .................................. 71
5.2.2 Experiment II: Same Speaker with Different Pitch ............................ 77
5.2.3 Experiment III: Different Speaker with Different Pitch ..................... 82
Chapter 6 Conclusions and Future work ........................................................... 87
Reference ................................................................................................................... 89
dc.language.iso: en
dc.subject: 共振峰 (formant) (zh_TW)
dc.subject: 語音辨認 (speech recognition) (zh_TW)
dc.subject: 聲學特徵 (acoustic feature) (zh_TW)
dc.subject: 發音模型 (pronunciation model) (zh_TW)
dc.subject: 發聲腔道 (vocal tract) (zh_TW)
dc.subject: Acoustics Feature (en)
dc.subject: Formant (en)
dc.subject: Vocal Tract Filter (en)
dc.subject: Pronouncing Model (en)
dc.subject: Speech Recognition (en)
dc.title: 人聲發音機制之探討 (zh_TW)
dc.title: Research on Pronunciation Mechanisms of Human Voice (en)
dc.type: Thesis
dc.date.schoolyear: 97-1
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 顏家鈺, 蔡坤諭, 莊禮彰
dc.subject.keyword: 語音辨認, 聲學特徵, 發音模型, 發聲腔道, 共振峰 (zh_TW)
dc.subject.keyword: Speech Recognition, Acoustics Feature, Pronouncing Model, Vocal Tract Filter, Formant (en)
dc.relation.page: 93
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2009-01-22
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) (zh_TW)
dc.contributor.author-dept: 電機工程學研究所 (Graduate Institute of Electrical Engineering) (zh_TW)
Appears in collections: 電機工程學系 (Department of Electrical Engineering)

Files in this item:
ntu-98-1.pdf: 4.34 MB, Adobe PDF (not authorized for public access)

