Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/31717
Full metadata record
DC field: value (language)
dc.contributor.advisor: 李琳山
dc.contributor.author: Jui-Ting Huang (en)
dc.contributor.author: 黃瑞婷 (zh_TW)
dc.date.accessioned: 2021-06-13T03:18:20Z
dc.date.available: 2006-08-01
dc.date.copyright: 2006-08-01
dc.date.issued: 2006
dc.date.submitted: 2006-07-28
dc.identifier.citation[1] D. Crystal, A Dictionary of Linguistics and Phonetics, 4th edtition, Blackwell Publishers Inc.
[2] X. Huang, A. Acero, and H-W Hon, Spoken Language Processing, Prentice Hall 2001
[3] Colin W. Wightman, Mari Ostendorf, “Automatic Labeling of Prosodic Patterns”, IEEE Transactions on speech and audio processing, vol. 2, NO. 4. October 1994.
[4] J. H. Kim, and P. C. Woodland, “The use of prosody in a combined system for punctuation generation and speech recognition,” in Proc. Eurospeech, 1997
[5] E. Shriberg et al., “Prosody-based automatic segmentation of speech into sentences and topics,” Speech communication, 32(1-2):127-154, 2000, Special Issue on Accessing Information in Spoken Audio.
[6] Y. Lui et al., “Automatic disfluency identification in conversational speech using multiple knowledge sources,’ in Proc. Eurospeech, 2003
[7] C-K Lin, and L-S Lee, “Improved spontaneous Mandarin speech recognition by disfluency interruption point (IP) detection using prosodic features,” in Proc. Eurospeech, 2005
[8] A. Stolcke et al., “Dialogue act modeling for automatic tagging and recognition of conversational speech,” Computayional Linguisrics, 26(2):339-373, 2000
[9] Nwe T.L., Foo S.W.; De Silva L.C. “Speech emotion recognition using hidden Markov models,” Speech Communication, Volume 41, Number 4, November 2003, pp. 603-623(21)
[10] R. Cowie, E. Douglas-Cowie, N. Tsapatsoulis, G. Votsis, S. Kollias, W. Fellenz, and J.G. Taylor, “Emotion recognition in human-computer interaction,” IEEE Signal Processing magazine, vol. 18, no. 1, pp. 32–80, Jan. 2001
[11] S. Kajarekar et al., “Speaker recognition using prosodic and lexical features,” in Proc. IEEE Workshop on Speech Recognition and Understanding, 2003
[12] L. Hahn, “Native speakers’ reactions to non-native stress in English discourse,” Ph.D. thesis, University of Illinois at Urbana-Champaign, 1999.
[13] Ken Chen et al., “Prosody dependent speech recognition on radio news corpus of American English,” IEEE trans. Audio, Speech, and Language Processing, vol.14, No.1, Jan. 2006
[14] V. R. R. Gadde, “Modeling word durations,” in Proc. ICSLP, 2000
[15] A. Stolcke et al.,“Modeling the prosody of hidden events for improved word recognition,” in Proc. Eurospeech, 1999
[16] R. Trask, A Dictionary of Phonetics and Phonology, Routledge 1996
[17] Silverman, K., Beckman, M., Pitrelli, J., Ostendorf, M., Wightman, C., Price, P., Pierrehumbert, J., & Hirschberg, J. (1992), “Tobi: a standard for labeling english prosody,” in Proc. ICSLP, 1992
[18] Tseng, and F.C. Chou, “A prosodic labeling system for Mandarin speech database,” in Proc. ICPhS, 1999
[19] Tseng et al., “Fluent speech prosody: framework and modeling,” Speech Communication, Vol.46,issues 3-4, July 2005, Special Issue on Quantitative Prosody Modeling for Natural Speech Description and Generation, 284-309.
[20] “A detailed description of COSPRO and Toolkit,” http://reg.myet.com/registration/corpus/en/Papers.asp
[21] Douglas A. Reynolds, “Robust text-independent speaker identification using Gaussian mixture speaker models,” IEEE trans. Speech and Audio Processing, vol. 3, No. 1, Jan 1995
[22] Breiman et al., Classification and regression trees. Chapman & Hall/CRC, 1984
[23] Richard O. Duda et al., Pattern Classification, Wiley-Interscience, 2001
[24] Vapnik Vladimir N., “The nature of statistical learning theory,” Springer-Verlag New York, Inc., 1995
[25] K.-M. Lin and C.-J. Lin, “A study on reduced support vector machines,” IEEE Trans. Neural Networks, 14(2003), 1449-1559
[26] S. B. Davis & P. Mermelstein, “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences,” IEEE Trans. Acoustics Speech and Signal Processing, Vol. 28, No.4, 1980
[27] Jerome L. Packard, The morphology of Chinesea linguistic and cognitive approach, Cambridge University Press,2000.New York, NY
[28] Chao Wang and Stephanie Seneff, “Robust pitch tracking for prosodic modeling,” in Proc. ICASSP, 2000
[29] ESPS Version 5.0 Program Manual. 1993
[30]林婉怡,「流利國語語音之聲調辨識及其在大字彙辨識上的應用」,碩士論文--國立臺灣大學電信工程學硏究所。(2004)
[31] S. H. Chen et al. ”Vector quantization of pitch information in Mandarin speech,” IEEE trans. Communications, Vol. 38, No. 9, 1990
[32] C. Wightman et al., “Segmental durations in the vicinity of prosodic phrase boundaries,” J. Acoust. Soc. Amer., Mar. 1992
[33] H. Bourlard & N. Morgan., “Continuous Speech Recognition by Connectionist Statistical Methods,” IEEE trans. Neural Networks, Vol. 4, No.6, Nov. 1993
[34] http://www.stat.berkeley.edu/users/breiman/RandomForests/
[35] J-T Huang & L-S Lee. “Improved Large Vocabulary Mandarin
Speech using prosodic features,” Speech Prosody, 2006
[36] J-T Huang & L-S Lee. “Prosodic Modeling in Large Vocabulary Mandarin Speech Recognition,” in Proc. ICSLP, 2006
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/31717
dc.description.abstract (zh_TW): Humans make extensive use of prosodic information in everyday spoken communication, so incorporating prosody-related information into speech technology is one way to make systems more intelligent and human-like. This thesis attempts to go beyond current speech recognition technology by using prosodic information to assist recognition. In addition to conventional spectral features such as MFCCs, prosodic features are extracted from the speech signal, and prosodic models are trained to describe the relationship between these features and the structure of the text.
Most of the features are computed per syllable from fundamental frequency, energy, and duration. Some are motivated by knowledge of prosody and are expected to relate to lexical tone and prosodic-word boundaries; the others enumerate various possible combinations, with the subsequent prosodic models expected to select the important ones automatically. Two approaches to modeling the relationship between prosodic features and Mandarin text are proposed: a prosodic-word model and a hierarchical model. Implementations based on Gaussian mixture models (GMMs) and on classification methods are investigated, and a combined method intended to exploit the strengths of both is also proposed. Among these, the classification-based hierarchical model achieves the best classification accuracy.
The thesis adopts a two-pass large-vocabulary Mandarin recognition framework. In the first pass, a baseline recognizer generates a word graph; in the second pass, the score computed by the prosodic model is added to each word arc, every candidate path through the word graph is rescored, and the most likely recognition result is then determined. Experiments show that integrating the prosodic models improves the character accuracy of the baseline system by about 0.35% to 1.45%.
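The two-pass rescoring summarized in the abstract can be illustrated with a small sketch. The Python snippet below is a toy illustration, not code from the thesis: it assumes a word graph already expanded into an explicit list of candidate paths, a hypothetical diagonal-covariance GMM prosodic model, and an interpolation weight prosody_weight that would in practice be tuned on held-out data. Each word arc carries a baseline acoustic-plus-language-model log score and a prosodic feature vector (e.g., syllable pitch, energy, duration), and the prosodic log-likelihood is added to every arc before the best path is re-selected.

# Illustrative sketch only (not from the thesis): second-pass word-graph
# rescoring in which a GMM-based prosodic model score is added to every
# word arc and combined log-linearly with the baseline score.
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM."""
    x = np.asarray(x, dtype=float)
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        mu = np.asarray(mu, dtype=float)
        var = np.asarray(var, dtype=float)
        log_gauss = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)
        log_terms.append(np.log(w) + log_gauss)
    log_terms = np.array(log_terms)
    m = log_terms.max()
    return m + np.log(np.exp(log_terms - m).sum())  # log-sum-exp over components

def rescore_paths(paths, prosodic_gmm, prosody_weight=0.3):
    """Rescore every candidate path of a (toy) word graph.

    Each arc carries the baseline acoustic + language-model log score and a
    prosodic feature vector (e.g., syllable pitch, energy, duration statistics).
    """
    best_path, best_score = None, -np.inf
    for path in paths:
        total = 0.0
        for arc in path:
            prosody_score = gmm_log_likelihood(arc["prosodic_features"], *prosodic_gmm)
            total += arc["baseline_log_score"] + prosody_weight * prosody_score
        if total > best_score:
            best_path, best_score = path, total
    return best_path, best_score

# Toy usage: two competing single-arc paths, two-component GMM over 3 features.
gmm = ([0.6, 0.4],                                # mixture weights
       [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],        # component means
       [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]])        # diagonal variances
paths = [
    [{"word": "word_A", "baseline_log_score": -12.1,
      "prosodic_features": [0.1, -0.2, 0.0]}],
    [{"word": "word_B", "baseline_log_score": -11.8,
      "prosodic_features": [2.5, 2.0, 1.8]}],
]
print(rescore_paths(paths, gmm))

In the actual system described in the abstract, the prosodic score is attached to each word arc of the word graph produced by the first-pass recognizer and all paths through the graph are rescored; the sketch flattens the graph into an explicit list of paths and uses made-up scores purely for illustration.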
dc.description.provenance (en): Made available in DSpace on 2021-06-13T03:18:20Z (GMT). No. of bitstreams: 1
ntu-95-R93942030-1.pdf: 557278 bytes, checksum: 913c75d1665f2dd224509268d4c8eb31 (MD5)
Previous issue date: 2006
dc.description.tableofcontents:
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Related Work 2
1.3 Research Topics and Main Results 4
1.4 Outline 5
Chapter 2 Background 7
2.1 Prosodic Features of Mandarin 7
2.1.1 Prosody in Spoken Language 7
2.1.2 Prosodic Structure of Mandarin 8
2.2 Gaussian Mixture Models 10
2.2.1 Model Description 10
2.2.2 Parameter Estimation 11
2.3 Basic Classification Methods 13
2.3.1 Decision Trees 13
2.3.2 Support Vector Machines 16
2.4 Experimental Environment and Baseline Experiments for Large-Vocabulary Continuous Mandarin Speech Recognition 18
2.4.1 Corpus for the Baseline Experiments 18
2.4.2 Feature Extraction 18
2.4.3 Acoustic Model Architecture 19
2.4.4 Baseline Experiments 20
2.5 Chapter Summary 22

Chapter 3 Recognition System Integrating Prosodic Models 25
3.1 Overall System Architecture 25
3.2 Construction of Prosodic Models Considering Mandarin Characteristics 26
3.2.1 Prosodic-Word-Level Model 28
3.2.2 Hierarchical Model 30
3.3 Prosodic Feature Extraction 32
3.3.1 Pitch Extraction 33
3.3.2 Prosodic Features 34
3.3.3 Categorical Features 37
3.4 Chapter Summary 38
Chapter 4 Investigation of Prosodic Models 39
4.1 Gaussian Mixture Model Approach 39
4.1.1 Method 39
4.1.2 Discussion 40
4.2 Classification Approach 42
4.2.1 Formulation 42
4.2.2 Decision Trees: the Random Forest Algorithm 43
4.2.3 Discussion 43
4.3 Combining the Gaussian and Decision-Tree Methods 44
4.4 Accuracy Comparison 45
4.5 Chapter Summary 46

Chapter 5 Experimental Results 47
5.1 Preprocessing of the Training Corpus 48
5.2 Importance Analysis of Prosodic Features 48
5.2.1 Relationship with Lexical Tone 48
5.2.2 Relationship with Prosodic-Word Boundaries 48
5.3 Results of Large-Vocabulary Continuous Mandarin Speech Recognition with Integrated Prosodic Models 50
5.4 Chapter Summary 52
Chapter 6 Conclusion and Future Work 54
6.1 Conclusion 54
6.2 Future Work 55
References 59
dc.language.iso: zh-TW
dc.subject: 語音辨識 (zh_TW)
dc.subject: 韻律 (zh_TW)
dc.subject: speech recognition (en)
dc.subject: prosody (en)
dc.title: 使用韻律模型的進一步大字彙國語連續語音辨識 (zh_TW)
dc.title: Improved Large Vocabulary Continuous Mandarin Speech Recognition By Prosodic Modeling (en)
dc.type: Thesis
dc.date.schoolyear: 94-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 鄭秋豫, 馮世邁, 王小川, 陳信宏
dc.subject.keyword: 韻律, 語音辨識 (zh_TW)
dc.subject.keyword: prosody, speech recognition (en)
dc.relation.page: 62
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2006-07-30
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) (zh_TW)
dc.contributor.author-dept: 電信工程學研究所 (Graduate Institute of Communication Engineering) (zh_TW)
Appears in Collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in This Item:
File | Size | Format
ntu-95-1.pdf (access restricted; not publicly available) | 544.22 kB | Adobe PDF