  1. NTU Theses and Dissertations Repository
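  2. College of Electrical Engineering and Computer Science
  3. Department of Computer Science and Information Engineering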
Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/18387
Full metadata record
DC field: value [language]
dc.contributor.advisor: 張智星 (Jyh-Shing Jang)
dc.contributor.author: Chen-Hung Yang [en]
dc.contributor.author: 楊晨弘 [zh_TW]
dc.date.accessioned: 2021-06-08T01:02:40Z
dc.date.copyright: 2020-08-21
dc.date.issued: 2020
dc.date.submitted: 2020-08-13
dc.identifier.citation:
[1] P. Auer, Code-switching in conversation: Language, interaction and identity. Routledge, 2013.
[2] D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely, “The Kaldi Speech Recognition Toolkit,” in Proc. IEEE ASRU, 2011.
[3] 吳妙嬬, “Bilingual Code-Mixed Acoustic Modeling by Unit Mapping and Model Recovery,” Master’s thesis, National Taiwan University, 2007.
[4] 葉青峰, “Initial Study on Chinese/English Bilingual Speech Recognition based on Lecture Recording,” Master’s thesis, National Taiwan University, 2011.
[5] 卓楷斌, “Merging Acoustic Models for Improving Mandarin-English Bilingual Speech Recognition,” Master’s thesis, National Tsing Hua University, 2012.
[6] P. Guo, H. Xu, L. Xie, and E. S. Chng, “Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition,” arXiv:1806.06200v1, 2018.
[7] D.-C. Lyu, T.-P. Tan, E.-S. Chng, and H. Li, “SEAME: a Mandarin-English code-switching speech corpus in South-East Asia,” in Proc. INTERSPEECH, 2010.
[8] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. J. Lang, “Phoneme Recognition Using Time-Delay Neural Networks,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 3, pp. 328–339, 1989.
[9] V. Peddinti, D. Povey, and S. Khudanpur, “A Time Delay Neural Network Architecture for Efficient Modeling of Long Temporal Contexts,” in Proc. Interspeech, 2015.
[10] H.-M. Wang, B. Chen, J.-W. Kuo, and S.-S. Cheng, “MATBN: A Mandarin Chinese Broadcast News Corpus,” International Journal of Computational Linguistics and Chinese Language Processing, vol. 10, no. 2, pp. 219–236, 2005.
[11] J. Garofolo, D. Graff, D. Paul, and D. Pallett, “CSR-I (WSJ0) Sennheiser LDC93S6B,” Web Download. Philadelphia: Linguistic Data Consortium, 1993.
[12] “CSR-II (WSJ1) Sennheiser LDC94S13B,” DVD. Philadelphia: Linguistic Data Consortium, 1994.
[13] P. Kenny, “Joint factor analysis of speaker and session variability: Theory and algorithms,” CRIM, Montreal, (Report) CRIM-06/08-13, vol. 14, pp. 28–29, 2005.
[14] 李岳庭, “Improving Mandarin LVCSR Using Place and Manner Based Multi-task Learning,” Master’s thesis, National Taiwan University, 2019.
[15] International Phonetic Association and others, Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet. Cambridge University Press, 1999.
[16] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proc. of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[17] B.-H. Juang and L. Rabiner, “The segmental k-means algorithm for estimating parameters of hidden Markov models,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, no. 9, pp. 1639–1641, 1990.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/18387
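dc.description.abstract: The main goal of this thesis is to train a Chinese-English mixed speech recognizer that addresses the recognition of code-mixed Chinese-English conversation, which is common in everyday life; the application scenario studied is everyday code-mixed conversation. The thesis trains acoustic models with both the traditional GMM-HMM approach and the deep neural network hybrid model (DNN-HMM). Language models are trained with SRILM on several text sources (for example PTT, MATBN, and WSJ). The test corpora are the test data split from EAT and Chinese-English mixed sentences recorded by the MIR Lab (MIRLAB) at National Taiwan University. The thesis first tries pairing TCC300 with WSJ and pairing MATBN with WSJ, then examines the results under different phonetic transcription schemes, and then adds the English Across Taiwan (EAT) corpus and some text post-processing, finally reaching a word error rate of 30.27%, an improvement of 32.16% over the word error rate obtained without EAT. [zh_TW]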
dc.description.abstract: The main purpose of this thesis is to solve the common problem of recognizing mixed Chinese-English speech in daily conversation by constructing a recognition engine that can handle such code-mixed conversation. We use both the traditional GMM-HMM model and the deep neural network hybrid model (DNN-HMM) as acoustic models for both Chinese and English. We also use various text sources, including PTT, MATBN, and WSJ, to train the language model with SRILM. The test corpora in the experiments include the MIR Chinese/English mixed test dataset and the EAT test data. First, we tried the mix of TCC300 and WSJ, and the mix of MATBN and WSJ, for constructing acoustic models and compared their performance. Second, we used two different phonetic alphabets and compared their recognition results. Finally, we found that the best performance is achieved by using the mix of MATBN, WSJ, and the English Across Taiwan (EAT) corpus, together with text post-processing, which yields a 30.27% word error rate, an error reduction of about 32.16% compared with the result without EAT. [en]
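The abstract reports its results as a word error rate (WER). As a reading aid, the following is a minimal Python sketch of how WER is conventionally computed: the word-level edit distance between a reference transcript and a recognizer hypothesis, divided by the reference length. It is not taken from the thesis. The function names, the pre-tokenized word lists, and the example sentence are all assumptions made for illustration; this record does not state how mixed Chinese-English text was segmented for scoring, nor whether the 32.16% figure is a relative or an absolute reduction. The `relative_reduction` helper only shows what the relative reading would mean.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Fewest substitutions, deletions and insertions that turn ref into hyp."""
    # prev[j] holds the distance between the first i-1 ref words and the first j hyp words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # ref word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def wer(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Word error rate: edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction (assumed reading, not confirmed by this record)."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical code-mixed sentence, already split into words by hand.
    reference = ["我", "想", "要", "check", "in"]
    hypothesis = ["我", "想", "check", "in", "了"]
    print(f"WER = {wer(reference, hypothesis):.2f}")
```

In the hypothetical example the hypothesis drops one reference word and adds one extra word, so two edits are counted against five reference words.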
dc.description.provenance: Made available in DSpace on 2021-06-08T01:02:40Z (GMT). No. of bitstreams: 1
U0001-1208202016524300.pdf: 3043749 bytes, checksum: 333c2c28116c653c823299270c67147b (MD5)
Previous issue date: 2020 [en]
dc.description.tableofcontents:
Contents
Abstract (Chinese)
Abstract (English)
Acknowledgements
1 Introduction
1.1 Topic Overview
1.2 Tools Overview
1.2.1 Kaldi
1.2.2 Stanford Research Institute Language Modeling Toolkit (SRILM)
1.3 Chapter Overview
2 Literature Review and Background
2.1 Overview of Previous Methods
2.2 Problems in Chinese-English Mixed Speech Recognition
2.2.1 Scarcity of Chinese-English Mixed Corpora
2.2.2 Differences between Chinese and English Phonetic Transcription
2.2.3 Discussion of Language Models
2.2.4 Taiwanese-Accented English
3 Corpora
3.1 Corpora for Acoustic Models
3.1.1 Mandarin Chinese Broadcast News Corpus (MATBN)
3.1.2 WSJ (Wall Street Journal) Corpus
3.1.3 Taiwanese-Accented English Corpus (English Across Taiwan, EAT)
3.1.4 MIR English Word and Short-Phrase Dataset (MIR English Dataset)
3.1.5 MIR Chinese/English Mixed Test Dataset
3.2 Corpora for Language Models
3.2.1 PTT Bulletin Board System (PTT) Text Corpus
3.2.2 Mobile01 Forum
4 Methods, Experimental Design, and Experimental Setup
4.1 Methods
4.1.1 Speech Recognition Pipeline
4.1.2 Acoustic Feature Extraction
4.1.3 Factor Analysis and i-Vectors
4.1.4 Pronunciation Lexicon
4.1.5 Language Model
4.1.6 Acoustic Model
4.1.7 Time-Delay Neural Networks
4.2 Experimental Design
4.2.1 Monolingual Speech Recognition Results
4.2.2 Experiments with TCC300 and WSJ
4.2.3 Experiments with MATBN and WSJ
4.2.4 Experiments Adding the Taiwanese-Accented MIR English Dataset
4.2.5 Experiments Adding the Taiwanese-Accented English Corpus (EAT)
4.3 Evaluation Metrics
5 Experimental Results and Data
5.1 Monolingual Recognition Results on WSJ
5.2 Monolingual Recognition Results on TCC300
5.3 Bilingual Recognition Results with TCC300 and WSJ
5.3.1 Outside Test
5.3.2 Inside Test of the Language Model
5.4 Bilingual Recognition Results with MATBN and WSJ
5.4.1 Fully Outside Test
5.4.2 Inside Test of the Language Model
5.5 Language Model Trained on the Mobile01 Corpus
5.5.1 Outside Test
5.6 Fully Inside Test
5.7 Acoustic Model Trained with the EAT Corpus Added
5.7.1 Outside Test after Adding EAT
5.7.2 Treating Chinese-English Synonyms as Correct Recognition
6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
Bibliography
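dc.language.iso: zh-TW
dc.title: 中英混合語音辨識的研究與實作 [zh_TW]
dc.title: Research and Implementation of Chinese-English Mixed Speech Recognition [en]
dc.type: Thesis
dc.date.schoolyear: 108-2
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 李宏毅 (Hung-Yi Lee), 王新民 (Hsin-Min Wang)
dc.subject.keyword: multilingual mixed recognition, time-delay neural networks, large-vocabulary speech recognition [zh_TW]
dc.subject.keyword: code-switching recognition, time-delay neural networks, LVCSR [en]
dc.relation.page: 46
dc.identifier.doi: 10.6342/NTU202003127
dc.rights.note: Not authorized
dc.date.accepted: 2020-08-14
dc.contributor.author-college: College of Electrical Engineering and Computer Science [zh_TW]
dc.contributor.author-dept: Graduate Institute of Computer Science and Information Engineering [zh_TW]
Appears in collections: Department of Computer Science and Information Engineering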

Files in this item:
File: U0001-1208202016524300.pdf (not authorized for public access); size: 2.97 MB; format: Adobe PDF
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms specifically indicate otherwise.
