Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/44663
Full metadata record (DC field: value [language]):
dc.contributor.advisor: 李琳山 (Lin-Shan Lee)
dc.contributor.author: Chih-Hao Hung [en]
dc.contributor.author: 洪志豪 [zh_TW]
dc.date.accessioned: 2021-06-15T03:52:31Z
dc.date.available: 2010-07-20
dc.date.copyright: 2010-07-20
dc.date.issued: 2010
dc.date.submitted: 2010-07-08
dc.identifier.citation:
[1] H. Bourlard, N. Morgan, “Connectionist speech recognition: A hybrid approach,” Kluwer Academic Publishers, Boston, 1994
[2] Edmondo Trentin and Marco Gori, “A survey of hybrid ANN/HMM models for automatic speech recognition,” in Neurocomputing, Vol. 37, No. 1, pp. 91-126, 2001
[3] H. Hermansky, D.P.W. Ellis, S. Sharma, “Tandem connectionist feature extraction for conventional HMM systems,” in Proc. ICASSP 2000
[4] Mohammad Nurul Huda, Muhammad Ghulam, Junsei Horikawa and Tsuneo Nitta, “Distinctive phonetic feature (DPF) based phone segmentation using hybrid neural networks,” in Proc. Interspeech 2007
[5] Jitendra Ajmera and Masami Akamine, “Speech recognition using soft decision trees,” in Proc. Interspeech 2008
[6] Remco Teunen and Masami Akamine, “HMM-based speech recognition using decision trees instead of GMMs,” in Proc. Interspeech 2007
[7] Jitendra Ajmera and Masami Akamine, “Decision tree acoustic models for ASR,” in Proc. Interspeech 2009
[8] Biing-Hwang Juang, Wu Chou, Chin-Hui Lee, “Minimum classification error rate methods for speech recognition,” in IEEE Trans. Speech and Audio Processing, May 1997
[9] Leo Breiman, “Random forests,” in Machine Learning, Vol. 45, No. 1, pp. 5-32, October 2001
[10] Yoav Freund, Robert E. Schapire, “Experiments with a new boosting algorithm,” in Proc. ICML 1996
[11] Chris Seiffert, Taghi M. Khoshgoftaar, Jason Van Hulse and Amri Napolitano, “Resampling or reweighting: a comparison of boosting implementations,” in Proc. ICTAI 2008
[12] M. Riedmiller, H. Braun, “A direct adaptive method for faster backpropagation learning: The RPROP algorithm,” in Proc. ICNN 1993
[13] Cristina Olaru and Louis Wehenkel, “A complete fuzzy decision tree technique,” in Fuzzy Sets and Systems, Vol. 138, No. 2, pp. 221-254, 2003
[14] P.P. Bonissone, J.M. Cadenas, M.C. Garrido and R.A. Díaz-Valladares, “A fuzzy random forest: Fundamental for design and construction,” in Proc. IPMU 2008
[15] Chiu-Yu Tseng and Fu-Chiang Chou, “Machine readable phonetic transcription system for Chinese dialects spoken in Taiwan,” The First Oriental COCOSDA Workshop, 1998
[16] Cambridge University Engineering Dept. (CUED), Machine Intelligence Laboratory, “HTK,” http://htk.eng.cam.ac.uk
[17] SRI Speech Technology and Research Laboratory, “SRILM,” http://www.speech.sri.com/projects/srilm
[18] 潘奕誠 (Yi-Cheng Pan), “One-stage and word-graph-based search algorithms for large-vocabulary continuous Mandarin speech recognition,” Master's thesis, Graduate Institute of Computer Science and Information Engineering, National Taiwan University, 2002
[19] X. Huang, A. Acero, H.-W. Hon, “Spoken language processing,” Pearson Education Taiwan Ltd., pp. 424-426, 2005
[20] Sadaoki Furui, “Cepstral analysis technique for automatic speaker verification,” in IEEE Trans. Acoustics, Speech and Signal Processing, Vol. 29, No. 2, pp. 254-272, 1981
[21] Slava M. Katz, “Estimation of probabilities from sparse data for the language model component of a speech recognizer,” in IEEE Trans. Acoustics, Speech and Signal Processing, Vol. 35, No. 3, pp. 400-401, 1987
[22] H. Hermansky and P. Fousek, “Multi-resolution rasta filtering for tandem-based ASR,” in Proc. Interspeech 2005
[23] F. Valente and H. Hermansky, “Hierarchical and parallel processing of modulation spectrum for ASR application,” in Proc. ICASSP 2008
[24] S. Y. Zhao and N. Morgan, “Multi-stream spectro-temporal features for robust speech recognition,” in Proc. Interspeech 2008
[25] H. Hermansky and S. Sharma, “Temporal patterns (TRAPs) in ASR of noisy speech,” in Proc. ICASSP 1999
[26] D. Yu, L. Deng, X. He and A. Acero, “Large-margin minimum classification error training: A theoretical risk minimization perspective,” in Computer Speech and Language, Vol. 22, pp. 415-429, Oct. 2008
[27] Dong Yu, Li Deng, Xiaodong He and Alex Acero, “Use of incrementally regulated discriminative margins in MCE training for speech recognition,” in Proc. Interspeech 2006
[28] D. W. Purnell and E. C. Botha, “Improved generalization of MCE parameter estimation with application to speech recognition,” in IEEE Trans. Speech and Audio Processing, Vol. 10, pp. 232-239, 2002
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/44663
dc.description.abstract [zh_TW]: In recent years, with the rapid development of machine learning, more and more speech-related research has begun to adopt new techniques and models from that field. One line of work keeps the hidden Markov model (HMM) framework commonly used in conventional acoustic models to handle the dependencies of the speech signal across neighboring frames, and adds other machine-learning classifiers as a complement. These approaches can be broadly divided into hybrid acoustic models and tandem acoustic models.
This thesis attempts to replace the multi-layer perceptron (MLP) commonly used in tandem acoustic models with ensembles of tree-based classifiers and, to match the Gaussian mixture models (GMMs) used within the HMMs, introduces ideas from fuzzy theory to obtain continuous posterior probability vectors. The front-end classifiers used are the fuzzy random forest (FRF) and AdaBoost.M2 with fuzzy decision trees (FDTs) as base learners; several methods for reducing computational complexity and an improvement to minimum classification error (MCE) training are also proposed. Experimental results show that, although the gain is smaller than that obtained with the commonly used MLP, the proposed approach still outperforms the conventional acoustic model. Finally, the conditions under which tandem acoustic models are effective are analyzed and verified experimentally; based on this analysis, suitable parameters are chosen to retrain the front-end classifiers, further improving the character accuracy and yielding an 8.56% relative improvement over the character error rate of the conventional acoustic model. (An illustrative sketch of this tandem pipeline appears after the metadata record below.)
dc.description.provenance: Made available in DSpace on 2021-06-15T03:52:31Z (GMT). No. of bitstreams: 1; ntu-99-R97922051-1.pdf: 1286446 bytes, checksum: 32446c7818cc97ea2e08230f3c07fbe8 (MD5). Previous issue date: 2010 [en]
dc.description.tableofcontents:
Oral Examination Committee Certification
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1. Research Motivation
1.2. Principles of Statistical Speech Recognition
1.2.1. Acoustic Model
1.2.2. Language Model
1.3. Drawbacks of Conventional Acoustic Models
1.4. Combining Acoustic Models with Machine Learning
1.4.1. Hybrid Acoustic Models
1.4.2. Tandem Acoustic Models
1.5. Research Approach and Results
1.6. Thesis Organization
Chapter 2 Background
2.1. Decision Trees
2.2. Random Forests
2.3. AdaBoost
2.4. The RPROP Algorithm
2.5. Minimum Classification Error Training
2.6. Experimental Setup and Preliminary Results
2.6.1. Experimental Corpus
2.6.2. Training and Recognition Tools
2.6.3. Front-End Processing
2.6.4. Acoustic Model Configuration
2.6.5. Lexicon and Language Model Configuration
2.6.6. Decision-Tree-Based Tandem Acoustic Models
2.6.7. Preliminary Experimental Results
2.6.8. Conclusion
Chapter 3 Tandem Acoustic Models Based on Fuzzy Random Forests
3.1. Fuzzy Decision Trees
3.1.1. Structure of Fuzzy Decision Trees
3.1.2. Training of Fuzzy Decision Trees
3.2. Fuzzy Random Forests
3.2.1. Approximation of Fuzzy Random Forests
3.2.2. Training of Fuzzy Random Forests
3.2.3. Feature Streams
3.3. Experimental Setup
3.4. Experimental Results
3.5. Chapter Summary
Chapter 4 Tandem Acoustic Models Based on AdaBoost
4.1. Fuzzy Decision Subtrees
4.2. Training Procedures for the Front-End Classifiers
4.2.1. The First Training Procedure
4.2.2. The Second Training Procedure
4.2.3. Applying the Second Training Procedure to Fuzzy Random Forests
4.3. Experimental Setup
4.4. Experimental Results
4.5. Chapter Summary
Chapter 5 Analysis of Experimental Results
5.1. Conditions for Tandem Acoustic Models to Be Effective
5.2. Verification Experiments
5.3. Selecting Appropriate Model Parameters
5.4. Chapter Summary
Chapter 6 Conclusion and Future Work
6.1. Conclusion
6.2. Future Work
References
dc.language.iso: zh-TW
dc.subject: AdaBoost [zh_TW]
dc.subject: 中文大字彙語音辨識 (Mandarin LVCSR) [zh_TW]
dc.subject: 串接式聲學模型 (tandem acoustic model) [zh_TW]
dc.subject: 模糊決策樹 (fuzzy decision tree) [zh_TW]
dc.subject: 模糊隨機森林 (fuzzy random forest) [zh_TW]
dc.subject: tandem system [en]
dc.subject: Mandarin LVCSR [en]
dc.subject: AdaBoost [en]
dc.subject: fuzzy random forest [en]
dc.subject: fuzzy decision tree [en]
dc.title: 使用基於樹狀分類器之串接式聲學模型之中文大字彙語音辨識 [zh_TW]
dc.title: Tandem System with Tree-Based Classifiers for Mandarin LVCSR [en]
dc.type: Thesis
dc.date.schoolyear: 98-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 王小川, 陳信宏, 鄭秋豫
dc.subject.keyword: 中文大字彙語音辨識, 串接式聲學模型, 模糊決策樹, 模糊隨機森林, AdaBoost [zh_TW]
dc.subject.keyword: Mandarin LVCSR, tandem system, fuzzy decision tree, fuzzy random forest, AdaBoost [en]
dc.relation.page: 59
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2010-07-08
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) [zh_TW]
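The abstract above describes a tandem front-end: an ensemble of fuzzy tree classifiers emits a continuous posterior probability vector for every frame, and those posteriors are post-processed and appended to the standard acoustic features before training a conventional GMM-HMM recognizer. The Python sketch below illustrates only that general tandem recipe (in the spirit of reference [3]); it is not the thesis implementation. scikit-learn's RandomForestClassifier stands in for the fuzzy random forest and AdaBoost.M2 classifiers actually used, and every array shape, parameter value and variable name is an illustrative assumption.

# Minimal tandem front-end sketch (NOT the thesis code); all data and names are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical per-frame acoustic features (e.g. 39-dim MFCCs with deltas) and
# frame-level phone-state labels that would normally come from a forced alignment.
n_frames, n_dims, n_states = 5000, 39, 100
mfcc = rng.normal(size=(n_frames, n_dims)).astype(np.float32)
state_labels = rng.integers(0, n_states, size=n_frames)

# 1) Train the front-end classifier on labelled frames. The thesis uses fuzzy
#    decision-tree ensembles so that each frame gets a soft (continuous) posterior
#    vector rather than a hard class decision; a plain random forest is used here
#    only as a stand-in.
frontend = RandomForestClassifier(n_estimators=50, max_depth=12, random_state=0)
frontend.fit(mfcc, state_labels)

# 2) Obtain the posterior probability vector over phone states for every frame.
posteriors = frontend.predict_proba(mfcc)      # shape: (n_frames, n_classes_seen)

# 3) Standard tandem post-processing: log-compress and decorrelate (PCA/KLT) so the
#    resulting features better match the diagonal-covariance GMMs of the back-end HMM.
log_post = np.log(posteriors + 1e-6)
pca = PCA(n_components=30)
tandem_part = pca.fit_transform(log_post)

# 4) Append the processed posteriors to the original features; the concatenated
#    vectors would then be used to train a conventional GMM-HMM recognizer.
tandem_features = np.hstack([mfcc, tandem_part])
print(tandem_features.shape)                   # (5000, 69) with these toy settings

In the thesis, features of this kind would feed a conventional GMM-HMM back end (trained with a toolkit such as HTK, reference [16]); the exact fuzzy-tree training and the MCE refinements are described in Chapters 3 to 5 of the thesis itself.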
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
ntu-99-1.pdf (1.26 MB, Adobe PDF): not authorized for public access


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
