Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/44663

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 李琳山(Lin-Shan Lee) | |
| dc.contributor.author | Chih-Hao Hung | en |
| dc.contributor.author | 洪志豪 | zh_TW |
| dc.date.accessioned | 2021-06-15T03:52:31Z | - |
| dc.date.available | 2010-07-20 | |
| dc.date.copyright | 2010-07-20 | |
| dc.date.issued | 2010 | |
| dc.date.submitted | 2010-07-08 | |
| dc.identifier.citation | [1] H. Bourlard, N. Morgan, “Connectionist speech recognition: A hybrid approach,” Kluwer Academic Publishers, Boston, 1994
[2] Edmondo Trentin and Marco Gori, “A survey of hybrid ANN/HMM models for acoustic speech recognition,” in Neurocomputing, Vol. 37, No. 1, pp. 91-126, 2001
[3] H. Hermansky, D.P.W. Ellis, S. Sharma, “Tandem connectionist feature extraction for conventional HMM systems,” in Proc. ICASSP 2000
[4] Huda Mohammad Nural, Muhammad Ghulam, Junsei Horikawa and Tsuneo Nitta, “Distinctive phonetic feature (DPF) based phone segmentation using hybrid neural networks,” in Proc. Interspeech 2007
[5] Jitendra Ajmera and Masami Akamine, “Speech recognition using soft decision trees,” in Proc. Interspeech 2008
[6] Remco Teunen and Masami Akamine, “HMM-based speech recognition using decision trees instead of GMMs,” in Proc. Interspeech 2007
[7] Jitendra Ajmera and Masami Akamine, “Decision tree acoustic models for ASR,” in Proc. Interspeech 2009
[8] Biing-Hwang Juang, Wu Chou, Chin-Hui Lee, “Minimum classification error rate methods for speech recognition,” in IEEE Trans. Speech and Audio Processing, May 1997
[9] Leo Breiman and Adele Cutler, “Random forests,” in Machine Learning, Vol. 45, pp. 5-32, October 2001
[10] Yoav Freund, Robert E. Schapire, “Experiments with a new boosting algorithm,” in Proc. ICML 1996
[11] Chris Seiffert, Taghi M. Khoshgoftaar, Jason Van Hulse and Amri Napolitano, “Resampling or reweighting: a comparison of boosting implementations,” in Proc. ICTAI 2008
[12] M. Riedmiller, H. Braun, “A direct adaptive method for faster backpropagation learning: The RPROP algorithm,” in Proc. ICNN 1993
[13] Cristina Olaru and Louis Wehenkel, “A complete fuzzy decision tree technique,” in Fuzzy Sets and Systems, Vol. 138, No. 2, pp. 221-254, 2003
[14] P.P. Bonissone, J.M. Cadenas, M.C. Garrido and R.A. Díaz-Valladares, “A fuzzy random forest: Fundamental for design and construction,” in Proc. IPMU 2008
[15] Chiu-Yu Tseng and Fu-Chiang Chou, “Machine readable phonetic transcription system for Chinese dialects spoken in Taiwan,” The First Oriental COCOSDA Workshop, 1998
[16] Cambridge University Engineering Dept. (CUED), Machine Intelligence Laboratory, “HTK,” http://htk.eng.cam.ac.uk
[17] SRI Speech Technology and Research Laboratory, “SRILM,” http://www.speech.sri.com/projects/srilm
[18] 潘奕誠, “One-pass and word-graph-based search algorithms for large-vocabulary continuous Mandarin speech recognition” (in Chinese), Master's thesis, Graduate Institute of Computer Science and Information Engineering, National Taiwan University, 2002
[19] X. Huang, A. Acero, H.-W. Hon, “Spoken language processing,” Pearson Education Taiwan Ltd, pp. 424-426, 2005
[20] Sadaoki Furui, “Cepstral analysis technique for automatic speaker verification,” in IEEE Trans. Acoustics, Speech and Signal Processing, Vol. 29, No. 2, pp. 254-272, 1981
[21] Slava M. Katz, “Estimation of probabilities from sparse data for the language model component of a speech recognizer,” in IEEE Trans. Acoustics, Speech and Signal Processing, Vol. 35, No. 3, pp. 400-401, 1987
[22] H. Hermansky and P. Fousek, “Multi-resolution rasta filtering for tandem-based ASR,” in Proc. Interspeech 2005
[23] F. Valente and H. Hermansky, “Hierarchical and parallel processing of modulation spectrum for ASR application,” in Proc. ICASSP 2008
[24] S. Y. Zhao and N. Morgan, “Multi-stream spectro-temporal features for robust speech recognition,” in Proc. Interspeech 2008
[25] H. Hermansky and S. Sharma, “Temporal patterns (TRAPs) in ASR of noisy speech,” in Proc. ICASSP 1999
[26] D. Yu, L. Deng, X. He and A. Acero, “Large-margin minimum classification error training: A theoretical risk minimization perspective,” in Computer Speech and Language, Vol. 22, pp. 415-429, Oct. 2008
[27] Dong Yu, Li Deng, Xiaodong He and Alex Acero, “Use of incrementally regulated discriminative margins in MCE training for speech recognition,” in Proc. Interspeech 2006
[28] D. W. Purnell and E. C. Botha, “Improved generalization of MCE parameter estimation with application to speech recognition,” in IEEE Trans. Speech and Audio Processing, Vol. 10, pp. 232-239, 2002 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/44663 | - |
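| dc.description.abstract | In recent years, with the rapid development of machine learning, a growing body of speech research has adopted new techniques and models from that field. One line of work keeps the hidden Markov model (HMM) framework commonly used in conventional acoustic models to handle the dependency of speech signals across neighboring time frames, and adds other machine learning classifiers as auxiliaries. These methods can be broadly divided into hybrid acoustic models and tandem acoustic models.
This thesis attempts to use ensembles of tree-based classifiers in place of the multi-layer perceptron (MLP) commonly used in tandem acoustic models. To suit the Gaussian mixture models (GMMs) used within the HMMs, it borrows ideas from fuzzy theory to obtain continuous distributions of posterior probability vectors. The front-end classifiers used in this thesis are the fuzzy random forest (FRF) and AdaBoost.M2 with fuzzy decision trees (FDTs) as base learners; several methods for reducing computational complexity and an improvement to minimum classification error training are also proposed. Experimental results show that, although the gain is smaller than that of the commonly used MLP, the proposed approach still outperforms the conventional acoustic model. Finally, the conditions under which tandem acoustic models are effective are analyzed and verified experimentally, and the front-end classifiers are retrained with parameters chosen according to that analysis, which further improves character accuracy; a relative improvement of 8.56% in character error rate over the conventional acoustic model is obtained. | zh_TW |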
| dc.description.provenance | Made available in DSpace on 2021-06-15T03:52:31Z (GMT). No. of bitstreams: 1 ntu-99-R97922051-1.pdf: 1286446 bytes, checksum: 32446c7818cc97ea2e08230f3c07fbe8 (MD5) Previous issue date: 2010 | en |
| dc.description.tableofcontents | Oral Examination Committee Certification, p. i
Abstract, p. iii
Table of Contents, p. v
List of Figures, p. viii
List of Tables, p. ix
Chapter 1 Introduction, p. 1
1.1. Motivation, p. 1
1.2. Principles of Statistical Speech Recognition, p. 1
1.2.1. Acoustic Model, p. 2
1.2.2. Language Model, p. 3
1.3. Shortcomings of Conventional Acoustic Models, p. 3
1.4. Combining Acoustic Models with Machine Learning, p. 4
1.4.1. Hybrid Acoustic Models, p. 4
1.4.2. Tandem Acoustic Models, p. 4
1.5. Research Approach and Results, p. 5
1.6. Thesis Outline, p. 5
Chapter 2 Background, p. 7
2.1. Decision Trees, p. 7
2.2. Random Forests, p. 8
2.3. AdaBoost, p. 9
2.4. The RPROP Algorithm, p. 11
2.5. Minimum Classification Error Training, p. 12
2.6. Experimental Framework and Preliminary Results, p. 13
2.6.1. Speech Corpus, p. 13
2.6.2. Training and Recognition Tools, p. 14
2.6.3. Front-End Processing, p. 14
2.6.4. Acoustic Model Settings, p. 15
2.6.5. Lexicon and Language Model Settings, p. 15
2.6.6. Decision-Tree-Based Tandem Acoustic Model, p. 16
2.6.7. Preliminary Results, p. 17
2.6.8. Conclusions, p. 19
Chapter 3 Tandem Acoustic Model Based on Fuzzy Random Forests, p. 21
3.1. Fuzzy Decision Trees, p. 21
3.1.1. Structure of Fuzzy Decision Trees, p. 21
3.1.2. Training of Fuzzy Decision Trees, p. 22
3.2. Fuzzy Random Forests, p. 24
3.2.1. Approximation of Fuzzy Random Forests, p. 25
3.2.2. Training of Fuzzy Random Forests, p. 26
3.2.3. Feature Streams, p. 28
3.3. Experimental Setup, p. 30
3.4. Experimental Results, p. 31
3.5. Chapter Conclusions, p. 32
Chapter 4 Tandem Acoustic Model Based on AdaBoost, p. 35
4.1. Fuzzy Decision Subtrees, p. 35
4.2. Front-End Classifier Training Procedures, p. 36
4.2.1. First Training Procedure, p. 36
4.2.2. Second Training Procedure, p. 38
4.2.3. Applying the Second Training Procedure to Fuzzy Random Forests, p. 39
4.3. Experimental Setup, p. 40
4.4. Experimental Results, p. 40
4.5. Chapter Conclusions, p. 42
Chapter 5 Analysis of Experimental Results, p. 45
5.1. Conditions for Tandem Acoustic Models to Be Effective, p. 45
5.2. Verification Experiments, p. 46
5.3. Selecting Appropriate Model Parameters, p. 50
5.4. Chapter Conclusions, p. 54
Chapter 6 Conclusions and Future Work, p. 55
6.1. Conclusions, p. 55
6.2. Future Work, p. 56
References, p. 57 | |
| dc.language.iso | zh-TW | |
| dc.subject | AdaBoost | zh_TW |
| dc.subject | 中文大字彙語音辨識 (Mandarin LVCSR) | zh_TW |
| dc.subject | 串接式聲學模型 (tandem system) | zh_TW |
| dc.subject | 模糊決策樹 (fuzzy decision tree) | zh_TW |
| dc.subject | 模糊隨機森林 (fuzzy random forest) | zh_TW |
| dc.subject | tandem system | en |
| dc.subject | Mandarin LVCSR | en |
| dc.subject | AdaBoost | en |
| dc.subject | fuzzy random forest | en |
| dc.subject | fuzzy decision tree | en |
| dc.title | 使用基於樹狀分類器之串接式聲學模型之中文大字彙語音辨識 | zh_TW |
| dc.title | Tandem System with Tree-Based Classifiers for Mandarin LVCSR | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 98-2 | |
| dc.description.degree | Master's (碩士) | |
| dc.contributor.oralexamcommittee | 王小川,陳信宏,鄭秋豫 | |
| dc.subject.keyword | 中文大字彙語音辨識 (Mandarin LVCSR), 串接式聲學模型 (tandem system), 模糊決策樹 (fuzzy decision tree), 模糊隨機森林 (fuzzy random forest), AdaBoost | zh_TW |
| dc.subject.keyword | Mandarin LVCSR, tandem system, fuzzy decision tree, fuzzy random forest, AdaBoost | en |
| dc.relation.page | 59 | |
| dc.rights.note | Licensed for a fee (有償授權) | |
| dc.date.accepted | 2010-07-08 | |
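| dc.contributor.author-college | College of Electrical Engineering and Computer Science (電機資訊學院) | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所) | zh_TW |
| Appears in collections: | Department of Computer Science and Information Engineering (資訊工程學系) | |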
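Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-99-1.pdf (not authorized for public access) | 1.26 MB | Adobe PDF |
Items in this repository are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.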
