Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/65632

Full metadata record

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 李琳山 | |
| dc.contributor.author | Yang Chang | en |
| dc.contributor.author | 張暘 | zh_TW |
| dc.date.accessioned | 2021-06-16T23:55:06Z | - |
| dc.date.available | 2012-07-20 | |
| dc.date.copyright | 2012-07-20 | |
| dc.date.issued | 2012 | |
| dc.date.submitted | 2012-07-19 | |
| dc.identifier.citation | [1] L. Deng and X. Huang, "Challenges in adopting speech recognition," Communications of the ACM, vol. 47, no. 1, pp. 69-75, 2004.
[2] D. O'Shaughnessy, "Automatic speech recognition: History, methods and challenges," invited paper, Pattern Recognition, 2008.
[3] M. J. F. Gales, "Model-based techniques for noise robust speech recognition," Ph.D. dissertation, Cambridge University, 1995.
[4] M. J. F. Gales and S. J. Young, "Cepstral parameter compensation for HMM recognition," Speech Communication, vol. 12, no. 3, pp. 231-239, July 1993.
[5] J. Droppo, A. Acero, and L. Deng, "Uncertainty decoding with SPLICE for noise robust speech recognition," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, vol. 1, May 2002, pp. 57-60.
[6] H. Liao and M. J. F. Gales, "Joint uncertainty decoding for noise robust speech recognition," in Proc. Eurospeech 2005, Sep. 2005, pp. 3129-3132.
[7] J. A. Arrowood and M. A. Clements, "Using observation uncertainty in HMM decoding," in Proc. ICSLP, Sep. 2002, pp. 1561-1564.
[8] ---, "Robust speech recognition in additive and convolutional noise using parallel model combination," Computer Speech and Language, vol. 9, no. 4, pp. 289-307, Oct. 1995.
[9] ---, "A fast and flexible implementation of parallel model combination," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, vol. 1, May 1995, pp. 133-136.
[10] P. J. Moreno, B. Raj, and R. M. Stern, "A vector Taylor series approach for environment-independent speech recognition," in Proc. ICASSP 1996, pp. 733-736.
[11] A. Acero, L. Deng, T. Kristjansson, and J. Zhang, "HMM adaptation using vector Taylor series for noisy speech recognition," in Proc. ICSLP 2000, pp. 869-872.
[12] C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Computer Speech and Language, vol. 9, no. 2, pp. 171-185, 1995.
[13] D. K. Kim and M. J. F. Gales, "Adaptive training with noisy constrained maximum likelihood linear regression for noise robust speech recognition," in Proc. Interspeech 2009, pp. 2383-2386.
[14] C.-W. Hsu and L.-S. Lee, "Higher order cepstral moment normalization for improved robust speech recognition," IEEE Trans. Audio, Speech, and Language Processing, vol. 17, no. 2, pp. 205-220, Feb. 2009.
[15] L.-C. Sun, C.-W. Hsu, and L.-S. Lee, "Modulation spectrum equalization for robust speech recognition," in Proc. IEEE ASRU 2007, pp. 81-86.
[16] L.-C. Sun, C.-W. Hsu, and L.-S. Lee, "Evaluation of modulation spectrum equalization for large vocabulary robust speech recognition," in Proc. Interspeech 2008, pp. 1004-1007.
[17] Y. Chen, C.-Y. Wan, and L.-S. Lee, "Entropy-based feature parameter weighting for robust speech recognition," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, vol. 1, May 2006, pp. 41-44.
[18] Y. Chen, C.-Y. Wan, and L.-S. Lee, "Confusion-based entropy-weighted decoding for robust speech recognition," in Proc. Interspeech 2008-ICSLP, Sep. 2008.
[19] Yang Chang and L.-S. Lee, "Two-dimensional frame-and-feature weighted Viterbi decoding for robust speech recognition," in Proc. ICASSP 2012, March 2012.
[20] H.-G. Hirsch and D. Pearce, "The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions," in Proc. ISCA ITRW ASR-2000, Sep. 2000, pp. 181-188.
[21] N. Parihar and J. Picone, "Aurora Working Group: DSR Front End LVCSR Evaluation AU/384/02," Institute for Signal and Information Processing, Mississippi State University.
[22] S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, X. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, "The HTK Book (for HTK Version 3.4)." Cambridge, U.K.: Cambridge Univ. Press, 2006.
[23] J.-W. Hung and W.-Y. Tsai, "Constructing modulation frequency domain based features for robust speech recognition," IEEE Trans. Audio, Speech, and Language Processing, vol. 16, no. 3, pp. 563-577, Mar. 2008.
[24] H. You and A. Alwan, "Temporal modulation processing of speech signals for noise robust ASR," in Proc. Interspeech 2009, pp. 36-39.
[25] S. Ganapathy, S. Thomas, and H. Hermansky, "Robust spectro-temporal features based on autoregressive models of Hilbert envelopes," in Proc. ICASSP 2010, pp. 4286-4289.
[26] L. Garcia, J. C. Segura, C. Benitez, J. Ramirez, and A. de la Torre, "Normalization of the inter-frame information using smoothing filtering," in Proc. Interspeech 2006, pp. 369-372.
[27] X. Xiao, E. S. Chng, and H. Li, "Normalizing the speech modulation spectrum for robust speech recognition," in Proc. ICASSP 2007, pp. 1021-1024.
[28] X. Xiao, E. S. Chng, and H. Li, "Normalization of the speech modulation spectra for robust speech recognition," IEEE Trans. Audio, Speech, and Language Processing, vol. 16, no. 8, Nov. 2008.
[29] W.-H. Tu, S.-Y. Huang, and J.-H. Hung, "Sub-band modulation spectrum compensation for robust speech recognition," in Proc. IEEE ASRU 2009.
[30] A.-T. Yu and H.-C. Wang, "New speech harmonic structure measure and its application to post speech enhancement," in Proc. ICASSP 2004.
[31] S. Vaseghi, E. Zavarehei, and Q. Yan, "Speech bandwidth extension: Extrapolations of spectral envelope and harmonicity quality of extraction," in Proc. ICASSP 2006.
[32] K. R. Krishnamachari and R. E. Yantorno, "Spectral autocorrelation ratio as a usability measure of speech segments under co-channel conditions," in Proc. IEEE Int. Symp. Intelligent Signal Processing and Communication Systems (ISPACS), 2000.
[33] X. Lu, S. Matsuda, M. Unoki, and S. Nakamura, "Temporal modulation normalization for robust speech feature extraction and recognition," in Proc. IEEE CISP 2009.
[34] X. Lu, S. Matsuda, and S. Nakamura, "Normalization on temporal modulation transfer function for robust speech recognition," in Proc. Second International Symposium on Universal Communication, 2008.
[35] X. Lu, S. Matsuda, and S. Nakamura, "Temporal contrast normalization and edge-preserved smoothing of temporal modulation structures of speech for robust speech recognition," Speech Communication, 2010. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/65632 | - |
| dc.description.abstract | This thesis first gives an overview of various robustness algorithms for speech recognition, such as Cepstral Mean Subtraction (CMS), Cepstral Mean and Variance Normalization (CMVN), Histogram Equalization (HEQ), and Higher Order Cepstral Moment Normalization (HOCMN). It also introduces Aurora 2 and Aurora 4, the internationally accepted standard corpora for robust speech recognition research, and reports baseline experimental results on both corpora.
Entropy-based feature weighting and confusion-based feature weighting take the discriminating power of the individual feature parameters into account, assigning different weights to different parameters during recognition so as to emphasize those with better discriminating power; the confusion matrix further models the likely recognition confusions among phonemes. Besides being applied directly to Mel-frequency cepstral coefficients, these methods can also be combined with many existing robustness techniques. SVM-based frame weighting uses a Support Vector Machine from machine learning as a classifier: based on the energy distribution and harmonicity estimation of each frame, the test data are divided into reliable and unreliable frames. Since decoding should rely more on the reliable frames, the SVM scores are used to assign higher weights to reliable frames and lower weights to unreliable frames. Finally, confusion-based feature weighting and SVM-based frame weighting are combined into two-dimensional frame-and-feature weighted Viterbi decoding, which assigns different weights to different feature parameters and different weights to different frames. This combination inherits the advantages of both methods and achieves further improvements. | zh_TW |
| dc.description.abstract | In this thesis we propose a new approach of two-dimensional frame-and-feature weighted Viterbi decoding, performed at the recognizer back-end, for robust speech recognition. The frame weighting is based on a Support Vector Machine (SVM) classifier considering the energy distribution and cross-correlation spectrum of each frame. The basic idea is that voiced frames with higher harmonicity are in general more reliable than other frames in noisy speech and should therefore be weighted higher. The feature weighting is based on an entropy measure considering the confusion between phoneme classes. The basic idea is that scores obtained with more discriminating features, which cause less confusion between phonemes, should be weighted higher. These two weighting schemes on the two different dimensions, frames and features, are then properly integrated in Viterbi decoding. Very significant improvements were achieved in extensive experiments performed in the Aurora 4 testing environment for all types of noise and all SNR values. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-16T23:55:06Z (GMT). No. of bitstreams: 1 ntu-101-R99942057-1.pdf: 10120554 bytes, checksum: 327d9a782a677df9137e3fa1686d1cad (MD5) Previous issue date: 2012 | en |
| dc.description.tableofcontents | Abstract
Table of Contents
List of Tables
List of Figures
1. Introduction
1.1 Motivation
1.2 Robustness Approaches
1.3 Main Research Results
1.4 Chapter Overview
2. Confusion-based Feature Parameter Weighting
2.1 Background
2.1.1 Cepstral Mean Subtraction (CMS)
2.1.2 Cepstral Mean and Variance Normalization (CMVN)
2.1.3 Histogram Equalization (HEQ)
2.1.4 Higher Order Moment Normalization
2.2 Experimental Environment
2.2.1 Aurora 2 Corpus
2.2.2 Aurora 4 Corpus
2.3 Baseline System Results
2.4 Chapter Summary
3. Confusion-based Feature Weighting
3.1 Background
3.2 Entropy-based Feature Weighting
3.2.1 Gaussian Mixture Model (GMM) Scores for Each Feature Dimension
3.2.2 Entropy-based Feature Weighting
3.3 Confusion-based Feature Weighting
3.3.1 Concept of Confusion-based Feature Weighting
3.3.2 Confusion Matrix
3.3.3 Confusion-based Feature Weighting
3.4 Chapter Summary
4. Experimental Results on Aurora 2 and Aurora 4
4.1 Introduction
4.2 Results on Aurora 2
4.2.1 Aurora 2 Experimental Environment
4.2.2 Confusion-based Feature Weighting Applied Directly to Mel-frequency Cepstral Coefficients
4.2.3 Confusion-based Feature Weighting Combined with Other Feature Normalization Methods
4.3 Results on Aurora 4
4.3.1 Aurora 4 Experimental Environment
4.3.2 Confusion-based Feature Weighting Applied Directly to Mel-frequency Cepstral Coefficients
4.3.3 Confusion-based Feature Weighting Combined with Other Feature Normalization Methods
4.3.4 Further Analysis of Monophone Error Rates
4.3.5 Proportion of Clean Speech in the Multi-condition Training Data
4.4 Chapter Summary
5. SVM-based Frame-weighted Viterbi Decoding
5.1 Introduction
5.2 SVM-based Frame-weighted Viterbi Decoding
5.2.1 Overall Block Diagram
5.2.2 Energy Distribution and Harmonicity Estimation for Each Frame
5.2.3 Support Vector Machine (SVM) as the Classifier
5.3 Results of SVM-based Frame-weighted Decoding on Aurora 4
5.3.1 SVM-based Frame-weighted Viterbi Decoding Applied Directly to Mel-frequency Cepstral Coefficients
5.3.2 SVM-based Frame-weighted Decoding Combined with Other Normalization Methods
5.4 Chapter Summary
6. Two-dimensional Frame-and-feature Weighted Viterbi Decoding
6.1 Introduction
6.2 Architecture of Two-dimensional Frame-and-feature Weighted Viterbi Decoding
6.3 Results on Aurora 4
6.4 Further Analysis of Monophone Error Rates
6.5 Chapter Summary
7. Further Improvements to Two-dimensional Frame-and-feature Weighted Viterbi Decoding
7.1 Analysis of Consonant and Vowel Error Rates with Two-dimensional Viterbi Decoding
7.2 A Further Algorithm for Adjusting Unvoiced-sound Weights
7.3 Results of the Further Algorithm on Aurora 4
7.4 Chapter Summary
8. Normalization Analysis of the Front-end Modulation Spectrum
8.1 Modulation-spectrum-based Mean Subtraction and Normalization
8.2 Experimental Results
8.3 Chapter Summary
9. Conclusion and Future Work
9.1 Conclusion
9.2 Future Work
References | |
| dc.language.iso | zh-TW | |
| dc.subject | 語音處理 | zh_TW |
| dc.subject | 強健化 | zh_TW |
| dc.subject | speech recognition | en |
| dc.subject | robustness | en |
| dc.title | 使用二維特徵音框權重法及調變頻譜正規化之強健型語音辨識 | zh_TW |
| dc.title | Robust Speech Recognition with Two-dimensional Frame-and-feature Weighting and Modulation Spectrum Normalization | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 100-2 | |
| dc.description.degree | Master | |
| dc.contributor.oralexamcommittee | 王小川,陳信宏,簡仁宗,鄭秋豫 | |
| dc.subject.keyword | 語音處理, 強健化 | zh_TW |
| dc.subject.keyword | speech recognition, robustness | en |
| dc.relation.page | 87 | |
| dc.rights.note | Paid authorization | |
| dc.date.accepted | 2012-07-19 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 電信工程學研究所 | zh_TW |
| Appears in Collections: | 電信工程學研究所 | |
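The frame weighting summarized in the abstracts rests on per-frame reliability cues (energy distribution and harmonicity) fed to an SVM classifier. Below is a minimal sketch of such cues; it is illustrative only, assuming an autocorrelation-peak proxy for harmonicity (the thesis itself uses the cross-correlation spectrum), and the function name, sampling rate, and pitch-lag range are assumptions, not the thesis code:

```python
import numpy as np

def frame_features(frame, sr=8000, fmin=60, fmax=400):
    """Per-frame reliability cues: log energy and a harmonicity proxy.

    The harmonicity proxy is the largest normalized autocorrelation peak
    within a plausible pitch-lag range; voiced frames with strong harmonic
    structure score high, while unvoiced or silent frames score low.
    """
    energy = np.log(np.sum(frame ** 2) + 1e-10)
    # Autocorrelation for non-negative lags only
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = sr // fmax, sr // fmin          # lag range for fmin..fmax Hz
    harmonicity = ac[lo:hi].max() / (ac[0] + 1e-10)
    return np.array([energy, harmonicity])
```

In a full system these two numbers (possibly with more context) would be the input vector to the SVM that labels each frame reliable or unreliable.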
Files in This Item:

| File | Size | Format |
|---|---|---|
| ntu-101-1.pdf (Restricted Access) | 9.88 MB | Adobe PDF |
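The two-dimensional frame-and-feature weighted Viterbi decoding described in the abstracts can be sketched roughly as follows. This is an illustrative reconstruction, not the thesis implementation: the function name, array shapes, and the specific way the weights enter the emission score (the frame weight scaling a feature-weighted sum of per-dimension log-likelihoods) are assumptions.

```python
import numpy as np

def weighted_viterbi(log_b, log_A, log_pi, frame_w, feat_w):
    """Viterbi decoding with per-frame and per-feature weights.

    log_b:   (T, N, D) per-feature log observation likelihoods
    log_A:   (N, N) log transition probabilities
    log_pi:  (N,) log initial state probabilities
    frame_w: (T,) frame weights (e.g. SVM-derived reliability)
    feat_w:  (D,) feature weights (e.g. entropy/confusion-based)
    Returns the best state sequence as a list of length T.
    """
    T, N, D = log_b.shape
    # Combine the two weighting dimensions: each frame's emission score is a
    # feature-weighted sum, scaled by that frame's reliability weight.
    emit = frame_w[:, None] * (log_b * feat_w[None, None, :]).sum(axis=2)
    delta = log_pi + emit[0]                 # best score ending in each state
    psi = np.zeros((T, N), dtype=int)        # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_A      # scores[i, j]: best via prev i
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + emit[t]
    # Backtrack the best state sequence
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]
```

With all weights set to one this reduces to standard Viterbi decoding; downweighting an unreliable frame or a confusable feature dimension shrinks its influence on the accumulated path score.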
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
