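Improved Approaches of Processing Perceptual Linear Prediction (PLP) and Mel Frequency Cepstrum Coefficient (MFCC) Parameters for Robust Speech Recognition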

Feng-Seng Chu; 朱峰森

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/36510

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	李琳山
dc.contributor.author	Feng-Seng Chu	en
dc.contributor.author	朱峰森	zh_TW
dc.date.accessioned	2021-06-13T08:03:34Z	-
dc.date.available	2005-07-27
dc.date.copyright	2005-07-27
dc.date.issued	2005
dc.date.submitted	2005-07-21
dc.identifier.citation	[1] L.-s. Lee and Y. Lee, “Voice Access of Global Information for Broad-band Wireless: Technologies of Today and Challenges of Tomorrow”, Proceedings of the IEEE, Jan. 2001.
[2] Y. Gong, “Speech Recognition in Noisy Environments: A Survey”, Speech Communication, 16, 1995.
[3] A. E. Rosenberg, C.-H. Lee, and F. K. Soong, “Cepstral Channel Normalization Techniques for HMM-based Speaker Verification”, ICSLP, 1992.
[4] O. Viikki and K. Laurila, “Noise Robust HMM-based Speech Recognition Using Segmental Cepstral Feature Normalization”, in ESCA NATO Workshop on Robust Speech Recognition for Unknown Communication Channels, France, 1997.
[5] H. Hermansky and N. Morgan, “RASTA Processing of Speech”, IEEE Trans. on Speech and Audio Processing, 2, 1994.
[6] C. Avendano, S. v. Vuuren, and H. Hermansky, “Data Based Filter Design for RASTA-like Channel Normalization for ASR”, ICASSP, 1996.
[7] J.-w. Huang et al., “Comparative Analysis for Data-Driven Temporal Filters Obtained via Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) in Speech Recognition”, Eurospeech, 2001.
[8] M. J. F. Gales, “Model-based Techniques for Noise Robust Speech Recognition”, University of Cambridge, Sep. 1995.
[9] J. W. Hung, J. L. Shen, and L. S. Lee, “New Approaches for Domain Transformation and Parameter Combination for Improved Accuracy in Parallel Model Combination (PMC) Techniques”, IEEE Trans. on Speech and Audio Processing, Nov. 2001.
[10] J. L. Gauvain and C. H. Lee, “Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains”, IEEE Trans. on Speech and Audio Processing, 1994.
[11] C. J. Leggetter and P. C. Woodland, “Maximum Likelihood Linear Regression for Speaker Adaptation of Continuous Density Hidden Markov Models”, Computer Speech and Language, 1995.
[12] P. Lockwood and J. Boudy, “Experiments with a Nonlinear Spectral Subtractor (NSS), Hidden Markov Models and the Projection, for Robust Speech Recognition in Cars”, Eurospeech, 1991.
[13] J. Sohn, N. S. Kim, and W. Sung, “A Statistical Model-Based Voice Activity Detection”, IEEE Signal Processing Letters, Vol. 6, No. 1, January 1999.
[14] J. Ramirez et al., “A New Adaptive Long-Term Spectral Estimation Voice Activity Detection”, Eurospeech, 2003.
[15] B. A. Mellor and A. P. Varga, “Noise Masking in the MFCC Domain for the Recognition of Speech in Background Noise”, ICASSP, 1992.
[16] Y. Ephraim and H. L. Van Trees, “A Signal Subspace Approach for Speech Enhancement”, IEEE Trans. on Speech and Audio Processing, 1995.
[17] J. Droppo, L. Deng, and A. Acero, “Evaluation of SPLICE on the Aurora 2 and 3 Tasks”, ICSLP, 2002.
[18] P. Jancovic and J. Ming, “Combining the Union Model and Missing Feature Method to Improve Noise Robustness in ASR”, ICASSP, 2002.
[19] D. Macho et al., “Evaluation of a Noise-Robust DSR Front-End on AURORA Databases”, ICSLP, 2002.
[20] A. Agarwal and Y. M. Cheng, “Two-Stage Mel-Warped Wiener Filter for Robust Speech Recognition”, Proc. ASRU’99, 1999.
[21] D. Macho and Y. M. Cheng, “SNR-dependent Waveform Processing for Robust Speech Recognition”, Proc. ICASSP’01, 2001.
[22] N.-c. Wang, J.-h. Hung, and L.-s. Lee, “Data-Driven Temporal Filters Based on Multi-Eigenvectors for Robust Features in Speech Recognition”, ICASSP, 2003.
[23] 王迺鈞 (N.-c. Wang), “Multi-Eigenvector-Based Feature Parameters and Related Studies in Robust Speech Recognition” (in Chinese), Master's thesis, National Taiwan University, 2003.
[24] ETSI standard, ETSI ES 202 050 v1.1.3 (2003-11), “Speech Processing, Transmission and Quality Aspects (STQ); Distributed speech recognition; Advanced front-end feature extraction algorithm; Compression algorithms”.
[25] H. G. Hirsch and D. Pearce, “The Aurora Experimental Framework for the Performance Evaluation of Speech Recognition Systems under Noisy Conditions”, ISCA ITRW ASR2000, Automatic Speech Recognition: Challenges for the Next Millennium, 2000.
[26] ETSI standard, ETSI ES 201 108 v1.1.3 (2003-09), “Speech Processing, Transmission and Quality Aspects (STQ); Distributed speech recognition; Front-end feature extraction algorithm; Compression algorithms”.
[27] H. Hermansky, “Perceptual Linear Predictive (PLP) Analysis of Speech”, J. Acoust. Soc. Am., 1990.
[28] L. Mauuary, “Blind Equalization in the Cepstral Domain for Robust Telephone Based Speech Recognition”, EUSIPCO, 1998.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/36510	-
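dc.description.abstract	To make speech recognition a human-machine interface that can be used anytime and anywhere, improving its robustness and reducing the impact of environmental mismatch on recognition accuracy has become an important research direction. This thesis improves robustness to changes in the acoustic environment by processing the recognition features in the front end. It takes the two most mainstream feature types, Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Prediction (PLP) parameters, combines each with various robustness techniques, and discusses the possibility of integrating the two feature types. Experimental results show that without any robustness processing, PLP outperforms MFCC: under clean-speech training, averaged over all test conditions, the baseline accuracy is 63.38% for PLP but only 60.3% for MFCC. Once robustness processing is applied, however, the two perform comparably. When the various robustness methods are cascaded, only certain combinations yield additive gains, and most do not. This thesis carefully tests two additive combinations: feature vector normalization followed by multi-eigenvector temporal filtering, and the two-stage Wiener filter combined with SNR-dependent waveform processing and blind equalization. The latter differs from the Advanced Front End proposed by ETSI [24] only in that this thesis tests it with both MFCC and PLP and then compares the two. Finally, the thesis tries several ways of integrating the two feature types, aiming to exploit their complementary information to obtain results better than either one alone. Experimental results show that integration does yield better results.	zh_TW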
dc.description.provenance	Made available in DSpace on 2021-06-13T08:03:34Z (GMT). No. of bitstreams: 1 ntu-94-R92942100-1.pdf: 2064880 bytes, checksum: 8fca8418d178b9f754bc5a92f0762c7c (MD5) Previous issue date: 2005	en
dc.description.tableofcontents	Outline
1. Introduction
1.1 Research Motivation
1.2 Research Topics
1.3 Main Results
1.4 Chapter Overview
2. Background
2.1 Feature Normalization
2.1.1 Cepstrum Mean Subtraction
2.1.2 Cepstrum Normalization
2.2 Two-Stage Wiener Filter
2.3 SNR-Dependent Waveform Processing
2.4 Cepstrum-Domain Processing
2.4.1 Principal Component Analysis
2.4.2 Linear Discriminant Analysis
2.5 Multi-Eigenvector Temporal Filtering
2.6 Chapter Summary
3. Experimental Setup
3.1 Test Environment
3.2 Speech Feature Extraction
3.2.1 Mel Frequency Cepstral Coefficients (MFCC)
3.2.2 Perceptual Linear Prediction (PLP)
3.3 Acoustic Models and Evaluation of Recognition Performance
3.4 Baseline System Results
3.5 Chapter Summary
4. Comparison of PLP and MFCC Combined with Various Robustness Techniques
4.1 Feature Normalization
4.2 Two-Stage Wiener Filter
4.3 SNR-Dependent Waveform Processing
4.4 Cepstrum-Domain Processing
4.4.1 Linear Discriminant Analysis
4.4.2 Principal Component Analysis
4.5 Multi-Eigenvector Temporal Filtering
4.6 Chapter Summary
5. Comparison of Combined Robustness Techniques
5.1 Feature Normalization Combined with Multi-Eigenvector Temporal Filtering
5.2 Two-Stage Wiener Filter Combined with SNR-Dependent Waveform Processing and Blind Equalization
5.3 Chapter Summary
6. Combining the Two Speech Feature Types
6.1 Concatenation Followed by Dimensionality Reduction with Principal Component Analysis
6.2 Summation After Feature Normalization
6.2.1 Weighted Average
6.2.2 SNR-Dependent Feature Combination
6.3 Chapter Summary
7. Conclusions and Future Work
7.1 Conclusions
7.2 Future Work
8. References
dc.language.iso	zh-TW
dc.title	強健性語音辨識中處理感知線性預測參數與梅爾倒頻譜係數之進一步方法	zh_TW
dc.title	Improved Approaches of Processing Perceptual Linear Prediction (PLP) and Mel Frequency Cepstrum Coefficient (MFCC) Parameters for Robust Speech Recognition	en
dc.type	Thesis
dc.date.schoolyear	93-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	貝蘇章,陳信宏,王小川,鄭秋豫
dc.subject.keyword	Mel Frequency Cepstral Coefficients, Perceptual Linear Prediction parameters, robustness	zh_TW
dc.subject.keyword	MFCC, PLP, Robustness	en
dc.relation.page	113
dc.rights.note	Paid license
dc.date.accepted	2005-07-21
dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Communication Engineering	zh_TW
Appears in Collections:	Graduate Institute of Communication Engineering

Files in this item:

File	Size	Format
ntu-94-1.pdf (not currently authorized for public access)	2.02 MB	Adobe PDF

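Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.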