Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/34516
Full metadata record (DC field / value / language):
dc.contributor.advisor: 李琳山 (Lin-Shan Lee)
dc.contributor.author: Gwo-Hwa Ju [en]
dc.contributor.author: 朱國華 [zh_TW]
dc.date.accessioned: 2021-06-13T06:12:42Z
dc.date.available: 2006-02-15
dc.date.copyright: 2006-02-13
dc.date.issued: 2006
dc.date.submitted: 2006-02-08
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/34516
dc.description.abstract [zh_TW]: Noise-attenuation techniques are widely used to improve the performance of speech systems in fields ranging from wireless communication and hearing aids to speech recognition. For additive noise sources of various types and characteristics, this dissertation proposes several improved speech enhancement techniques based on the subspace concept and spectral subtraction, aiming to raise the quality and intelligibility of noisy speech and to increase the robustness of recognition systems.
Spectral subtraction (SS) is the best-known algorithm in the speech enhancement field; it is simple and easy to implement. In Chapter 3 we propose replacing the flooring step of SS with histogram equalization to reduce the signal distortion introduced when the spectrum is over-subtracted. We also address the degradation of SS when the additive noise is not white: by combining SS with sub-band coding, the additive noise within each sub-band becomes approximately white after decimation, so SS can effectively suppress the noise components in every sub-band and its performance in non-white noise environments improves.
In Chapter 4 we carry out speech enhancement experiments using the now widely adopted subspace concept. Generalized singular value decomposition (GSVD) is used to partition the vector space of the Hankel data matrix constructed from each noisy speech frame into disjoint signal and noise subspaces. The signal subspace contains both clean speech and noise components, while the noise subspace consists of noise only, so the clean speech signal can be estimated from the signal subspace. This GSVD subspace approach extends the previously published truncated quotient SVD method, which has to estimate the signal-subspace dimension of each noisy frame empirically according to the signal-to-noise ratio, whereas the GSVD approach proposed here determines the dimensions of the signal and noise subspaces automatically and precisely.
Although the subspace approach described above effectively improves system performance, under low signal-to-noise-ratio conditions many unnatural-sounding residual noise components remain in the enhanced speech. To eliminate this phenomenon, in Chapter 5 we incorporate the masking properties of human hearing into the GSVD subspace approach (PCGSVD), so that the spectral energy of the residual noise is kept below the auditory masking thresholds and is therefore imperceptible, further improving sound quality, intelligibility, and recognition accuracy.
Because the computational cost of the PCGSVD subspace approach is very high, in Chapter 6 we borrow its concept and apply the discrete Fourier transform and its time-shift property to the Hankel data matrices, yielding another frequency-domain speech enhancement framework that combines auditory masking with the subspace approach. The computational load of this new algorithm is comparable to that of conventional SS, while its performance approaches that of the PCGSVD subspace approach.
Finally, Chapter 7 concludes this dissertation and suggests several possible directions for future research.
dc.description.abstract [en]: Noise suppression to enhance speech quality or intelligibility is necessary in a wide range of applications, including mobile communication, hearing aids and speech recognition. In this dissertation, we propose several improved single-channel speech enhancement approaches based on the subspace concept and spectral subtraction, with the aims of improving sound quality and intelligibility and increasing the robustness of speech recognition when speech is corrupted by additive noise.
Spectral subtraction (SS) is the most popular of the various speech enhancement approaches: it is simple, effective and easy to implement. In Chapter 3 we propose two improved versions of the SS approach. First, silence-fractional histogram equalization is performed as an additional flooring stage for SS to improve speech quality. Second, although SS offers significant improvements for slowly varying, broad-band additive noise, it becomes less helpful when the noise is narrow-band and/or non-stationary. We therefore integrate sub-band coding (SBC) with SS: SBC splits the spectrum of the input speech signal into several overlapping frequency bands, and each band is expanded to the full frequency scale by decimation. The spectrum of the additive noise within each band obtained in this way can be better approximated as white if the number of bands is large enough, so SS becomes more effective.
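For concreteness, the following is a minimal Python/NumPy sketch of conventional power spectral subtraction with over-subtraction and a spectral floor. It does not implement the SF-HEQ or SBC variants proposed in Chapter 3; the function name, frame length, over-subtraction factor alpha, floor beta and the per-bin noise estimate noise_psd (assumed to be obtained from non-speech frames) are illustrative assumptions.

import numpy as np

def spectral_subtraction(noisy, noise_psd, frame_len=256, hop=128,
                         alpha=2.0, beta=0.01):
    # Basic power spectral subtraction: over-subtract the noise estimate
    # (alpha) and floor the result at a fraction (beta) of the noisy power;
    # this flooring step is what Chapter 3 replaces with histogram
    # equalization. noise_psd has frame_len // 2 + 1 bins.
    window = np.hanning(frame_len)
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame_len + 1, hop):
        frame = noisy[start:start + frame_len] * window
        spec = np.fft.rfft(frame)
        power = np.abs(spec) ** 2
        clean_power = np.maximum(power - alpha * noise_psd, beta * power)
        # Keep the noisy phase; only the magnitude is modified.
        enhanced = np.sqrt(clean_power) * np.exp(1j * np.angle(spec))
        out[start:start + frame_len] += np.fft.irfft(enhanced, frame_len)
    return out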
In Chapter 4, we introduce a subspace-based approach for speech enhancement. We propose a generalized singular value decomposition (GSVD)-based approach, an extended version of the previously proposed truncated quotient SVD (QSVD)-based approach, in which the dimensions of the signal and noise subspaces can be determined more flexibly and precisely for each frame of the noisy signal using well-defined procedures. In this subspace-based approach, the vector space of every input speech frame is partitioned into signal and noise subspaces. It is assumed that the speech is present only in the signal subspace, whereas the corrupting noise spans both the signal and noise subspaces. We can thus discard the noise-subspace components and reconstruct the speech from the signal-subspace components only. This approach is effective whether or not the additive noise is white.
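As a simplified illustration of the subspace idea, the sketch below applies a truncated-SVD filter to a single frame, i.e. the white-noise special case that the GSVD formulation generalizes to colored noise. It is not the GSVD-MVE/LSE estimator of Chapter 4, and the model order, the singular-value threshold and the noise variance noise_var (assumed to be estimated from non-speech frames) are assumptions of the sketch.

import numpy as np

def subspace_enhance_frame(noisy_frame, noise_var, order=20):
    # Hankel-form data matrix: element (i, j) = x[i + j], so every row is a
    # time-shifted segment of the frame.
    H = np.lib.stride_tricks.sliding_window_view(noisy_frame, order)
    rows = H.shape[0]
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    # Singular values above the expected noise level define the signal
    # subspace; this empirical thresholding is what GSVD makes automatic.
    noise_level = np.sqrt(rows * noise_var)
    k = max(int(np.sum(s > noise_level)), 1)
    s_clean = np.sqrt(np.maximum(s[:k] ** 2 - rows * noise_var, 0.0))
    H_clean = (U[:, :k] * s_clean) @ Vt[:k, :]
    # Average along anti-diagonals to map the low-rank matrix back to a frame.
    n = len(noisy_frame)
    enhanced = np.zeros(n)
    counts = np.zeros(n)
    for i in range(rows):
        enhanced[i:i + order] += H_clean[i]
        counts[i:i + order] += 1
    return enhanced / counts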
Although the GSVD-based approach has been shown to be effective, some unnatural-sounding characteristics, usually due to perceivable residual noise, still occur in the estimated speech under adverse conditions. To solve this problem, in Chapter 5 we integrate the auditory masking thresholds (AMTs) of the human auditory system into the GSVD-based approach to establish an improved framework for speech enhancement. We propose to restrict the spectral energy of every residual noise component to below the corresponding AMT, so that the noise is masked and not perceivable.
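The masking constraint can be illustrated with a short frequency-domain sketch that caps a per-bin gain so that the residual noise power never exceeds the masking threshold in that bin. This is only an illustration of the constraint, not the PCGSVD estimator of Chapter 5; the AMTs are assumed to be supplied per FFT bin by a separate psychoacoustic model, and all names are illustrative.

import numpy as np

def perceptual_gain(noisy_psd, noise_psd, masking_thresholds, g_min=0.05):
    # Wiener-like gain per frequency bin from the noisy and noise power
    # estimates.
    clean_est = np.maximum(noisy_psd - noise_psd, 0.0)
    wiener = clean_est / (clean_est + noise_psd + 1e-12)
    # Largest gain whose residual noise power (gain**2 * noise_psd) stays
    # below the auditory masking threshold in that bin.
    masking_cap = np.sqrt(masking_thresholds / (noise_psd + 1e-12))
    # A small gain floor avoids muting bins completely.
    return np.clip(np.minimum(wiener, masking_cap), g_min, 1.0)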
Experiments show that the subspace-based approaches proposed in Chapters 4 and 5 behave well regardless of whether the additive noise is stationary or not, and especially when it is non-white. However, the high computational complexity of such subspace-based approaches makes them hard to apply in real-world environments. In Chapter 6, we therefore develop a new speech enhancement framework in which the time-shift property of the DFT (discrete Fourier transform) is applied to the special structure of the Hankel-form matrices (constructed from the noisy speech frames and the estimated noise signal) to replace the time-consuming GSVD algorithm used in the previous two chapters. The computational load of this new approach is as low as that of the conventional SS algorithm, while its performance is comparable to that of the subspace-based approaches proposed earlier in this dissertation.
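One way to read this is that, because every row of a Hankel-form matrix is a time-shifted copy of the frame and a time shift only multiplies its DFT by a phase factor, the DFT basis can stand in for the shared singular vectors, so the subspace decision and gain reduce to per-frequency-bin operations. The sketch below follows that reading and is an illustration under those assumptions, not the exact algorithm of Chapter 6; noise_psd is again an assumed per-bin noise power estimate.

import numpy as np

def dft_subspace_enhance(noisy_frame, noise_psd):
    # Per-bin "subspace" decision in the DFT domain: bins whose noisy power
    # exceeds the noise estimate are treated as the signal subspace and keep
    # a spectral-subtraction-like gain; the remaining bins are discarded.
    spec = np.fft.rfft(noisy_frame)
    noisy_psd = np.abs(spec) ** 2
    signal_bins = noisy_psd > noise_psd
    gain = np.zeros_like(noisy_psd)
    gain[signal_bins] = np.sqrt(
        (noisy_psd[signal_bins] - noise_psd[signal_bins])
        / noisy_psd[signal_bins])
    return np.fft.irfft(gain * spec, len(noisy_frame))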
Finally, we conclude this dissertation in Chapter 7 and discuss several directions for future research.
dc.description.provenance [en]: Made available in DSpace on 2021-06-13T06:12:42Z (GMT). No. of bitstreams: 1. ntu-95-D89942008-1.pdf: 6133842 bytes, checksum: f7801b7732cb3ca4c205dd831772a763 (MD5). Previous issue date: 2006.
dc.description.tableofcontents:
Chinese Part pp. 1
Chinese Chapter 1 pp. 3
Chinese Chapter 2 pp. 5
Chinese Chapter 3 pp. 7
Chinese Chapter 4 pp. 9
Chinese Chapter 5 pp. 11
Chinese Chapter 6 pp. 13
Chinese Chapter 7 pp. 15
English Part pp. 17
Chapter 1 Introduction pp. 19
1.1 Background pp. 19
1.2 Primary Achievements of this Dissertation pp. 20
Chapter 2 Preliminaries - Background Review pp. 25
2.1 Introduction pp. 25
2.2 Review of Existing Speech Enhancement Approaches pp. 26
2.2.1 Spectral Subtraction (SS) pp. 26
2.2.2 Subspace-Based Speech Enhancement Approaches pp. 29
2.2.3 Wiener Filtering Approach pp. 35
2.2.4 Model-based and Multi-channel Speech Enhancement Approaches pp. 36
2.3 Estimation of Noise Statistics pp. 38
2.4 Auditory Masking Properties of Human Perception pp. 39
2.5 Speech Corpora and Evaluation Methods pp. 42
2.5.1 Clean Speech and Noise Corpora for Performance Evaluation pp. 42
2.5.2 Subjective and Objective Evaluation Methods pp. 45
2.6 Summary pp. 47
Chapter 3 Two Improved Versions of the PSS Algorithm pp. 49
3.1 Introduction pp. 49
3.2 Improved PSS with Silence-Fractional Histogram Equalization pp. 51
3.2.1 Histogram Equalization (HEQ) pp. 51
3.2.2 Integrating SF-HEQ Processing with the PSS Algorithm pp. 52
3.2.3 Experimental Environment pp. 53
3.2.4 Segmental SNR Measures pp. 55
3.2.5 Segmental LSD Measures pp. 57
3.2.6 English Phoneme Recognition Accuracy pp. 58
3.2.7 Time Domain Waveforms and Spectrogram Plots pp. 59
3.2.8 Remarks pp. 62
3.3 Improved PSS with Sub-Band Coding Technique pp. 63
3.3.1 Sub-Band Coding and Integration with PSS pp. 63
3.3.2 Experimental Environment pp. 65
3.3.3 Segmental SNR Measures pp. 66
3.3.4 Segmental LSD Measures pp. 67
3.3.5 English Phoneme Recognition Accuracy pp. 68
3.3.6 Time Domain Waveforms and Spectrogram Plots pp. 70
3.3.7 Paired-Comparison Listening Test pp. 72
3.3.8 Remarks pp. 73
3.4 Summary pp. 74
Chapter 4 GSVD-Based Speech Enhancement Approach pp. 77
4.1 Introduction pp. 77
4.2 The GSVD-Based Approach pp. 78
4.2.1 Phase (I): Framer, Non-speech Detector and Buffer pp. 79
4.2.2 Phase (II): Construction of the Hankel-form Sample Matrices pp. 80
4.2.3 Phase (III): The GSVD Algorithm pp. 81
4.2.4 Phase (IV): GSVD-MVE- and GSVD-LSE-based Subspace Approaches pp. 83
4.2.5 Phase (V): Frame Overlap and Add pp. 89
4.3 Spectral Domain Constrained Estimator pp. 89
4.4 Experiments for the Unconstrained GSVD-Based Enhancement Approach pp. 90
4.4.1 Segmental SNR Measures pp. 92
4.4.2 Segmental LSD Measures pp. 94
4.4.3 English Phoneme Recognition Accuracy pp. 95
4.4.4 Time Domain Waveforms and Spectrogram Plots pp. 96
4.5 Evaluation of the Spectral Domain Constrained GSVD-based Approach pp. 98
4.6 Remarks pp. 100
4.7 Summary pp. 102
Chapter 5 Perceptually Constrained GSVD (PCGSVD)-Based Approach for Enhancing Speech Corrupted by Colored Noise pp. 107
5.1 Introduction pp. 107
5.2 The Perceptually Constrained GSVD-based Approach pp. 108
5.2.1 Formulation of the PCGSVD-based Subspace Approach pp. 109
5.2.2 Estimating AMTs Projected onto the Generalized Singular Domain pp. 110
5.2.3 Solution for the PCGSVD-MVE-based Subspace Approach pp. 112
5.2.4 PCGSVD-LSE-based Subspace Approach pp. 114
5.2.5 Estimation of the scaling factor a pp. 114
5.3 Experiments pp. 115
5.3.1 Segmental SNR Measures pp. 116
5.3.2 Segmental LSD Measures pp. 117
5.3.3 English Phoneme/Digit Recognition Accuracy pp. 118
5.3.4 Time Domain Waveforms and Spectrogram Plots pp. 120
5.3.5 Subjective Listening Tests pp. 123
5.3.6 Summary of the Performance Evaluation pp. 127
5.4 Remarks pp. 128
5.5 Summary pp. 130
Chapter 6 Applying Time-Shift Property of DFT on Hankel-Form Matrices for Signal Subspace Decomposition pp. 133
6.1 Introduction pp. 133
6.2 Framework of the PCDFT-based Approaches pp. 134
6.2.1 Phase (I): Framer, Non-speech Detector and Buffer pp. 134
6.2.2 Phase (II): Construction of Hankel-Form Matrices pp. 134
6.2.3 Phase (III): Applying DFT and Its Time-Shift Property to the Hankel-Form Matrices for Signal Subspace Construction pp. 135
6.2.4 Phase (IV): MVE-/LSE-Based Enhancement Approaches in Frequency Domain pp. 137
6.3 Experiments pp. 139
6.3.1 Segmental SNR Measures pp. 140
6.3.2 Segmental LSD Measures pp. 141
6.3.3 English Phoneme Recognition Accuracy pp. 142
6.3.4 Time Domain Waveforms and Spectrogram Plots pp. 143
6.4 Remarks pp. 145
6.5 Summary pp. 149
Chapter 7 Conclusions and Future Works pp. 151
7.1 Conclusions pp. 151
7.2 Future Works pp. 155
Bibliography pp. 163
A Proof for Equations (4.6) & (4.7) and (5.6) & (5.7) pp. 173
dc.language.iso: en
dc.title: 基於子空間觀念及頻譜消去法的進一步語音強化技術 [zh_TW]
dc.title: Improved Speech Enhancement Approaches Based on Subspace Concept and Spectral Subtraction [en]
dc.type: Thesis
dc.date.schoolyear: 94-1
dc.description.degree: 博士 (Doctoral)
dc.contributor.oralexamcommittee: 王小川 (Hsiao-Chuan Wang), 陳信宏 (Sin-Hong Chen), 鄭伯順 (Bor-Shenn Jeng), 鄭秋豫 (Chiu-Yu Tseng), 陳信希 (Hsin-Hsi Chen)
dc.subject.keyword: 語音強化技術, 訊號子空間, 雜訊子空間, 頻譜消去法 [zh_TW]
dc.subject.keyword: Speech Enhancement, Signal Subspace, Noise Subspace, Spectral Subtraction [en]
dc.relation.page: 181
dc.rights.note: 有償授權 (authorized with compensation)
dc.date.accepted: 2006-02-09
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 電信工程學研究所 (Graduate Institute of Communication Engineering) [zh_TW]
Appears in Collections: Graduate Institute of Communication Engineering (電信工程學研究所)

Files in This Item:
File: ntu-95-1.pdf, 5.99 MB, Adobe PDF (currently not authorized for public access)

