Improved Linear Prediction Algorithms for Speech Signals (在語音信號上的線性預測改良演算法)

Hsiang-Hao Hsieh; 謝祥浩

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/16316

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	貝蘇章(Soo-Chang Pei)
dc.contributor.author	Hsiang-Hao Hsieh	en
dc.contributor.author	謝祥浩	zh_TW
dc.date.accessioned	2021-06-07T18:09:29Z	-
dc.date.copyright	2012-08-01
dc.date.issued	2012
dc.date.submitted	2012-07-11
dc.identifier.citation	[1] J. Makhoul, “Linear prediction: A tutorial review,” Proc. IEEE, vol. 63, no. 4, pp. 561–580, Apr. 1975.
[2] B. S. Atal and S. L. Hanauer, “Speech analysis and synthesis by linear prediction of the speech wave,” The Journal of the Acoustical Society of America, vol. 50, no. 2B, pp. 637–655, 1971.
[3] A. El-Jaroudi and J. Makhoul, “Discrete all-pole modeling,” IEEE Transactions on Signal Processing, vol. 39, no. 2, pp. 411–423, 1991.
[4] M. S. Rahman and T. Shimamura, “Speech analysis based on modeling the effective voice source,” IEICE Transactions on Information and Systems, vol. E89-D, no. 3, pp. 1107–1115, 2006.
[5] H. Deng, R. K. Ward, M. P. Beddoes, and M. Hodgson, “A new method for obtaining accurate estimates of vocal-tract filters and glottal waves from vowel sounds,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 2, pp. 445–455, 2006.
[6] M. S. Rahman and T. Shimamura, “Linear prediction using homomorphic deconvolution in the autocorrelation domain,” in Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS ’05), vol. 3, pp. 2855–2858, Kobe, Japan, May 2005.
[7] T. F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, Prentice-Hall, Upper Saddle River, NJ, USA, 2002.
[8] M. S. Rahman and T. Shimamura, “Linear prediction using refined autocorrelation function,” EURASIP Journal on Audio, Speech, and Music Processing, June 2007.
[9] A. Oppenheim and R. Schafer, “Homomorphic analysis of speech,” IEEE Transactions on Audio and Electroacoustics, vol. 16, no. 2, pp. 221–226, 1968.
[10] K. K. Paliwal and W. B. Kleijn, “Quantization of LPC parameters,” in Speech Coding and Synthesis, W. B. Kleijn and K. K. Paliwal, Eds. New York: Elsevier, 1995, ch. 12, pp. 433–466.
[11] “Adaptive multi-rate (AMR) speech codec; Transcoding functions,” 2004, 3GPP TS 26.090.
[12] “Speech codec speech processing functions; Adaptive multi-rate wideband (AMR-WB) speech codec; Transcoding functions,” 2005, 3GPP TS 26.190.
[13] M. N. Murthi and W. B. Kleijn, “Regularized linear prediction all-pole models,” in Proc. IEEE Workshop Speech Coding, Sep. 2000, pp. 96–98.
[14] L. A. Ekman, W. B. Kleijn, and M. N. Murthi, “Spectral envelope estimation and regularization,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 2006, pp. I-245–I-248.
[15] J. C. Nash, “The Choleski decomposition,” in Compact Numerical Methods for Computers: Linear Algebra and Function Minimisation. Bristol, U.K.: Adam Hilger, 1979, ch. 7, pp. 70–78.
[16] “Acoustic-phonetic continuous speech corpus,” DARPA-TIMIT, 1990, NIST Speech Disc 1-1.1.
[17] M. Brookes, VOICEBOX: Speech Processing Toolbox for MATLAB. London, U.K.: Imperial College, 2006.
[18] L. A. Ekman, W. B. Kleijn, and M. N. Murthi, “Regularized linear prediction of speech,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 1, pp. 65–73, Jan. 2008.
[19] S. M. Kay, Modern Spectral Estimation: Theory and Application, Prentice-Hall, Upper Saddle River, NJ, USA, 1988.
[20] P. Stoica and R. L. Moses, Introduction to Spectral Analysis, Prentice-Hall, Upper Saddle River, NJ, USA, 1997.
[21] G. Fant, J. Liljencrants, and Q. G. Lin, “A four-parameter model of glottal flow,” Quarterly Progress and Status Report, pp. 1–13, Speech Transmission Laboratory, Royal Institute of Technology, Stockholm, Sweden, Oct.–Dec. 1985.
[22] D. H. Klatt, “Software for a cascade/parallel formant synthesizer,” The Journal of the Acoustical Society of America, vol. 67, no. 3, pp. 971–995, 1980.
[23] W. Verhelst and O. Steenhaut, “A new model for the short-time complex cepstrum of voiced speech,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, no. 1, pp. 43–51, 1986.
[24] O. Cappé and E. Moulines, “Regularization techniques for discrete cepstrum estimation,” IEEE Signal Processing Letters, vol. 3, no. 4, pp. 100–102, Apr. 1996.
[25] M. Oudot, O. Cappé, and E. Moulines, “Robust estimation of the spectral envelope for ‘harmonics+noise’ models,” in Proc. IEEE Workshop Speech Coding, 1997, pp. 11–12.
[26] J. Nocedal and S. J. Wright, Numerical Optimization. New York: Springer-Verlag, 1999.
[27] “Adaptive multi-rate (AMR) speech codec; Transcoding functions,” 2004, 3GPP TS 26.090.
[28] “Speech codec speech processing functions; Adaptive multi-rate wideband (AMR-WB) speech codec; Transcoding functions,” 2005, 3GPP TS 26.190.
[29] C. Ma, Y. Kamp, and L. Willems, “Robust signal selection for linear prediction analysis of voiced speech,” Speech Communication, vol. 12, no. 1, pp. 69–81, 1993.
[30] P. Alku, “Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering,” Speech Communication, vol. 11, no. 2, pp. 109–118, 1992.
[31] C. Magi, J. Pohjalainen, T. Bäckström, and P. Alku, “Stabilised weighted linear prediction,” Speech Communication, vol. 51, pp. 401–411, 2009.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/16316	-
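dc.description.abstract	Observing speech waveforms, we find that while speech is present, and especially during vowels, the amplitude of each sample is correlated with the amplitudes of neighboring samples. The amplitude at a given time can therefore be estimated from the amplitudes of a number of preceding samples, which leads to the concept of linear prediction. Linear prediction is an important analysis method in speech signal processing and is widely used in practice to extract speech feature parameters. It is in effect an alternative description of the vocal-tract model, and the parameters it produces can be converted into a description of the frequency-domain characteristics of speech. Conventional linear prediction, however, has the following problems. When analyzing the formants of high-pitched speakers (usually female), it produces spurious peaks in the spectral envelope, leading to large estimation errors. When the excitation has a harmonic structure, aliasing occurs in the autocorrelation function, which likewise degrades formant estimation. In addition, its performance on noise-corrupted speech is unsatisfactory. This thesis surveys and organizes improved linear prediction algorithms for speech that estimate formants more accurately, and uses weighting to reduce the influence of noise on linear prediction. Derivations confirm that the all-pole filters obtained from these improved algorithms are all stable.	zh_TW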
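The linear prediction idea summarized in the abstract (estimating a sample from a weighted sum of preceding samples, with the resulting all-pole filter required to be stable) can be illustrated with the classical autocorrelation method and the Levinson-Durbin recursion listed as Section 1.4 in the table of contents. The following is a minimal NumPy sketch under our own conventions, not code from the thesis: it assumes A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, a biased autocorrelation estimate, a Hamming analysis window, and the hypothetical helper names `autocorrelation`, `levinson_durbin`, and `is_stable`.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(max_lag + 1)])

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations recursively.

    Returns (a, err, k): a = [1, a_1, ..., a_p] for A(z) = 1 + sum_k a_k z^-k,
    err = final prediction-error power, k = reflection coefficients.
    """
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    k = np.zeros(order)
    for i in range(1, order + 1):
        # Correlation between the order-(i-1) residual and lag i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k_i = -acc / err
        a_prev = a.copy()
        # Order update: a_j <- a_j + k_i * a_{i-j} for j = 1..i-1.
        a[1:i] = a_prev[1:i] + k_i * a_prev[i - 1:0:-1]
        a[i] = k_i
        k[i - 1] = k_i
        err *= 1.0 - k_i * k_i
    return a, err, k

def is_stable(a):
    """1/A(z) is stable iff every root of A(z) lies inside the unit circle."""
    return bool(np.all(np.abs(np.roots(a)) < 1.0))

# Usage: estimate a known AR(2) resonator, A(z) = 1 - 1.3 z^-1 + 0.8 z^-2.
rng = np.random.default_rng(0)
e = rng.standard_normal(4000)
x = np.zeros_like(e)
for n in range(len(e)):
    x[n] = e[n] + (1.3 * x[n - 1] if n >= 1 else 0.0) - (0.8 * x[n - 2] if n >= 2 else 0.0)

r = autocorrelation(x * np.hamming(len(x)), max_lag=2)
a_hat, err, k = levinson_durbin(r, order=2)
```

In the autocorrelation method every reflection coefficient satisfies |k_i| < 1, which is the classical reason the all-pole filter is stable; the root test in `is_stable` is an equivalent, more direct check.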
dc.description.abstract	Observation of speech waveforms shows that, during voiced speech, the amplitude of a sample is correlated with those of its neighbors. A sample can therefore be estimated from a number of preceding samples, which is the idea behind linear prediction (LP). LP is a very important analysis method in digital speech processing and is commonly used to extract features from the speech signal. By modeling the spectral envelope, LP captures the most essential acoustic cues of speech, which originate from the two major parts of the human voice production mechanism: the glottal flow and the vocal tract. However, LP analysis also has drawbacks. For example, formant estimates are biased by neighboring harmonics because of aliasing in the autocorrelation domain, a phenomenon that is generally most severe for high-pitched speakers. It is also well known that the performance of LP deteriorates in the presence of noise. In this thesis, we improve the conventional algorithm to address these problems, making spectral envelope estimation more accurate and increasing robustness against noise. We verify that the all-pole filters produced by these improved algorithms are all stable.	en
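The English abstract attributes the formant bias to aliasing in the autocorrelation domain, and the table of contents names homomorphic deconvolution in the autocorrelation domain (Chapter 2, cf. [6], [8]) as a remedy. A rough sketch of that idea follows, assuming a rectangular low-quefrency lifter whose cutoff `lifter_cutoff` is a free parameter (the thesis's own cepstral window, Section 2.2.1, may differ) and hypothetical helper names. The cepstrum of the autocorrelation is the inverse FFT of log|X|^2; a periodic excitation shows up at quefrencies near multiples of the pitch period, so keeping only quefrencies below that period yields a smoothed, strictly positive power spectrum whose inverse FFT serves as a refined autocorrelation.

```python
import numpy as np

def refined_autocorrelation(x, max_lag, lifter_cutoff, nfft=1024):
    """Autocorrelation after low-quefrency liftering of log|X|^2 (sketch).

    nfft should be at least 2 * len(x) - 1 so that, with no liftering,
    the result equals the linear (not circular) autocorrelation.
    """
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.fft(x, nfft)) ** 2 + 1e-12     # floor avoids log(0)
    cep = np.fft.ifft(np.log(power)).real                # real, even sequence
    lifter = np.zeros(nfft)
    lifter[:lifter_cutoff] = 1.0                          # quefrencies 0..L-1
    lifter[nfft - lifter_cutoff + 1:] = 1.0               # mirrored half keeps symmetry
    smooth_power = np.exp(np.fft.fft(cep * lifter).real)  # strictly positive spectrum
    return np.fft.ifft(smooth_power).real[: max_lag + 1]

def lp_from_autocorrelation(r, order):
    """Solve sum_j a_j r[|i-j|] = -r[i], i = 1..order; return [1, a_1, ..., a_p]."""
    r = np.asarray(r, dtype=float)
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a_tail = np.linalg.solve(r[idx], -r[1: order + 1])
    return np.concatenate(([1.0], a_tail))

# Usage: a high-pitched synthetic "vowel" (one pulse every 40 samples)
# passed through a resonator; the lifter cutoff is set below the pitch period.
excitation = np.zeros(400)
excitation[::40] = 1.0
s = np.zeros(400)
for n in range(400):
    s[n] = excitation[n] + (1.3 * s[n - 1] if n >= 1 else 0.0) - (0.8 * s[n - 2] if n >= 2 else 0.0)
frame = s * np.hamming(len(s))

r_ref = refined_autocorrelation(frame, max_lag=10, lifter_cutoff=30)
a_ref = lp_from_autocorrelation(r_ref, order=10)
```

Because the exponentiated spectrum is strictly positive, the Toeplitz matrix built from the refined autocorrelation is positive definite, so the resulting all-pole filter is stable. This is consistent with the stability claim in the abstract, although the thesis's own argument (Section 2.2.2) may proceed differently.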
dc.description.provenance	Made available in DSpace on 2021-06-07T18:09:29Z (GMT). No. of bitstreams: 1 ntu-101-R99942111-1.pdf: 3806566 bytes, checksum: ccadc0c07d1a97c89c72986ba31b41c2 (MD5) Previous issue date: 2012	en
dc.description.tableofcontents	Acknowledgements; Chinese Abstract; Abstract; Contents; List of Figures; List of Tables
Chapter 1 Introduction: 1.1 Background; 1.2 Speech Production; 1.3 Linear Prediction Model; 1.4 Levinson-Durbin Recursive Method
Chapter 2 Refined Autocorrelation Algorithm: 2.1 The Problem Description of Linear Prediction; 2.2 Homomorphic Deconvolution in the Autocorrelation Domain (2.2.1 The Selection of Cepstral Window; 2.2.2 The Stability of the AR Filter); 2.3 Results on Synthetic Speech (2.3.1 Accuracy in Formant Frequency Estimation; 2.3.2 Dependency on the Length of Analysis Window; 2.3.3 Accuracy in Formant Bandwidth Estimation); 2.4 Results on Real Speech
Chapter 3 Regularized Algorithm: 3.1 Theory of Regularized Linear Prediction; 3.2 Regularization (3.2.1 Stability of the All-pole Filter); 3.3 Selection of Lambda (3.3.1 Objective Quality Criterion; 3.3.2 The Reference Envelope; 3.3.3 Finding the Optimal Lambda); 3.4 The Error of Outliers of Spectral Estimation
Chapter 4 Stabilised Weighted Algorithm: 4.1 Theory of Weighted Linear Prediction; 4.2 Weighted Linear Prediction Model Formulation (4.2.1 Stability of the All-pole Filter); 4.3 The Result of the Algorithm (4.3.1 Objective Spectral Distortion Measurements)
Chapter 5 Conclusion and Future Work: 5.1 Thesis Conclusion; 5.2 Future Work
References
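Chapter 4 builds on weighted linear prediction, in which each squared prediction error is multiplied by a temporal weight before the sum is minimized. The sketch below shows only the plain (unstabilised) weighted formulation, assuming a short-time-energy (STE) weight w_n = x[n-1]^2 + ... + x[n-M]^2 over the M preceding samples, of the kind studied in [31]; the function names, the default M = 20, and the optional `weights` override are hypothetical choices for illustration. Unlike the stabilised variant the chapter is named for, this plain version does not guarantee a stable all-pole filter.

```python
import numpy as np

def ste_weights(x_padded, positions, ste_len):
    """Short-time energy of the ste_len samples preceding each position."""
    return np.array([np.sum(x_padded[n - ste_len:n] ** 2) for n in positions])

def weighted_lp(x, order, ste_len=20, weights=None):
    """Plain weighted LP: minimize sum_n w_n (x[n] + sum_k a_k x[n-k])^2.

    The frame is zero-padded as in the autocorrelation method, so with all
    weights equal to one this reduces to the ordinary autocorrelation method.
    Stability of 1/A(z) is NOT guaranteed for general weights.
    """
    x = np.asarray(x, dtype=float)
    pad = max(order, ste_len)
    xe = np.concatenate((np.zeros(pad), x, np.zeros(order)))
    pos = np.arange(pad, len(xe))                  # n = 0 .. N + order - 1
    # Row n of Y holds the past samples x[n-1], ..., x[n-order].
    Y = np.stack([xe[pos - k] for k in range(1, order + 1)], axis=1)
    t = xe[pos]
    w = ste_weights(xe, pos, ste_len) if weights is None else np.asarray(weights, float)
    R = Y.T @ (w[:, None] * Y)                     # weighted correlation matrix
    b = Y.T @ (w * t)
    a_tail = np.linalg.solve(R, -b)
    return np.concatenate(([1.0], a_tail)), w

# Usage: STE-weighted LP on a noise-driven AR(2) process with
# A(z) = 1 - 1.3 z^-1 + 0.8 z^-2.
rng = np.random.default_rng(2)
e = rng.standard_normal(4000)
x = np.zeros_like(e)
for n in range(len(e)):
    x[n] = e[n] + (1.3 * x[n - 1] if n >= 1 else 0.0) - (0.8 * x[n - 2] if n >= 2 else 0.0)

a_w, w = weighted_lp(x, order=2)
```

Because the STE weight depends only on samples before n, it emphasizes high-energy regions without correlating with the current excitation sample. The stabilised formulation of [31] modifies the weighted normal equations so that A(z) is always minimum phase; that modification is not reproduced here.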
dc.language.iso	en
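dc.subject	Spectral envelope estimation (頻譜包絡線估測)	zh_TW
dc.subject	All-pole model (全極點模型)	zh_TW
dc.subject	Regularization (正規化)	zh_TW
dc.subject	Speech signal (語音訊號)	zh_TW
dc.subject	Linear prediction (線性預測)	zh_TW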
dc.subject	All-pole modeling	en
dc.subject	Spectral envelope estimation	en
dc.subject	Homomorphic deconvolution	en
dc.subject	Regularization	en
dc.subject	Speech signal	en
dc.subject	Linear prediction	en
dc.title	Improved Linear Prediction Algorithms for Speech Signals (在語音信號上的線性預測改良演算法)	zh_TW
dc.title	Improved Algorithms for Linear Prediction Speech	en
dc.type	Thesis
dc.date.schoolyear	100-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	馬杰,徐忠枝
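dc.subject.keyword	Speech signal, Linear prediction, Spectral envelope estimation, Regularization, All-pole model (語音訊號, 線性預測, 頻譜包絡線估測, 正規化, 全極點模型)	zh_TW
dc.subject.keyword	Speech signal, Linear prediction, Spectral envelope estimation, Homomorphic deconvolution, Regularization, All-pole modeling	en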
dc.relation.page	87
dc.rights.note	Not authorized for public access
dc.date.accepted	2012-07-11
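dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Communication Engineering	zh_TW
Appears in Collections:	Graduate Institute of Communication Engineering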

Files in This Item:

File	Size	Format
ntu-101-1.pdf (not authorized for public access)	3.72 MB	Adobe PDF


Unless otherwise indicated, all items in this system are protected by copyright, with all rights reserved.