Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/16316
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 貝蘇章(Soo-Chang Pei) | |
dc.contributor.author | Hsiang-Hao Hsieh | en |
dc.contributor.author | 謝祥浩 | zh_TW |
dc.date.accessioned | 2021-06-07T18:09:29Z | - |
dc.date.copyright | 2012-08-01 | |
dc.date.issued | 2012 | |
dc.date.submitted | 2012-07-11 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/16316 | - |
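dc.description.abstract | Observing speech waveforms, we find that while speech is present, especially during vowels, the amplitude of a sample is correlated with the amplitudes of neighboring samples. The amplitude at a given time can therefore be estimated from the amplitudes of several preceding samples, which leads to the concept of linear prediction. In speech signal processing, linear prediction is an important analysis method, and in practice it is often used to extract speech feature parameters. Linear prediction is in effect another interpretation of the vocal tract model, and the parameters obtained from the analysis can be converted to describe the frequency-domain characteristics of speech. Conventional linear prediction, however, has the following problems: when analyzing the formants of high-pitch speakers (usually female), abnormal peaks appear in the spectral envelope, causing large estimation errors; if the speech excitation signal has a harmonic structure, aliasing occurs in the autocorrelation function, which likewise degrades formant estimation; in addition, conventional linear prediction performs poorly on speech corrupted by noise. This thesis studies and organizes algorithms that improve linear prediction of speech signals so that formant estimates become more accurate, and it uses weighting to reduce the influence of noise on linear prediction. Derivations confirm that the all-pole filters obtained from these improved algorithms are all stable. | zh_TW |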
dc.description.abstract | Observing the waveform of a speech signal, we find that during voiced speech the amplitude of a sample is correlated with the amplitudes of its neighbors. A sample can therefore be estimated from a number of preceding samples, which leads to the concept of linear prediction. In digital speech processing, linear prediction is a very important analysis method, and in practice it is commonly used to extract features from the speech signal. By modeling the spectral envelope, linear prediction can capture the most essential acoustical cues of speech originating from two major parts of the human voice production mechanism, the glottal flow and the vocal tract. However, linear prediction analysis also suffers from some drawbacks. For example, formant estimates are biased by their neighboring harmonics because of aliasing that occurs in the autocorrelation domain, and this effect is generally most severe for high-pitch speakers. Additionally, it is well known that the performance of LP deteriorates in the presence of noise. In this thesis, we try to improve the conventional algorithms to address these problems, making spectral envelope estimation more accurate and more robust against noise. We verify that the improved algorithms all yield stable all-pole filters. | en |
dc.description.provenance | Made available in DSpace on 2021-06-07T18:09:29Z (GMT). No. of bitstreams: 1 ntu-101-R99942111-1.pdf: 3806566 bytes, checksum: ccadc0c07d1a97c89c72986ba31b41c2 (MD5) Previous issue date: 2012 | en |
dc.description.tableofcontents | Acknowledgements; Chinese Abstract; ABSTRACT; CONTENTS; LIST OF FIGURES; LIST OF TABLES; Chapter 1 Introduction: 1.1 Background, 1.2 Speech Production, 1.3 Linear Prediction Model, 1.4 Levinson-Durbin Recursive Method; Chapter 2 Refined Autocorrelation Algorithm: 2.1 The Problem Description of Linear Prediction, 2.2 Homomorphic Deconvolution in the Autocorrelation Domain, 2.2.1 The Selection of Cepstral Window, 2.2.2 The Stability of the AR Filter, 2.3 Results on Synthetic Speech, 2.3.1 Accuracy in Formant Frequency Estimation, 2.3.2 Dependency on the Length of Analysis Window, 2.3.3 Accuracy in Formant Bandwidth Estimation, 2.4 Results on Real Speech; Chapter 3 Regularized Algorithm: 3.1 Theory of Regularized Linear Prediction, 3.2 Regularization, 3.2.1 Stability of the All-pole Filter, 3.3 Selection of Lambda, 3.3.1 Objective Quality Criterion, 3.3.2 The Reference Envelope, 3.3.3 Finding the Optimal Lambda, 3.4 The Error of Outliers of Spectral Estimation; Chapter 4 Stabilised Weighted Algorithm: 4.1 Theory of Weighted Linear Prediction, 4.2 Weighted Linear Prediction Model Formulation, 4.2.1 Stability of the All-pole Filter, 4.3 The Result of the Algorithm, 4.3.1 Objective Spectral Distortion Measurements; Chapter 5 Conclusion and Future Work: 5.1 Thesis Conclusion, 5.2 Future Work; REFERENCE | |
dc.language.iso | en | |
dc.title | 在語音信號上的線性預測改良演算法 | zh_TW |
dc.title | Improved Algorithms for Linear Prediction Speech | en |
dc.type | Thesis | |
dc.date.schoolyear | 100-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 馬杰,徐忠枝 | |
dc.subject.keyword | Speech signal, Linear prediction, Spectral envelope estimation, Regularization, All-pole model | zh_TW |
dc.subject.keyword | Speech signal, Linear prediction, Spectral envelope estimation, Homomorphic deconvolution, Regularization, All-pole modeling | en |
dc.relation.page | 87 | |
dc.rights.note | Not authorized | |
dc.date.accepted | 2012-07-11 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Communication Engineering | zh_TW |
Appears in Collections: | Graduate Institute of Communication Engineering |
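
The core idea in the abstract, estimating a sample from a few preceding samples, can be illustrated with a minimal sketch of the conventional autocorrelation method solved by the Levinson-Durbin recursion (the method named in Section 1.4 of the table of contents). The function names and the pure-Python formulation are illustrative assumptions and are not taken from the thesis. The refined-autocorrelation, regularized, and weighted variants it studies are not shown.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite (already windowed) frame x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the linear prediction normal equations from autocorrelation r.

    Returns (a, err), where x[n] is predicted as sum(a[k-1] * x[n-k]) for
    k = 1..order and err is the remaining prediction error energy.
    Assumes r[0] > 0 and a valid (positive definite) autocorrelation sequence.
    """
    a = [0.0] * order
    err = r[0]
    for m in range(order):
        # Reflection coefficient for model order m + 1.
        acc = r[m + 1] - sum(a[j] * r[m - j] for j in range(m))
        k = acc / err
        # Update the lower-order coefficients, then append the new one.
        prev = a[:m]
        for j in range(m):
            a[j] = prev[j] - k * prev[m - 1 - j]
        a[m] = k
        # Error energy shrinks by (1 - k^2) at every stage.
        err *= 1.0 - k * k
    return a, err


# Autocorrelation of an ideal first-order process with coefficient 0.5.
coeffs, residual = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(coeffs, residual)
```

For a valid autocorrelation sequence every reflection coefficient has magnitude below one, which is why the conventional method yields a stable all-pole filter. The abstract states that the improved algorithms preserve this stability property.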
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-101-1.pdf (currently not authorized for public access) | 3.72 MB | Adobe PDF |
Items in the system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.