Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/10277
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 劉佩玲(Pei-Ling Liu) | |
dc.contributor.author | Chih-Yuan Yu | en |
dc.contributor.author | 游志源 | zh_TW |
dc.date.accessioned | 2021-05-20T21:16:22Z | - |
dc.date.available | 2014-02-09 | |
dc.date.available | 2021-05-20T21:16:22Z | - |
dc.date.copyright | 2011-02-09 | |
dc.date.issued | 2011 | |
dc.date.submitted | 2011-01-27 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/10277 | - |
dc.description.abstract | 本研究之主要目的在發展一貝氏網路自動化語音情緒辨識方法,透過情緒語音之相關特徵參數計算,並與資料庫中各情緒之資料作比對,將語者的情緒狀態從語音訊號中辨識出來。
首先，將語者之情緒語音訊號以統計方式計算音高(Pitch)、音框能量(Frame energy)、共振峰(Formants)以及梅爾倒頻譜係數(Mel-scale Frequency Cepstral Coefficients, MFCC)等相關之語音情緒特徵，並以各特徵參數在中性情緒下之資料庫平均值為正規化特徵參數因子，將音高、音框能量以及共振峰利用正規化特徵參數因子進行正規化，得到正規化後之特徵參數，以消除不同語者之間差異。

各特徵參數對情緒之辨識能力不同，如音高可大約分辨悲傷及中性，快樂及生氣會被視為同一群。由於沒有任一參數可明顯分辨四種情緒，故本研究採用分層解析的方式，先將特徵參數分群，同一群特徵參數具有相似的情緒分類效果，並以分群結果建立多層貝氏網路(Multi-Layered Bayesian Network, MLBN)，第一層的輸入參數都只能辨識兩群情緒，第二層的輸入參數可辨識三群情緒，較無明顯分群效果的參數則置於第三層，以辨識四種情緒。

由於特徵參數之間具有相關性，因此，本論文將MLBN延伸，將特徵參數之相關性納入考量並提出多層共變異數貝氏網路(Multi-Layered Bayesian Network with Covariance, MLBNC)。

當語者資料不在訓練資料中時，其辨識效果通常不佳。為改善此狀況，本研究提出適應性MLBN及適應性MLBNC語音情緒辨識調適方法，在調適過程中，當辨識結果與語者情緒不同時，則根據語者情緒語音所得之參數值來調整MLBN及MLBNC資料庫中各群之平均值與標準差或共變異數，以符合語者真實的情緒狀態。

為驗證本研究所提出之方法，我們使用德國情緒語料庫EMO-DB當作訓練與測試語料，並以KNN、SVM、MLBN以及MLBNC分別進行Inside與Outside Test。同時，我們亦以EMO-DB為訓練語料，並以工業技術研究院所自行錄製之情緒語料為測試語料，分別對KNN、SVM、MLBN以及MLBNC進行不同語系之測試。而在適應性MLBN及MLBNC的驗證上，我們以EMO-DB為訓練語料，並以工研院之情緒語料為調適與測試語料，分別對適應性KNN、MLBN及MLBNC進行調適前後的測試。

實驗結果顯示，本研究提出之MLBN、MLBNC與單純使用貝氏決策之Inside Test辨識率分別為81.1%、88.8%以及70.8%，顯示透過各參數分層分群的辨識方式，可以有效提高語音情緒之辨識率，而考慮特徵參數間相關性之MLBNC，其結果亦優於MLBN。在Outside Test的部分，KNN、SVM以及MLBN使用原始參數時，其辨識率分別為78.2%、89.1%以及69.9%，而使用正規化參數時，辨識率分別為82.6%、91.7%以及77.6%，顯示正規化特徵參數可以有效縮小語者之間在特徵參數上的差異。而當訓練與測試語料為不同語系時，KNN、SVM、MLBN以及MLBNC之辨識率分別為34.21%、46.92%、39.33%以及52.08%，此結果顯示，當發音方式或表達情緒方式與資料庫有所差異時，各分類器之辨識效果均不佳。

在調適實驗部分，由調適前後之辨識結果顯示，KNN經調適過後，辨識率從34.2%提升至73.7%，而MLBN及MLBNC經調適後，其辨識率分別從37.8%提升至82.4%以及51.6%提升至81.2%，本研究提出之適應性MLBN與適應性MLBNC語音情緒辨識方法於資料庫修正後，其辨識效果明顯優於適應性KNN。而當調適次數增加時，MLBN及MLBNC經調適後，其辨識率則分別從39.3%提升至88.9%以及52.1%提升至90.0%，顯示經由本論文所提出之調適方法，經調適後，確實可以真正的反映語者的真實狀況，並得到良好的調適後辨識結果。 | zh_TW |
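The normalization step described in the abstract divides each speaker-dependent prosodic statistic (pitch, frame energy, formants) by the corresponding mean computed over the neutral-emotion utterances in the training corpus, turning absolute values into speaker-relative ratios. A minimal Python sketch of that idea follows; the baseline numbers, feature names, and the `normalize_features` helper are hypothetical illustrations, not taken from the thesis.

```python
# Hypothetical neutral-emotion corpus means (the real values would come from
# the training database statistics described in the thesis).
NEUTRAL_MEANS = {
    "pitch_mean": 150.0,        # Hz
    "frame_energy_mean": 0.02,  # arbitrary energy units
    "formant1_mean": 550.0,     # Hz
}

def normalize_features(raw: dict, neutral_means: dict = NEUTRAL_MEANS) -> dict:
    """Divide each prosodic statistic by its neutral-emotion corpus mean,
    yielding speaker-relative ratios as described in the abstract."""
    return {name: raw[name] / neutral_means[name] for name in neutral_means}

# Example: a hypothetical angry utterance with raised pitch and energy.
angry_utterance = {"pitch_mean": 230.0, "frame_energy_mean": 0.05, "formant1_mean": 600.0}
print(normalize_features(angry_utterance))
# -> pitch ~1.53x, energy ~2.5x, F1 ~1.09x the neutral baseline
```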
dc.description.abstract | The objective of this study is to develop an automatic speech emotion recognition method based on Bayesian networks. By computing the relevant features of emotional speech and comparing them with an emotion database, the speaker's emotional state can be identified from the speech signal.

First, we compute statistical features of pitch, frame energy, formants, and mel-scale frequency cepstral coefficients (MFCC). The mean value of each feature under the neutral emotion in the corpus is then used as a normalization factor for pitch, frame energy, and formants; the normalized features reduce inter-speaker differences. Each feature discriminates emotions differently. For example, the normalized pitch mean roughly separates sad from neutral, while happy and angry fall into the same cluster. Since no single feature clearly separates all four emotions, we recognize them layer by layer: features with similar discriminative ability are grouped into clusters, and the clustering result is used to build a Multi-Layered Bayesian Network (MLBN). Features in layer 1 distinguish two clusters of emotions, features in layer 2 distinguish three clusters, and features without an obvious clustering effect are placed in layer 3 to distinguish all four emotions. Because the features are correlated, we further extend MLBN to a Multi-Layered Bayesian Network with Covariance (MLBNC), which takes the correlations between features into account.

Recognition accuracy is usually poor when the speaker's speech is not included in the training data. To address this, we propose adaptive MLBN and MLBNC methods. During adaptation, whenever the recognition result differs from the speaker's actual emotion, the means and standard deviations (or covariances) of the clusters in the MLBN or MLBNC database are adjusted according to the speaker's emotional speech, so that the models fit the speaker's real emotional state.

To verify the proposed methods, the German emotional database EMO-DB is used as training and testing data for inside and outside tests of the KNN, SVM, MLBN, and MLBNC recognizers. We also use EMO-DB as training data and an emotional corpus recorded by ITRI as testing data for a cross-corpus test. In the adaptation experiments, EMO-DB serves as training data and the ITRI corpus as adaptation and testing data for the adaptive KNN, MLBN, and MLBNC recognizers.

The inside-test recognition rates of MLBN, MLBNC, and plain Bayesian decision are 81.1%, 88.8%, and 70.8%, respectively, showing that classifying the clustered features layer by layer effectively raises the recognition rate, and that accounting for feature correlations (MLBNC) improves it further. In the outside test, the recognition rates of KNN, SVM, and MLBN are 78.2%, 89.1%, and 69.9% with the original features, and 82.6%, 91.7%, and 77.6% with the normalized features, showing that normalization reduces inter-speaker feature differences and improves recognition. When the testing corpus differs from the training corpus, the recognition rates of KNN, SVM, MLBN, and MLBNC drop to 34.21%, 46.92%, 39.33%, and 52.08%, respectively; when the speakers' pronunciation or way of expressing emotion differs from the training data, every recognizer performs poorly. In the adaptation experiments, adaptive KNN raises the recognition rate from 34.2% to 73.7%, adaptive MLBN from 37.8% to 82.4%, and adaptive MLBNC from 51.6% to 81.2%; the proposed adaptive MLBN and MLBNC methods clearly outperform adaptive KNN. With more adaptation iterations, the recognition rate of MLBN rises from 39.3% to 88.9% and that of MLBNC from 52.1% to 90.0%, showing that the proposed adaptation truly reflects the speaker's real emotional state and yields good post-adaptation recognition results. | en
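The layered recognition and the adaptation rule summarized in the abstract can be pictured with a short sketch: each emotion is modelled per feature cluster by a Gaussian (mean and standard deviation in MLBN; a full covariance in MLBNC), each layer rules out the emotions its features can separate, and when the recognized label disagrees with the speaker's actual emotion the stored statistics are shifted toward the new observation. This is an illustrative reading of the abstract under simplifying assumptions (diagonal Gaussians, a fixed score margin, a fixed adaptation rate); all function names are hypothetical, not the thesis implementation.

```python
import numpy as np

EMOTIONS = ["neutral", "sad", "happy", "angry"]

def gaussian_loglik(x, mean, std):
    """Diagonal-Gaussian log-likelihood (MLBN case; MLBNC would use a full covariance)."""
    x, mean, std = map(np.asarray, (x, mean, std))
    return float(np.sum(-0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2 * np.pi))))

def layered_decision(layers, candidates=EMOTIONS):
    """Each layer = (feature_vector, {emotion: (mean, std)}).
    Every layer keeps only the emotions whose likelihood is competitive,
    so later layers only discriminate among the survivors."""
    for features, models in layers:
        scores = {e: gaussian_loglik(features, *models[e]) for e in candidates}
        best = max(scores.values())
        # keep emotions within a (hypothetical) margin of the best score
        candidates = [e for e in candidates if scores[e] >= best - 2.0]
        if len(candidates) == 1:
            break
    return candidates[0]

def adapt(models, emotion, observed, rate=0.1):
    """Shift the stored mean of the true emotion toward the misrecognized
    observation -- a simplified stand-in for the MLBN/MLBNC adaptation step."""
    mean, std = models[emotion]
    new_mean = (1 - rate) * np.asarray(mean) + rate * np.asarray(observed)
    models[emotion] = (new_mean, std)
```

With per-layer statistics estimated from EMO-DB and misrecognized ITRI utterances fed to `adapt`, repeated passes would play the role of the adaptation rounds whose effect the abstract reports.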
dc.description.provenance | Made available in DSpace on 2021-05-20T21:16:22Z (GMT). No. of bitstreams: 1 ntu-100-D90543002-1.pdf: 4491436 bytes, checksum: 26b270001a0d5fffd58ac2e7806ed801 (MD5) Previous issue date: 2011 | en |
dc.description.tableofcontents | 誌謝 I
中文摘要 III
ABSTRACT VI
目錄 X
圖目錄 XIII
表目錄 XVI
第一章 導論 1
1.1 前言 1
1.2 文獻回顧 2
1.3 本文簡介 7
第二章 語音情緒特徵參數計算 9
2.1 前處理 9
2.1.1 音框(Frame)與視窗(Windows) 9
2.1.2 預強調(Pre-emphasis) 10
2.1.3 快速傅立葉轉換(Fast Fourier Transform, FFT) 11
2.2 語音特徵(Speech Feature)計算 12
2.2.1 音高(Pitch) 12
2.2.2 共振峰(Formant) 14
2.2.3 音框能量(Frame Energy) 14
2.2.4 梅爾頻率倒頻譜係數(Mel-frequency Cepstral coefficient, MFCC) 15
2.3 小結 16
第三章 特徵參數之統計計算與正規化 23
3.1 情緒語音資料庫 23
3.2 特徵統計計算 24
3.2.1 語音特徵平均值(Mean)與標準差(Standard deviation) 24
3.2.2 語音特徵正規化計算(Normalization) 28
3.3 小結 32
第四章 KNN與SVM於語音情緒辨識之實驗與分析 50
4.1 KNN語音情緒辨識 50
4.1.1 第K個最近鄰(K-Nearest Neighbor, KNN) 50
4.1.2 KNN語音情緒辨識實驗結果與分析 51
4.2 SVM語音情緒辨識 56
4.2.1 支持向量機(Support Vector Machine, SVM) 56
4.2.2 SVM情緒辨識實驗結果與分析 58
4.3 小結 62
第五章 多層貝氏網路與多層共變異數貝氏網路語音情緒辨識 71
5.1 決策樹、貝氏決策與貝氏網路介紹 71
5.1.1 決策樹(Decision Tree) 71
5.1.2 貝氏決策(Bayes Decision)與貝氏網路(Bayesian Network) 72
5.2 多層貝氏網路(Multi-Layer Bayesian Network, MLBN)語音情緒辨識 75
5.2.1 語音情緒特徵分群分析 75
5.2.2 多層貝氏網路(MLBN) 83
5.2.3 多層貝氏網路(MLBN)語音情緒辨識實驗結果與分析 94
5.3 多層共變異數貝氏網路(Multi-Layer Bayesian Network with Covariance, MLBNC)語音情緒辨識 98
5.3.1 多層共變異數貝氏網路(MLBNC) 98
5.3.2 MLBNC實驗結果與分析 111
5.4 蒙地卡羅模擬(Monte Carlo Simulation)與分析 115
5.5 小結 120
第六章 調適性語音情緒辨識 149
6.1 適應性KNN語音情緒辨識實驗與分析 149
6.2 適應性MLBN語音情緒辨識實驗與分析 152
6.2.1 適應性MLBN 152
6.2.2 適應性MLBN實驗結果與分析 166
6.3 適應性MLBNC語音情緒辨識實驗與分析 171
6.3.1 適應性MLBNC 172
6.3.2 適應性MLBNC實驗結果與分析 183
6.4 小結 189
第七章 結論與未來展望 211
參考文獻 218
附錄A 225
附錄B 232
附錄C 240 | |
dc.language.iso | en | |
dc.title | 應用貝氏網路及適應性調適方法於語音情緒辨識之研究 | zh_TW |
dc.title | Speech Emotion Recognition Using Bayesian Network and Adaptive Approach Methods | en |
dc.type | Thesis | |
dc.date.schoolyear | 99-1 | |
dc.description.degree | 博士 | |
dc.contributor.oralexamcommittee | 吳宗憲,余松年,包蒼龍,陳有圳,黃志賢,石明于,廖志彬 | |
dc.subject.keyword | 語音情緒辨識, 特徵參數, 正規化, 多層貝氏網路, 多層共變異數貝氏網路, 適應性 | zh_TW |
dc.subject.keyword | speech emotion recognition, features, normalization, MLBN, MLBNC, adaptive | en |
dc.relation.page | 253 | |
dc.rights.note | 同意授權(全球公開) | |
dc.date.accepted | 2011-01-27 | |
dc.contributor.author-college | 工學院 | zh_TW |
dc.contributor.author-dept | 應用力學研究所 | zh_TW |
Appears in collections: | 應用力學研究所 |
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-100-1.pdf | 4.39 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.