Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57388

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 許永真 | |
| dc.contributor.author | Ching-Hsiu Huang | en |
| dc.contributor.author | 黃慶修 | zh_TW |
| dc.date.accessioned | 2021-06-16T06:44:05Z | - |
| dc.date.available | 2019-08-01 | |
| dc.date.copyright | 2014-08-01 | |
| dc.date.issued | 2014 | |
| dc.date.submitted | 2014-07-28 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57388 | - |
| dc.description.abstract | 近年來因智慧型攜帶裝置逐漸普及,造就人們接觸智慧型人機介面的機會越來越多,但其情緒議題一直以來都被漠視,於是,擁有情緒的智慧型人機介面這課題就在人們快速膨大的需求中浮出。在心理學領域長期的研究下,關於情緒的研究已經有大量的資料可供參考。近十幾年來由於情緒計算(Affective computing)的發展,在資訊工程領域方面也漸漸的累積了許多研究。以前人的研究做為基石,本研究選擇進行聲音方面的情緒辨識,為了簡化問題,本研究並不考慮說話內容,而單純就語氣和音調來辨識情緒。本研究採用個人簡報作為環境來收取自然聲音,並將情緒辨識集中在緊張情緒程度辨識上。在標記方面,本研究為了降低標記者的負擔,提出了以比較方式進行標記的解決方法。在辨識方面,本研究在各種的資料組合下比較一般傳統語音情緒辨識和MI-SVM的準確率。就結果方面,相較於傳統SVM辨識只有66%的準確率,本研究提出使用MI-SVM並獲得74%準確率。未來將以本研究為基石進而探討新的辨識方法的方向。 | zh_TW |
| dc.description.abstract | With the growing popularity of smart mobile devices, people interact with intelligent agents more and more often, and the demand for agents that can handle emotion has grown accordingly. This research addresses emotion recognition in speech. To simplify the problem, we ignore the spoken content and recognize emotion solely from tone and intonation. We collect spontaneous speech in a personal-presentation setting and focus on recognizing the level of nervousness. For annotation, to reduce the annotators' burden, we propose a labeling scheme based on comparing speech turns. For recognition, we compare the accuracy of a traditional SVM with that of MI-SVM over several combinations of the data. The traditional SVM reaches only 66% accuracy, whereas the proposed MI-SVM approach reaches 74%. Future work will build on this study to explore new recognition methods. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-16T06:44:05Z (GMT). No. of bitstreams: 1 ntu-103-R99944023-1.pdf: 4027333 bytes, checksum: ee592a3dde6aa75063587d0db5075705 (MD5) Previous issue date: 2014 | en |
| dc.description.tableofcontents | List of Figures; List of Tables; Chapter 1 Introduction (1.1 Background and Motivation; 1.2 Thesis Structure); Chapter 2 Related Work (2.1 Emotion Analysis of Psychology; 2.2 Speech Emotion Recognition: 2.2.1 Recognition in Artificial Intelligence, 2.2.2 Acoustic Features, 2.2.3 Learning Method, 2.2.4 Corpus); Chapter 3 Problem Definition (3.1 System Overview: 3.1.1 System Focus and Goal; 3.2 Definition of Emotion; 3.3 Challenge; 3.4 Proposed Solution); Chapter 4 Experiment (4.1 Data Collection: 4.1.1 Pre-processing; 4.2 Data Annotation: 4.2.1 Reference Point, 4.2.2 Annotation System; 4.3 Feature Extraction and Learning; 4.4 Result: 4.4.1 The Result of 0 versus 2, 4.4.2 The Result of 0 versus 1+2, 4.4.3 The Result of 0 versus 1 versus 2, 4.4.4 Discussion); Chapter 5 Conclusion (5.1 Summary; 5.2 Future Work); Bibliography | |
| dc.language.iso | zh-TW | |
| dc.subject | 情緒辨識 | zh_TW |
| dc.subject | 自然語音 | zh_TW |
| dc.subject | 多實例學習法 | zh_TW |
| dc.subject | Emotion recognition | en |
| dc.subject | spontaneous speech | en |
| dc.subject | multiple-instance learning | en |
| dc.title | 基於多實例學習法之自然語音情緒辨識 | zh_TW |
| dc.title | Emotion recognition of spontaneous speech using multiple-instance learning | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 102-2 | |
| dc.description.degree | 碩士 | |
| dc.contributor.oralexamcommittee | 蔡宗翰,林光龍,紀婉容 | |
| dc.subject.keyword | 多實例學習法,自然語音,情緒辨識, | zh_TW |
| dc.subject.keyword | Emotion recognition, spontaneous speech, multiple-instance learning | en |
| dc.relation.page | 49 | |
| dc.rights.note | 有償授權 | |
| dc.date.accepted | 2014-07-28 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 資訊網路與多媒體研究所 | zh_TW |
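
The abstract above frames nervousness recognition as a multiple-instance learning problem: each speech turn is treated as a bag of short acoustic segments, and MI-SVM is compared against a conventional SVM. The sketch below is only an illustration of that formulation, not the thesis's actual implementation. It uses a simplified MI-SVM-style relabeling heuristic (in the spirit of Andrews et al.'s MI-SVM), scikit-learn's `SVC`, and synthetic toy features in place of real acoustic features; all names, dimensions, and parameter choices are hypothetical.

```python
# Illustrative sketch only (hypothetical data and parameters), showing how a
# speech turn can be treated as a "bag" of segment-level acoustic feature
# vectors and classified with a simplified MI-SVM-style training loop.
import numpy as np
from sklearn.svm import SVC

def train_mi_svm(bags, bag_labels, n_iter=10):
    """Iteratively train an instance-level SVM under the multiple-instance
    assumption: a bag (speech turn) is positive (nervous) if at least one
    of its instances (short segments) is positive."""
    # Initialize every instance with the label of its bag.
    X = np.vstack(bags)
    y = np.concatenate([np.full(len(b), lab) for b, lab in zip(bags, bag_labels)])
    clf = SVC(kernel="rbf", gamma="scale")
    for _ in range(n_iter):
        clf.fit(X, y)
        scores = clf.decision_function(X)
        new_y, start = y.copy(), 0
        for bag, lab in zip(bags, bag_labels):
            end = start + len(bag)
            if lab == 1:
                # Positive bag: let the classifier relabel its instances, but
                # force the highest-scoring "witness" instance to stay positive.
                new_y[start:end] = (scores[start:end] > 0).astype(int)
                new_y[start + int(np.argmax(scores[start:end]))] = 1
            else:
                # Negative bag: by definition all of its instances are negative.
                new_y[start:end] = 0
            start = end
        if np.array_equal(new_y, y):  # instance labels have stabilized
            break
        y = new_y
    return clf

def predict_bag(clf, bag):
    """A turn is predicted as nervous if its best segment scores positive."""
    return int(clf.decision_function(bag).max() > 0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy bags: each is an (n_segments, n_features) array of made-up features.
    calm_bags = [rng.normal(0.0, 1.0, size=(8, 6)) for _ in range(20)]
    nervous_bags = [np.vstack([rng.normal(0.0, 1.0, size=(6, 6)),
                               rng.normal(3.0, 1.0, size=(2, 6))]) for _ in range(20)]
    bags = calm_bags + nervous_bags
    labels = [0] * 20 + [1] * 20
    clf = train_mi_svm(bags, labels)
    print([predict_bag(clf, b) for b in bags[:3] + bags[-3:]])
```

In a real setup, each bag would instead hold acoustic descriptors (e.g., pitch- and energy-based features) extracted from the segments of one recorded turn; the multiple-instance assumption matches the intuition that nervousness may surface in only part of a turn rather than throughout it.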
| Appears in Collections: | 資訊網路與多媒體研究所 |
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-103-1.pdf (restricted access) | 3.93 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
