NTU Theses and Dissertations Repository

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57388
Full metadata record (DC field: value, language)
dc.contributor.advisor: 許永真
dc.contributor.author: Ching-Hsiu Huang (en)
dc.contributor.author: 黃慶修 (zh_TW)
dc.date.accessioned: 2021-06-16T06:44:05Z
dc.date.available: 2019-08-01
dc.date.copyright: 2014-08-01
dc.date.issued: 2014
dc.date.submitted: 2014-07-28
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57388
dc.description.abstract: In recent years, as smart mobile devices have become increasingly common, people encounter intelligent user interfaces more and more often, yet the emotional side of these interfaces has long been neglected; the topic of emotionally aware intelligent interfaces has therefore emerged from this rapidly growing demand. Thanks to long-standing work in psychology, a large body of research on emotion is already available for reference. Over roughly the last decade, the development of affective computing has also built up considerable research on the computer science side. Building on this prior work, this study addresses emotion recognition from speech. To simplify the problem, it does not consider the spoken content and recognizes emotion purely from tone and intonation. Spontaneous speech is collected in the setting of personal presentations, and recognition focuses on the level of nervousness. For annotation, to reduce the annotators' burden, this study proposes a labeling scheme based on comparison. For recognition, it compares the accuracy of conventional speech emotion recognition with that of MI-SVM under various data combinations. In the results, whereas a conventional SVM achieves only 66% accuracy, the proposed MI-SVM reaches 74%. Future work will build on this study to explore new recognition methods. (zh_TW)
dc.description.abstract: As smart mobile devices become widespread, people interact with intelligent agents more and more often, and the question of giving agents emotional awareness has emerged from this rapidly growing demand. In this research we address the recognition of emotion in speech. To simplify the problem, we ignore the spoken content and recognize emotion purely from the tone of speech. We collect speech in a personal-presentation setting and focus on recognizing levels of nervousness. For annotation, in order to reduce the annotators' burden, we propose a novel scheme that labels speech turns by comparing them. For recognition, we compare the accuracy of a traditional SVM with that of MI-SVM over many combinations of data types. As a result, the traditional SVM achieves only 66% accuracy, while the proposed MI-SVM reaches 74%. Future work will build on this research to explore new recognition methods. (en)
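The approach summarized above maps naturally onto a multiple-instance formulation: each speech turn is a bag of frame-level acoustic feature vectors, and only the turn as a whole carries a nervousness label. The sketch below is illustrative only and is not the thesis's implementation; it substitutes synthetic vectors for openSMILE-style features and replaces MI-SVM with a simple mean/max pooling baseline on top of a standard scikit-learn SVM, just to show how bags and bag-level labels fit together. The helper `make_bag` and all constants are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
N_BAGS, N_FEATURES = 200, 16  # hypothetical number of speech turns / per-frame feature dimension

def make_bag(nervous: bool) -> np.ndarray:
    """Simulate one speech turn as a variable-length set of frame-level feature vectors."""
    n_frames = int(rng.integers(20, 60))
    frames = rng.normal(0.0, 1.0, size=(n_frames, N_FEATURES))
    if nervous:
        # Only a few frames carry the "nervous" cue; this is exactly why a
        # multiple-instance framing is attractive when labels exist only per turn.
        k = int(rng.integers(3, 8))
        frames[:k] += 1.5
    return frames

labels = rng.integers(0, 2, size=N_BAGS)        # 0 = calm turn, 1 = nervous turn
bags = [make_bag(bool(y)) for y in labels]

# Bag-level representation: concatenate mean- and max-pooled frame features.
X = np.stack([np.concatenate([b.mean(axis=0), b.max(axis=0)]) for b in bags])

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
print(f"bag-level accuracy: {clf.score(X_te, y_te):.2f}")
```

In a real pipeline of this kind, the per-frame features would typically come from an acoustic feature extractor such as openSMILE, and an MI-SVM would replace the pooling step so that the learner itself decides which frames within a nervous turn actually express the emotion.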
dc.description.provenance: Made available in DSpace on 2021-06-16T06:44:05Z (GMT). No. of bitstreams: 1
  ntu-103-R99944023-1.pdf: 4027333 bytes, checksum: ee592a3dde6aa75063587d0db5075705 (MD5)
  Previous issue date: 2014 (en)
dc.description.tableofcontents:
Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Background and Motivation
  1.2 Thesis Structure
Chapter 2 Related Work
  2.1 Emotion Analysis of Psychology
  2.2 Speech Emotion Recognition
    2.2.1 Recognition in Artificial Intelligence
    2.2.2 Acoustic Features
    2.2.3 Learning Method
    2.2.4 Corpus
Chapter 3 Problem Definition
  3.1 System Overview
    3.1.1 System Focus and Goal
  3.2 Definition of Emotion
  3.3 Challenge
  3.4 Proposed Solution
Chapter 4 Experiment
  4.1 Data Collection
    4.1.1 Pre-processing
  4.2 Data Annotation
    4.2.1 Reference Point
    4.2.2 Annotation System
  4.3 Feature Extraction and Learning
  4.4 Result
    4.4.1 The Result of 0 versus 2
    4.4.2 The Result of 0 versus 1+2
    4.4.3 The Result of 0 versus 1 versus 2
    4.4.4 Discussion
Chapter 5 Conclusion
  5.1 Summary
  5.2 Future Work
Bibliography
dc.language.iso: zh-TW
dc.subject: emotion recognition (zh_TW)
dc.subject: spontaneous speech (zh_TW)
dc.subject: multiple-instance learning (zh_TW)
dc.subject: Emotion recognition (en)
dc.subject: spontaneous speech (en)
dc.subject: multiple-instance learning (en)
dc.title: Emotion recognition of spontaneous speech based on multiple-instance learning (zh_TW)
dc.title: Emotion recognition of spontaneous speech using multiple-instance learning (en)
dc.type: Thesis
dc.date.schoolyear: 102-2
dc.description.degree: Master
dc.contributor.oralexamcommittee: 蔡宗翰, 林光龍, 紀婉容
dc.subject.keyword: multiple-instance learning, spontaneous speech, emotion recognition (zh_TW)
dc.subject.keyword: Emotion recognition, spontaneous speech, multiple-instance learning (en)
dc.relation.page: 49
dc.rights.note: Paid authorization
dc.date.accepted: 2014-07-28
dc.contributor.author-college: College of Electrical Engineering and Computer Science (zh_TW)
dc.contributor.author-dept: Graduate Institute of Networking and Multimedia (zh_TW)
Appears in Collections: Graduate Institute of Networking and Multimedia

Files in This Item:
File: ntu-103-1.pdf (restricted access; not publicly available)
Size: 3.93 MB
Format: Adobe PDF