Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7896
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 黃漢邦 (Han-Pang Huang) | |
dc.contributor.author | Shang-Ting Li | en |
dc.contributor.author | 李尚庭 | zh_TW |
dc.date.accessioned | 2021-05-19T17:57:43Z | - |
dc.date.available | 2021-08-24 | |
dc.date.available | 2021-05-19T17:57:43Z | - |
dc.date.copyright | 2016-08-24 | |
dc.date.issued | 2016 | |
dc.date.submitted | 2016-08-12 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7896 | - |
dc.description.abstract | With the rapid development of the robotics field, many industries have already introduced robots to assist human work, and the coexistence of robots and humans will be an important issue in the future. In industrial environments, however, both the commands given to robots and the complexity of the environment are relatively simple; in home environments and public spaces, service robots must learn and understand human social behavior in order to integrate into human society and provide assistance. Therefore, to achieve natural interaction and harmonious coexistence between robots and humans, human-robot interaction is the focus of this development. This thesis is devoted to combining facial expressions, body movements, and voice for multi-modal emotion recognition, using recurrent neural networks and fuzzy integrals to learn and to fuse predictions of human emotion, thereby improving robots' ability to recognize human behavior. Emotion belongs to high-level cognition and influences human behavior and decision-making. The proposed multi-modal emotion recognition system enables robots to understand human emotional reactions robustly; with sufficient information to respond to environmental changes, the system learns from time-series data and makes decisions with dynamic adjustment, so that robots possess not only emotion recognition but also human-like cognitive ability. | zh_TW |
dc.description.abstract | Robotics has seen much development over the last few decades, and how robots will live alongside humans has become an important issue. Robots need to understand human social cues and rules to interact correctly with humans in home and public environments. Therefore, in order to reach a natural and harmonious interaction between humans and robots, human-robot interaction is a key issue. This thesis integrates facial expression, body movement, and speech tone to perform multi-modal emotion recognition. We use a recurrent neural network as the learning model and a fuzzy integral for multi-modal fusion to enhance the cognitive ability of robots to understand human behavior and emotion. Emotion is a kind of high-level cognition that heavily affects human behavior and decisions. We propose a multi-modal emotion recognition system that allows robots to predict emotion robustly. Multi-modal information provides more complete evidence for emotion recognition and copes with different environments. Each uni-modal model is trained on time-sequence data, and the models are fused with dynamic adjustment (a minimal sketch of this fusion step follows the metadata table below). With the proposed method, robots not only gain the ability to predict emotion but also acquire human-like cognitive ability. | en |
dc.description.provenance | Made available in DSpace on 2021-05-19T17:57:43Z (GMT). No. of bitstreams: 1 ntu-105-R03522805-1.pdf: 6193891 bytes, checksum: e95c5d480d8478c07a8f3a9d2e99f222 (MD5) Previous issue date: 2016 | en |
dc.description.tableofcontents | Acknowledgements vi
Abstract (Chinese) viii
Abstract x
List of Tables xiv
List of Figures xvi
Chapter 1 Introduction 1
  1.1 Coexistence of Robots and Humans 1
  1.2 Motivations 2
  1.3 Objectives and Contributions 4
  1.4 Organization of the Thesis 5
Chapter 2 Emotion Recognition System 7
  2.1 Introduction 7
    2.1.1 Emotion and Human-Robot Interaction 8
  2.2 Facial Modality 10
    2.2.1 CANDIDE-3 12
    2.2.2 Pre-processing and Feature Extraction 17
  2.3 Vocal Modality 19
    2.3.1 Mel Frequency Cepstral Coefficients (MFCCs) 20
    2.3.2 Pre-processing and Feature Extraction 23
  2.4 Bodily Modality 26
    2.4.1 Laban Movement Analysis (LMA) 27
    2.4.2 Pre-processing and Feature Extraction 29
Chapter 3 Multi-Modal Emotion Recognition and Fusion 31
  3.1 Introduction 31
  3.2 Emotion Recognition Structure 32
  3.3 Recognition Algorithm 34
    3.3.1 Recurrent Neural Network (RNN) 34
    3.3.2 Nesterov's Accelerated Gradient (NAG) 38
    3.3.3 Learning Algorithm 40
  3.4 Multi-Modal Fusion Method 41
    3.4.1 Fuzzy Integral (FI) 41
    3.4.2 Fusion Strategy 44
Chapter 4 Experiments 47
  4.1 Software Platform 48
  4.2 Hardware Platform 49
  4.3 Method and Algorithm 52
    4.3.1 Data Collection 53
    4.3.2 Training Setup 55
  4.4 Experimental Results 56
    4.4.1 Offline Testing: Facial Emotion Identification 56
    4.4.2 Offline Testing: Vocal Emotion Identification 60
    4.4.3 Offline Testing: Bodily Emotion Identification 64
    4.4.4 Online Testing: Multi-Modal Emotion Identification 67
    4.4.5 Robot Service in Real World 72
Chapter 5 Conclusions and Future Works 81
  5.1 Conclusions 81
  5.2 Future Works 82
References 83 | |
dc.language.iso | en | |
dc.title | 基於多模情緒辨識之人機互動 | zh_TW |
dc.title | Multi-Modal Emotion Recognition for Human-Robot Interaction | en |
dc.type | Thesis | |
dc.date.schoolyear | 104-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 林沛群,趙儀珊,黃心健,林顯達 | |
dc.subject.keyword | 遞迴歸類神經網路, 模糊積分, 情緒辨識, 多模型整合, 輪型機器人, 人機互動, Kinect | zh_TW |
dc.subject.keyword | Recurrent Neural Network, Fuzzy Integral, Emotion Recognition, Multi-modal Fusion, Mobile Robots, Human-robot Interaction, Kinect | en |
dc.relation.page | 87 | |
dc.identifier.doi | 10.6342/NTU201602439 | |
dc.rights.note | Authorized (open access worldwide) | |
dc.date.accepted | 2016-08-14 | |
dc.contributor.author-college | College of Engineering | zh_TW |
dc.contributor.author-dept | Graduate Institute of Mechanical Engineering | zh_TW |
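To make the fusion strategy described in the abstracts concrete, the following is a minimal sketch, assuming a Sugeno fuzzy integral over a λ-fuzzy measure as the multi-modal fusion step: three uni-modal classifiers (facial, vocal, bodily) each output per-class confidence scores, and the integral combines them into a single decision. This is not the thesis's implementation; the emotion label set, the per-modality densities, and the example scores are hypothetical placeholders.

```python
import numpy as np
from scipy.optimize import brentq

EMOTIONS = ["happy", "sad", "angry", "surprised", "neutral"]  # hypothetical label set

def solve_lambda(densities):
    """Solve prod_i(1 + lam * g_i) = 1 + lam for the lambda-fuzzy measure."""
    if abs(sum(densities) - 1.0) < 1e-9:
        return 0.0                               # additive case: lambda = 0
    f = lambda lam: np.prod([1.0 + lam * g for g in densities]) - (1.0 + lam)
    lo, hi = (-1.0 + 1e-9, -1e-9) if sum(densities) > 1.0 else (1e-9, 1e6)
    return brentq(f, lo, hi)                     # unique root in (-1, 0) or (0, inf)

def sugeno_integral(scores, densities):
    """Sugeno integral max_k min(h_(k), g(A_k)) of one class's scores."""
    lam = solve_lambda(densities)
    order = np.argsort(scores)[::-1]             # sources sorted by descending score
    g, best = 0.0, 0.0
    for i in order:                              # g(A ∪ {i}) = g + g_i + lam*g*g_i
        g = g + densities[i] + lam * g * densities[i]
        best = max(best, min(scores[i], g))
    return best

# Hypothetical per-class confidences from the facial, vocal, and bodily recognizers.
scores = np.array([[0.70, 0.10, 0.10, 0.05, 0.05],   # facial
                   [0.40, 0.30, 0.10, 0.10, 0.10],   # vocal
                   [0.60, 0.10, 0.10, 0.10, 0.10]])  # bodily
densities = [0.5, 0.3, 0.4]                      # assumed per-modality reliabilities
fused = [sugeno_integral(scores[:, c], densities) for c in range(len(EMOTIONS))]
print(EMOTIONS[int(np.argmax(fused))])           # -> happy
```

In the thesis the fusion is adjusted dynamically (Section 3.4.2 of the table of contents); the fixed densities above are used only to keep the sketch self-contained.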
Appears in Collections: | Department of Mechanical Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-105-1.pdf | 6.05 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.