NTU Theses and Dissertations Repository (DSpace)
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86414
Full metadata record:
dc.contributor.advisor [zh_TW]: 黃漢邦
dc.contributor.advisor [en]: Han-Pang Huang
dc.contributor.author [zh_TW]: 楊博淋
dc.contributor.author [en]: Po-Lin Yang
dc.date.accessioned: 2023-03-19T23:54:26Z
dc.date.available: 2023-12-26
dc.date.copyright: 2022-09-12
dc.date.issued: 2022
dc.date.submitted: 2002-01-01
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86414
dc.description.abstract [zh_TW]: 在本論文中,我們設計了一個可以為老年人或輕度認知障礙者提供情感支持(聊天、陪伴、分享情緒)的人機互動架構來減緩失智症的進展。在我們的人機互動架構之下,我們利用參與度模型來檢測人類的參與度。另一方面,我們結合臉部表情、語音和文字來檢測人類的情緒。針對臉部表情部分,我們提出了空間注意力機制沙漏卷積神經網絡 (SA-HCNN)模型,並結合 Transformer Encoder模型針對空間和時序特徵進行學習。針對語音部分,我們使用滑動窗口方法來提取梅爾頻率倒譜係數(MFCCs)特徵,並且建立一個 Transformer Encoder模型來提取時序特徵。針對文字部分,我們使用 Google Cloud Natural Language API獲得情緒分數。隨後,提出了一種機器人情緒生成系統,該系統使用人類情緒、人類參與度和機器人語音內容作為輸入,為機器人生成適當的情緒與行為。
此外,我們提出了基於人類情緒和參與度的HRI失智症狀指標,使機器人能夠在和人類互動的過程中檢測早期的失智症狀。我們同時將此HRI失智症狀指標與其他基於問答的認知任務相結合,以增強對失智症的評估。
我們提出的架構實現於本實驗室研發的服務機器人Mobi。我們招募了九位60歲以上的老年人與 Mobi進行互動。實驗結果表明,我們的人機互動架構使老年人和機器人互動時的體驗更加愉快且放鬆。
dc.description.abstract [en]: In this thesis, we designed a human–robot interaction framework that provides emotional support (chatting, companionship, and mood sharing) for older adults and people with mild cognitive impairment, with the aim of slowing the progression of dementia. Within this framework, an engagement model detects human engagement levels, and human emotions are recognized by combining facial expressions, speech, and text. For the facial modality, we proposed a spatial attention hourglass convolutional neural network (SA-HCNN) model and used a Transformer encoder to learn spatial and temporal features jointly. For the speech modality, we extracted Mel-frequency cepstral coefficient (MFCC) features with a sliding-window method and built a Transformer encoder model to extract temporal features. For the text modality, we used the Google Cloud Natural Language API to obtain sentiment scores and magnitudes. Finally, we proposed a robot emotion generation system that takes human emotions, human engagement levels, and robot speech content as inputs and generates appropriate emotional states and expressions for the robot.
In addition, we proposed HRI dementia symptom metrics based on human emotions and engagement levels, enabling the robot to detect early-stage dementia symptoms during interaction. These metrics were combined with several question-and-answer-based cognitive tasks to strengthen the assessment of dementia.
The proposed framework was implemented on Mobi, a service robot developed by the NTU Robotics Laboratory. We recruited nine older adults over the age of 60 to interact with Mobi. The experimental results showed that, under our human–robot interaction framework, the older adults found interacting with the robot more enjoyable and relaxing.
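To make the facial-modality description above concrete: the "spatial attention" the abstract refers to can be illustrated with a generic spatial-attention block of the kind popularized by CBAM. The sketch below is an assumption, not the thesis's SA-HCNN; the actual module, its hourglass backbone, and all hyperparameters are unspecified here, and PyTorch is assumed as the framework.

```python
# Generic spatial-attention block (CBAM-style), shown only to illustrate the
# kind of spatial reweighting a spatial-attention facial model relies on.
# This is NOT the thesis's SA-HCNN module; it is a hypothetical stand-in.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        # Two input channels: channel-wise average map and channel-wise max map.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                          # x: (B, C, H, W)
        avg = x.mean(dim=1, keepdim=True)          # (B, 1, H, W)
        mx, _ = x.max(dim=1, keepdim=True)         # (B, 1, H, W)
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn                            # spatially reweighted features
```

The block pools channel statistics into a two-channel map and learns where in the feature map to amplify responses, which is the general mechanism behind spatial attention in expression recognition backbones.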
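The speech pipeline the abstract describes, sliding-window MFCC extraction followed by a Transformer encoder over the window sequence, can be sketched as follows. This is a minimal illustration under assumptions: the window length, hop size, `n_mfcc`, and encoder hyperparameters are placeholders, and `librosa` plus PyTorch stand in for whatever toolchain the thesis actually used.

```python
# Hedged sketch of the speech branch: sliding-window MFCC features fed to a
# Transformer encoder regressing continuous emotion (valence, arousal).
# All hyperparameters are illustrative, not values taken from the thesis.
import librosa
import numpy as np
import torch
import torch.nn as nn

def sliding_window_mfcc(wav_path, n_mfcc=13, win_s=2.0, hop_s=0.5):
    """Split audio into overlapping windows and extract one MFCC vector each."""
    y, sr = librosa.load(wav_path, sr=16000)
    win, hop = int(win_s * sr), int(hop_s * sr)
    feats = []
    for start in range(0, max(len(y) - win, 1), hop):
        seg = y[start:start + win]
        mfcc = librosa.feature.mfcc(y=seg, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
        feats.append(mfcc.mean(axis=1))             # average frames per window
    return torch.tensor(np.stack(feats), dtype=torch.float32)  # (windows, n_mfcc)

class SpeechEmotionEncoder(nn.Module):
    """Transformer encoder over the window sequence -> (valence, arousal)."""
    def __init__(self, n_mfcc=13, d_model=64, nhead=4, layers=2):
        super().__init__()
        self.proj = nn.Linear(n_mfcc, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.head = nn.Linear(d_model, 2)

    def forward(self, x):                           # x: (batch, windows, n_mfcc)
        h = self.encoder(self.proj(x))
        return self.head(h.mean(dim=1))             # pool over time

# Usage sketch:
# scores = SpeechEmotionEncoder()(sliding_window_mfcc("clip.wav").unsqueeze(0))
```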
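For the text modality, the abstract states that sentiment scores and magnitudes come from the Google Cloud Natural Language API; a minimal call looks like the sketch below (credential setup and error handling omitted).

```python
# Minimal sketch of obtaining sentiment score and magnitude for the text
# modality via the Google Cloud Natural Language API. Assumes application
# credentials are already configured in the environment.
from google.cloud import language_v1

def text_sentiment(text: str):
    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    sentiment = client.analyze_sentiment(
        request={"document": document}
    ).document_sentiment
    # score in [-1.0, 1.0] (negative..positive); magnitude >= 0 (overall strength)
    return sentiment.score, sentiment.magnitude
```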
dc.description.provenance [en]: Made available in DSpace on 2023-03-19T23:54:26Z (GMT). No. of bitstreams: 1
U0001-1108202217552000.pdf: 31704925 bytes, checksum: 70b4bd0f2c16b3c9fbe2b70f3bd70fe1 (MD5)
Previous issue date: 2022
dc.description.tableofcontents:
誌謝 (Acknowledgements) i
摘要 (Chinese Abstract) iii
Abstract v
List of Tables xi
List of Figures xiii
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Contributions 2
1.3 Organization of Thesis 4
Chapter 2 Continuous Emotion Recognition System 7
2.1 Introduction 7
2.2 Related Works 8
2.2.1 Emotion Recognition in Facial Modality 8
2.2.2 Emotion Recognition in Speech Modality 8
2.3 Emotion Representation 21
2.3.1 Discrete Emotion 21
2.3.2 Continuous Emotion 22
2.4 Deep Learning Architecture 24
2.4.1 Hourglass Network 24
2.4.2 Self-Attention Mechanism 27
2.4.3 Transformer 31
2.5 Continuous Facial Expression Recognition System 35
2.5.1 Spatial Feature Extraction 36
2.5.2 Spatial-Temporal Feature Extraction 40
2.6 Speech Emotion Recognition System 42
2.7 Summary 45
Chapter 3 Human–Robot Interaction Structure 47
3.1 Introduction 47
3.2 Engagement Detection System 49
3.2.1 Head Pose Estimation 50
3.2.2 Eye Gaze Estimation 50
3.2.3 Action Recognition 51
3.2.4 Hidden Markov Model for Engagement 52
3.3 Multimodal Emotions Fusion 53
3.3.1 Speech Analyzer and Conversation Agent 53
3.3.2 Multimodal Emotions Fusion 55
3.4 Robot Emotion Generation System 58
3.4.1 Robot Emotion Generation Module 59
3.4.2 Robot Emotional Expression Module 63
3.5 Behavioral and Psychological Symptoms of Dementia 65
3.5.1 Stages of Dementia 65
3.5.2 Early-Stage Signs and Symptoms of Dementia in HRI 68
3.6 Summary 71
Chapter 4 Deployment and Experiments 73
4.1 Hardware and Communication Platform 73
4.2 Implementation and Testing Results 80
4.2.1 Continuous Emotion Recognition from Facial Expression 80
4.2.2 Continuous Emotion Recognition from Speech 89
4.2.3 Online Testing on Multimodal Continuous Emotion Recognition System 91
4.3 Applications 99
4.3.1 Scenario 99
4.3.2 Cognitive Assessment 101
4.3.3 Results 118
4.3.4 Discussion 139
Chapter 5 Conclusions and Future Works 141
5.1 Conclusions 141
5.2 Future Works 142
References 143
dc.language.iso: en
dc.subject [zh_TW]: Hourglass Network
dc.subject [zh_TW]: 服務型機器人
dc.subject [zh_TW]: 輕度認知功能障礙
dc.subject [zh_TW]: 情緒辨識
dc.subject [zh_TW]: 注意力機制
dc.subject [zh_TW]: Transformer
dc.subject [zh_TW]: 人機互動
dc.subject [en]: Service Robot
dc.subject [en]: HRI
dc.subject [en]: Hourglass Network
dc.subject [en]: Transformer
dc.subject [en]: Attention Mechanism
dc.subject [en]: Emotion Recognition
dc.subject [en]: Mild Cognitive Impairment
dc.title [zh_TW]: 應用連續情緒辨識於輕度認知功能障礙者的人機互動架構
dc.title [en]: Human–Robot Interaction Framework with Continuous Emotion Recognition for People with Mild Cognitive Impairment
dc.type: Thesis
dc.date.schoolyear: 110-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee [zh_TW]: 程藴菁;陳人豪;傅楸善;林峻永;蔣本基
dc.contributor.oralexamcommittee [en]: Yen-Ching Chen;Jen-Hau Chen;Chiou-Shann Fuh;Chun-Yeon Lin;Pen-Chi Chiang
dc.subject.keyword [zh_TW]: 人機互動,Hourglass Network,Transformer,注意力機制,情緒辨識,輕度認知功能障礙,服務型機器人
dc.subject.keyword [en]: HRI,Hourglass Network,Transformer,Attention Mechanism,Emotion Recognition,Mild Cognitive Impairment,Service Robot
dc.relation.page: 153
dc.identifier.doi: 10.6342/NTU202202309
dc.rights.note: 同意授權(全球公開) (open access worldwide)
dc.date.accepted: 2022-08-22
dc.contributor.author-college: 工學院 (College of Engineering)
dc.contributor.author-dept: 機械工程學系 (Department of Mechanical Engineering)
dc.date.embargo-lift: 2024-08-20
Appears in collections: 機械工程學系 (Department of Mechanical Engineering)

Files in this item:

File | Size | Format
ntu-110-2.pdf | 30.96 MB | Adobe PDF


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
