Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21848

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 羅仁權(Ren C. Luo) | |
| dc.contributor.author | Ho Hsin Hsiang | en |
| dc.contributor.author | 向何鑫 | zh_TW |
| dc.date.accessioned | 2021-06-08T03:49:35Z | - |
| dc.date.copyright | 2018-12-17 | |
| dc.date.issued | 2018 | |
| dc.date.submitted | 2018-12-05 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21848 | - |
| dc.description.abstract | 隨著機器人技術的迅速發展,機器人系統與人類日常生活環境深度結合的議題,已成為近年來大家關注的重點之一。為了實現這一目標,機器人系統必須考慮到人類的感受和情緒等資訊,才能表現得更像人類,而這也是人與人之間交流中最重要的部分之一。
理解人類情感對服務型機器人尤其重要,因為它們需要根據不同情況提供更合適、更貼心的服務,並表現得更貼近人類的行為。本文提出了一種用於智慧型服務機器人感知人類感受的新型多模式方法;本文所提及的人類感受有別於一般的人類情緒,是對機器人而言更為有用的資訊。我們考慮類似機場及養生村等大型公共場所內的移動服務機器人的情況:許多具有各種心理狀態、情緒和感受的人需要不同的幫助。尤其在養生村這類需要大量看護機器人的地方,理解人類情感對服務機器人非常重要,卻也是一大挑戰;如何在適當的情況下提供適當的幫助尤為關鍵,若能透過感知人類的心理狀態而主動提供服務則更為理想。我們藉由結合面部表情、語音和使用者動作的情緒辨識結果來估計使用者的心理狀態,並運用了數種最先進的深度學習模型,例如二維和三維的卷積神經網路、遞歸神經網路以及輻射基底類神經網路等。實驗結果顯示,即使在資訊有限的情況下,例如其中一項情緒偵測來源無法辨識,此方法仍可頗為準確地預測人類的心理狀態;若機器人能根據人類不同的心理狀態提供相應的服務,將能促進機器人與人類的交流。 關鍵字:人機互動、深度學習、多模型感測、機器視覺。 | zh_TW |
| dc.description.abstract | With the rapid development of robotics, the idea of deeply integrating robotic systems into everyday human environments has attracted growing attention. To achieve this, robotic systems must take human affect into account, one of the most important aspects of human-human interaction (HHI).
Understanding human affect is especially crucial for service robots, so that they can provide more appropriate services in different situations and, most importantly, behave more like humans. Rather than emotion alone, human affect can be a more useful signal because it refers directly to how people actually feel. This thesis presents a novel multi-modality approach that enables an intelligent service robot to sense human feelings. We consider a mobile service robot operating in a large public area, such as an airport or a co-housing district, where many people with various affects require different kinds of help; awareness of users' affects is therefore vital for the robot to offer appropriate assistance. We estimate users' affects by combining the results of emotion recognition from facial expressions, speech, and body motion (see the illustrative fusion sketch below the metadata table), using several state-of-the-art deep learning models such as 2D and 3D convolutional neural networks, recurrent neural networks, and radial basis function neural networks. Experiments show that this approach can accurately predict human feelings even with limited information, which eases human-robot interaction, and our results outperform some recent work on audio-visual emotion recognition in terms of accuracy. Moreover, the multi-modality design allows the robot to keep working even when one detection source is missing, making it more applicable in real-life situations. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-08T03:49:35Z (GMT). No. of bitstreams: 1 ntu-107-R05921088-1.pdf: 4339979 bytes, checksum: b197654fe7bd85cdbe9eba808fa377e9 (MD5) Previous issue date: 2018 | en |
| dc.description.tableofcontents | Acknowledgements
Chinese Abstract
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 MOTIVATION
1.2 OBJECTIVE
1.3 LITERATURE REVIEW
1.3.1 Health Care Service Robots
1.3.2 Humanizing Service Robot
1.3.3 Affective Computing on Robotics
1.3.4 Facial Emotion Recognition
1.3.5 Audio-Visual Emotion Recognition
1.4 THESIS STRUCTURE
CHAPTER 2 SYSTEM ARCHITECTURE
2.1 HARDWARE STRUCTURE
2.1.1 RenBo-S Service Robot
2.1.2 Kinect RGB-D Camera
2.2 SOFTWARE STRUCTURE
2.2.1 Point Cloud Library (PCL)
2.2.2 Open Source Computer Vision Library (OpenCV)
2.2.3 Scikit-Learn
2.2.4 KERAS
2.2.5 API.AI
2.2.6 Robot Operating System (ROS)
CHAPTER 3 BACKGROUND KNOWLEDGE
3.1 UNDERSTANDING AND USING CONTEXT
3.1.1 Definition of Context
3.1.2 Definition of Context-Aware
3.1.3 Convolutional Neural Networks Auto-encoder
3.2 OUR PREVIOUS WORK
3.2.1 Data Collection
CHAPTER 4 PROPOSED SYSTEM
4.1 AUDIO-VISUAL EMOTION RECOGNITION
4.1.1 Network Overview
4.1.2 AlexNet
4.1.3 VGG Network
4.1.4 C3D Convolution Network
4.1.5 Network Input
4.2 GESTURE EMOTION RECOGNITION
4.2.1 LSTM Based Classifier
4.3 FEATURE EXTRACTION
4.3.1 Handcraft Feature
4.4 SUPPORT VECTOR MACHINE (SVM)
4.5 HUMAN BODY AND FACE DETECTION
4.5.1 Human Detection
4.5.2 Face Detection
CHAPTER 5 EXPERIMENTS
5.1 DATASETS
5.1.1 Training Datasets
5.1.2 Testing Datasets
5.2 EXPERIMENTAL SETUP
5.3 DEEP LEARNING MODELS EVALUATION
5.3.1 K-fold Cross-Validation
5.3.2 Uni-Modality Performance
5.3.3 Multi-Modality Performance
5.3.4 Multi-Modality Fusion
5.4 ADAPTIVE RESPONSE
5.4.1 Online Survey
5.4.2 Robot Behaviors
5.4.3 Results
CHAPTER 6 CONCLUSION, CONTRIBUTIONS AND FUTURE WORKS
6.1 CONCLUSIONS
6.2 CONTRIBUTIONS
6.3 FUTURE WORKS
REFERENCES
VITA | |
| dc.language.iso | en | |
| dc.title | 俱多模式感知及適應性應對人類感受之智慧型服務機器人 | zh_TW |
| dc.title | Multi-Modality Sensation and Adaptation of Human Affects for Intelligent Service Robot | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 107-1 | |
| dc.description.degree | 碩士 | |
| dc.contributor.oralexamcommittee | 張帆人(Fan-Ren Chang),顏炳郎(Ping-Lang Yen) | |
| dc.subject.keyword | 人機互動,深度學習,多模型感測,機器視覺, | zh_TW |
| dc.subject.keyword | Human Robot Interaction,Deep Learning,Multi-modality sensation,Computer Vision, | en |
| dc.relation.page | 106 | |
| dc.identifier.doi | 10.6342/NTU201804259 | |
| dc.rights.note | 未授權 | |
| dc.date.accepted | 2018-12-05 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 電機工程學研究所 | zh_TW |
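The abstract above describes estimating a user's affect by fusing per-modality emotion recognition results (facial expression, speech, body motion) while remaining operational when one source is unavailable. Below is a minimal, hypothetical sketch of such decision-level fusion; it is not the thesis implementation, and the emotion label set, the modality weights, and the `fuse_affect` helper are all assumptions introduced for illustration.

```python
# Minimal sketch of decision-level (late) fusion of per-modality emotion
# probabilities. The label set, weights, and function name are hypothetical
# illustrations, not the thesis's actual fusion scheme.
import numpy as np

EMOTIONS = ["angry", "happy", "neutral", "sad", "surprised"]  # assumed label set


def fuse_affect(face_probs=None, speech_probs=None, motion_probs=None,
                weights=(0.5, 0.3, 0.2)):
    """Weighted average over whichever modality probability vectors are present."""
    fused = np.zeros(len(EMOTIONS))
    total_weight = 0.0
    for probs, w in zip((face_probs, speech_probs, motion_probs), weights):
        if probs is None:          # modality missing, e.g. no face detected
            continue
        fused += w * np.asarray(probs, dtype=float)
        total_weight += w
    if total_weight == 0.0:
        raise ValueError("no modality available")
    fused /= total_weight          # renormalise over the modalities present
    return EMOTIONS[int(np.argmax(fused))], fused


if __name__ == "__main__":
    # Speech output missing: fusion still yields a prediction from face + motion.
    label, probs = fuse_affect(face_probs=[0.10, 0.60, 0.20, 0.05, 0.05],
                               speech_probs=None,
                               motion_probs=[0.20, 0.40, 0.30, 0.05, 0.05])
    print(label, probs.round(3))
```

A weighted average over the available probability vectors is one simple way to realize the "works even if one detection source is missing" property claimed in the abstract; the fusion actually evaluated in the thesis (Section 5.3.4, Multi-Modality Fusion) may differ.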
| Appears in Collections: | 電機工程學系 | |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-107-1.pdf (Restricted Access) | 4.24 MB | Adobe PDF | |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.