NTU Theses and Dissertations Repository › College of Electrical Engineering and Computer Science › Department of Computer Science and Information Engineering
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74435
Full metadata record
DC field | Value | Language
dc.contributor.advisor | 許永真 (Jane Yung-jen Hsu) |
dc.contributor.author | Devina Ekawati | en
dc.contributor.author | 王廉花 | zh_TW
dc.date.accessioned | 2021-06-17T08:35:38Z |
dc.date.available | 2024-08-15 |
dc.date.copyright | 2019-08-15 |
dc.date.issued | 2019 |
dc.date.submitted | 2019-08-08 |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74435 |
dc.description.abstract | Emotion plays a significant role in our daily lives. When we perceive emotion, we rely not on a single modality but on several. Psychological studies show that the human senses pick up multiple signals from the environment and translate them into codes that are similar across people.
In this work, we formulate emotion recognition as an emotion translation task using sequence-to-sequence (Seq2seq) models, which are widely used in neural machine translation. We add an attention mechanism, since it helps the model retain information over long sequences. Motivated by Google's Neural Machine Translation (GNMT) system, we also add residual connections to counteract the drop in performance that occurs when models stack several hidden layers.
We train and evaluate our models on the CMU-MOSEI dataset. Experiments show that our proposed Seq2seq architecture outperforms the baseline model on the emotion translation task. Moreover, models that use several modalities achieve better performance than models that use only one. This observation demonstrates that multimodal representations improve emotion translation and recognition. | en
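The abstract combines three mechanisms: a Seq2seq encoder–decoder, an attention mechanism over encoder states, and residual connections around stacked layers. As an illustrative sketch only — this is not the thesis code, and the NumPy formulation, function names, and dimensions are our own assumptions — dot-product attention and a residual wrapper can be written as:

```python
import numpy as np

def dot_attention(decoder_state, encoder_states):
    """Dot-product attention over encoder states.

    decoder_state:  shape (d,)   -- current decoder hidden state
    encoder_states: shape (T, d) -- encoder hidden states over T input steps
    Returns the context vector (d,) and the attention weights (T,).
    """
    scores = encoder_states @ decoder_state        # (T,) alignment scores
    weights = np.exp(scores - scores.max())        # numerically stable softmax
    weights /= weights.sum()
    context = weights @ encoder_states             # weighted sum of encoder states
    return context, weights

def residual_layer(x, layer):
    """Residual connection: the layer learns only a correction to its input."""
    return x + layer(x)

# Toy usage: 5 encoder time steps, hidden size 8.
rng = np.random.default_rng(0)
enc = rng.standard_normal((5, 8))
dec = rng.standard_normal(8)
context, weights = dot_attention(dec, enc)
assert np.isclose(weights.sum(), 1.0) and context.shape == (8,)
```

In an architecture like the one the abstract describes, the context vector would be combined with the decoder state to predict the next output step, and residual connections would wrap each stacked recurrent layer so that deeper stacks do not degrade performance, as in GNMT.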
dc.description.provenance | Made available in DSpace on 2021-06-17T08:35:38Z (GMT). No. of bitstreams: 1. ntu-108-R06922146-1.pdf: 2603205 bytes, checksum: fcdb23dd3f1d1b51710798772fe6aa84 (MD5). Previous issue date: 2019 | en
dc.description.tableofcontents | Abstract i
List of Figures vi
List of Tables viii
Chapter 1 Introduction 1
1.1 Background and Motivation 1
1.2 Objective 4
1.3 Thesis Structure 4
Chapter 2 Related Work 5
2.1 Emotion Recognition 5
2.2 Multimodal Fusion 7
2.3 Sequence-to-sequence 9
2.4 Sequence-to-sequence with Attention Mechanism 10
2.5 Sequence-to-sequence with Residual Connection 12
Chapter 3 Multimodal Emotion Translation 14
3.1 Problem Definition 14
3.2 Multimodal Feature Representation 16
3.3 Proposed Solution 18
3.3.1 Baseline Model 18
3.3.2 Sequence-to-sequence Model 18
3.3.3 Sequence-to-sequence with Attention Model 20
3.3.4 Sequence-to-sequence with Attention and Residual Connection Model 22
Chapter 4 Experiments 24
4.1 Dataset 24
4.2 Experimental Setup 28
4.3 Experimental Results 29
4.3.1 Baseline Model 29
4.3.2 Sequence-to-sequence Model 30
4.3.3 Sequence-to-sequence with Attention Mechanism Model 30
4.3.4 Sequence-to-sequence with Attention and Residual Connection Model 33
4.4 Experimental Results Analysis 34
Chapter 5 Conclusion and Future Work 37
5.1 Conclusion 37
5.2 Future Work 38
Bibliography 39 |
dc.language.iso | en |
dc.subject | Emotion Recognition | zh_TW
dc.subject | Emotion Translation | zh_TW
dc.subject | Sequence-to-sequence | zh_TW
dc.subject | Attention Mechanism | zh_TW
dc.subject | Residual Connection | zh_TW
dc.title | 透過序列到序列模型翻譯多模態情感 | zh_TW
dc.title | Translating Multimodal Emotion through Sequence-to-sequence Model | en
dc.type | Thesis |
dc.date.schoolyear | 107-2 |
dc.description.degree | 碩士 (Master's) |
dc.contributor.oralexamcommittee | 蔡宗翰 (Richard Tzong-Han Tsai), 古倫維 (Lun-Wei Ku) |
dc.subject.keyword | Emotion Recognition, Emotion Translation, Sequence-to-sequence, Attention Mechanism, Residual Connection | zh_TW
dc.relation.page | 43 |
dc.identifier.doi | 10.6342/NTU201902572 |
dc.rights.note | 有償授權 (paid authorization) |
dc.date.accepted | 2019-08-12 |
dc.contributor.author-college | College of Electrical Engineering and Computer Science (電機資訊學院) | zh_TW
dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所) | zh_TW
Appears in collections: Department of Computer Science and Information Engineering (資訊工程學系)

Files in this item:
File | Size | Format
ntu-108-1.pdf (restricted, not available for public access) | 2.54 MB | Adobe PDF


Except where otherwise indicated, all items in this system are protected by copyright, with all rights reserved.

© NTU Library All Rights Reserved