Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74435

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 許永真 (Jane Yung-jen Hsu) | |
| dc.contributor.author | Devina Ekawati | en |
| dc.contributor.author | 王廉花 | zh_TW |
| dc.date.accessioned | 2021-06-17T08:35:38Z | - |
| dc.date.available | 2024-08-15 | |
| dc.date.copyright | 2019-08-15 | |
| dc.date.issued | 2019 | |
| dc.date.submitted | 2019-08-08 | |

dc.identifier.citation:

[1] D. Acharya, Z. Huang, D. Pani Paudel, and L. Van Gool. Covariance pooling for facial expression recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 367–374, 2018.
[2] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
[3] T. Baltrusaitis, C. Ahuja, and L.-P. Morency. Multimodal machine learning: A survey and taxonomy. CoRR, abs/1705.09406, 2017.
[4] C. Baziotis, N. Athanasiou, A. Chronopoulou, A. Kolovou, G. Paraskevopoulos, N. Ellinas, S. Narayanan, and A. Potamianos. NTUA-SLP at SemEval-2018 Task 1: Predicting affective content in tweets with deep attentive RNNs and transfer learning. arXiv preprint arXiv:1804.06658, 2018.
[5] S. Buechel and U. Hahn. Emotion analysis as a regression problem - dimensional models and their implications on emotion representation and metrical evaluation. In Proceedings of the Twenty-second European Conference on Artificial Intelligence, pages 1114–1122. IOS Press, 2016.
[6] C. Busso, M. Bulut, C.-C. Lee, A. Kazemzadeh, E. Mower, S. Kim, J. N. Chang, S. Lee, and S. S. Narayanan. IEMOCAP: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 42(4):335, 2008.
[7] R. A. Calvo and S. Mac Kim. Emotions in text: Dimensional and categorical models. Computational Intelligence, 29(3):527–543, 2013.
[8] J. Chikazoe, D. H. Lee, N. Kriegeskorte, and A. K. Anderson. Population coding of affect across stimuli, modalities and individuals. Nature Neuroscience, 17(8):1114, 2014.
[9] G. Degottex, J. Kane, T. Drugman, T. Raitio, and S. Scherer. COVAREP - a collaborative voice analysis repository for speech technologies. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 960–964. IEEE, 2014.
[10] A. Dhall, R. Goecke, S. Ghosh, J. Joshi, J. Hoey, and T. Gedeon. From individual to group-level emotion recognition: EmotiW 5.0. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, pages 524–528. ACM, 2017.
[11] A. Dhall, O. Ramana Murthy, R. Goecke, J. Joshi, and T. Gedeon. Video and image based emotion recognition challenges in the wild: EmotiW 2015. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pages 423–426. ACM, 2015.
[12] P. Ekman. An argument for basic emotions. Cognition & Emotion, 6(3-4):169–200, 1992.
[13] B. Felbo, A. Mislove, A. Søgaard, I. Rahwan, and S. Lehmann. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. arXiv preprint arXiv:1708.00524, 2017.
[14] O. Firat, K. Cho, and Y. Bengio. Multi-way, multilingual neural machine translation with a shared attention mechanism. arXiv preprint arXiv:1601.01073, 2016.
[15] F. A. Gers, J. Schmidhuber, and F. Cummins. Learning to forget: Continual prediction with LSTM. 1999.
[16] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
[17] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[18] iMotions. Facial expression analysis.
[19] G. Klein, Y. Kim, Y. Deng, J. Senellart, and A. M. Rush. OpenNMT: Open-source toolkit for neural machine translation. arXiv preprint arXiv:1701.02810, 2017.
[20] M.-T. Luong, H. Pham, and C. D. Manning. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025, 2015.
[21] H. Meisheri and L. Dey. TCS Research at SemEval-2018 Task 1: Learning robust representations using multi-attention architecture. In Proceedings of The 12th International Workshop on Semantic Evaluation, pages 291–299, 2018.
[22] S. Mirsamadi, E. Barsoum, and C. Zhang. Automatic speech emotion recognition using recurrent neural networks with local attention. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2227–2231. IEEE, 2017.
[23] S. Mohammad, F. Bravo-Marquez, M. Salameh, and S. Kiritchenko. SemEval-2018 Task 1: Affect in tweets. In Proceedings of The 12th International Workshop on Semantic Evaluation, pages 1–17, 2018.
[24] M. Neumann and N. T. Vu. Attentive convolutional neural network based speech emotion recognition: A study on the impact of input features, signal length, and acted speech. arXiv preprint arXiv:1706.00612, 2017.
[25] J. H. Park, P. Xu, and P. Fung. PlusEmo2Vec at SemEval-2018 Task 1: Exploiting emotion knowledge from emoji and #hashtags. arXiv preprint arXiv:1804.08280, 2018.
[26] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in PyTorch. 2017.
[27] J. Pennington, R. Socher, and C. Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.
[28] S. Poria, A. Hussain, and E. Cambria. Multimodal Sentiment Analysis, volume 8. Springer, 2018.
[29] J. A. Russell and A. Mehrabian. Evidence for a three-factor theory of emotions. Journal of Research in Personality, 11(3):273–294, 1977.
[30] A. Sriram, H. Jun, S. Satheesh, and A. Coates. Cold fusion: Training seq2seq models together with language models. arXiv preprint arXiv:1708.06426, 2017.
[31] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. CoRR, abs/1409.3215, 2014.
[32] Z. Tu, Z. Lu, Y. Liu, X. Liu, and H. Li. Modeling coverage for neural machine translation. arXiv preprint arXiv:1601.04811, 2016.
[33] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, et al. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016.
[34] A. B. Zadeh, P. P. Liang, S. Poria, E. Cambria, and L.-P. Morency. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2236–2246, 2018.
[35] Z. Zhang, B. Wu, and B. Schuller. Attention-augmented end-to-end multi-task learning for emotion prediction from speech. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6705–6709. IEEE, 2019.

| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74435 | - |
| dc.description.abstract | Emotion plays a large role in our daily lives. When we perceive emotion, we do not rely on a single modality but on several. Psychological studies show that human senses pick up multiple signals from the environment and translate them into codes that are similar across people. In this work, we formulate emotion recognition as an emotion translation task using sequence-to-sequence (Seq2seq) models, which are widely used in neural machine translation. We add an attention mechanism, since it helps the model retain information across long sequences. Motivated by Google's Neural Machine Translation (GNMT) system, we also add residual connections to counteract the performance degradation that arises when models stack several hidden layers. We use the CMU-MOSEI dataset to train and evaluate our models. Experiments show that our proposed Seq2seq architecture outperforms the baseline model on the emotion translation task. Moreover, models that use several modalities outperform models that use only one, demonstrating that multimodal representations improve emotion translation and recognition. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-17T08:35:38Z (GMT). No. of bitstreams: 1 ntu-108-R06922146-1.pdf: 2603205 bytes, checksum: fcdb23dd3f1d1b51710798772fe6aa84 (MD5) Previous issue date: 2019 | en |

dc.description.tableofcontents:

Abstract
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Background and Motivation
  1.2 Objective
  1.3 Thesis Structure
Chapter 2 Related Work
  2.1 Emotion Recognition
  2.2 Multimodal Fusion
  2.3 Sequence-to-sequence
  2.4 Sequence-to-sequence with Attention Mechanism
  2.5 Sequence-to-sequence with Residual Connection
Chapter 3 Multimodal Emotion Translation
  3.1 Problem Definition
  3.2 Multimodal Feature Representation
  3.3 Proposed Solution
    3.3.1 Baseline Model
    3.3.2 Sequence-to-sequence Model
    3.3.3 Sequence-to-sequence with Attention Model
    3.3.4 Sequence-to-sequence with Attention and Residual Connection Model
Chapter 4 Experiments
  4.1 Dataset
  4.2 Experimental Setup
  4.3 Experimental Results
    4.3.1 Baseline Model
    4.3.2 Sequence-to-sequence Model
    4.3.3 Sequence-to-sequence with Attention Mechanism Model
    4.3.4 Sequence-to-sequence with Attention and Residual Connection Model
  4.4 Experimental Results Analysis
Chapter 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
Bibliography

| dc.language.iso | en | |
| dc.subject | Emotion Recognition | zh_TW |
| dc.subject | Emotion Translation | zh_TW |
| dc.subject | Sequence-to-sequence | zh_TW |
| dc.subject | Attention Mechanism | zh_TW |
| dc.subject | Residual Connection | zh_TW |
| dc.title | 透過序列到序列模型翻譯多模態情感 | zh_TW |
| dc.title | Translating Multimodal Emotion through Sequence-to-sequence Model | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 107-2 | |
| dc.description.degree | Master's (碩士) | |
| dc.contributor.oralexamcommittee | 蔡宗翰 (Richard Tzong-Han Tsai), 古倫維 (Lun-Wei Ku) | |
| dc.subject.keyword | Emotion Recognition, Emotion Translation, Sequence-to-sequence, Attention Mechanism, Residual Connection | zh_TW |
| dc.relation.page | 43 | |
| dc.identifier.doi | 10.6342/NTU201902572 | |
| dc.rights.note | Paid authorization (有償授權) | |
| dc.date.accepted | 2019-08-12 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
| Appears in Collections: | 資訊工程學系 (Department of Computer Science and Information Engineering) | |
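
The abstract above describes the core architecture: a Seq2seq model whose stacked recurrent layers carry residual connections and whose decoder attends over the encoder states, trained on multimodal CMU-MOSEI features. As a minimal illustrative sketch only (this is not the thesis code; the layer counts, hidden sizes, feature dimensions, and zero-vector start symbol below are all assumptions), such a model might look like this in PyTorch:

```python
# Sketch of a Seq2seq model with dot-product (Luong-style) attention and
# residual connections around each stacked LSTM layer. All hyperparameters
# here are illustrative, not the configuration used in the thesis.
import torch
import torch.nn as nn


class ResidualLSTMEncoder(nn.Module):
    """Stacked LSTM encoder where each layer's output is added to its input."""

    def __init__(self, input_dim, hidden_dim, num_layers=2):
        super().__init__()
        # Project fused multimodal features so the skip connection adds cleanly.
        self.proj = nn.Linear(input_dim, hidden_dim)
        self.layers = nn.ModuleList(
            [nn.LSTM(hidden_dim, hidden_dim, batch_first=True) for _ in range(num_layers)]
        )

    def forward(self, x):
        h = self.proj(x)                                  # (batch, time, hidden)
        for lstm in self.layers:
            out, _ = lstm(h)
            h = h + out                                   # residual connection
        return h


class AttentionDecoder(nn.Module):
    """Step-by-step decoder with dot-product attention over encoder states."""

    def __init__(self, hidden_dim, output_dim):
        super().__init__()
        self.cell = nn.LSTMCell(output_dim, hidden_dim)
        self.out = nn.Linear(2 * hidden_dim, output_dim)
        self.hidden_dim, self.output_dim = hidden_dim, output_dim

    def forward(self, enc_out, target_len):
        batch = enc_out.size(0)
        h = enc_out.new_zeros(batch, self.hidden_dim)
        c = enc_out.new_zeros(batch, self.hidden_dim)
        y = enc_out.new_zeros(batch, self.output_dim)     # zero vector as start symbol
        steps = []
        for _ in range(target_len):
            h, c = self.cell(y, (h, c))
            # Score every encoder state against the current decoder state.
            scores = torch.bmm(enc_out, h.unsqueeze(2))   # (batch, time, 1)
            weights = torch.softmax(scores, dim=1)        # attention over time
            context = (weights * enc_out).sum(dim=1)      # (batch, hidden)
            y = self.out(torch.cat([h, context], dim=1))  # per-step emotion scores
            steps.append(y)
        return torch.stack(steps, dim=1)                  # (batch, target_len, output_dim)


# Example with assumed dimensions: 300-d GloVe word vectors concatenated with
# 74-d COVAREP acoustic features, decoded into 6 emotion scores per step.
encoder = ResidualLSTMEncoder(input_dim=300 + 74, hidden_dim=128)
decoder = AttentionDecoder(hidden_dim=128, output_dim=6)
x = torch.randn(8, 20, 374)                    # (batch, time, fused features)
emotions = decoder(encoder(x), target_len=20)
print(emotions.shape)                          # torch.Size([8, 20, 6])
```

The residual addition (`h = h + out`) is what GNMT-style stacking contributes over a plain multi-layer LSTM: gradients can bypass each recurrent layer, mitigating the degradation the abstract mentions when several hidden layers are stacked.
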
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-108-1.pdf (restricted access) | 2.54 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
