NTU Theses and Dissertations Repository › College of Electrical Engineering and Computer Science › Department of Computer Science and Information Engineering
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74435
Full metadata record
DC field | Value | Language
dc.contributor.advisor | 許永真 (Jane Yung-jen Hsu) |
dc.contributor.author | Devina Ekawati | en
dc.contributor.author | 王廉花 | zh_TW
dc.date.accessioned | 2021-06-17T08:35:38Z |
dc.date.available | 2024-08-15 |
dc.date.copyright | 2019-08-15 |
dc.date.issued | 2019 |
dc.date.submitted | 2019-08-08 |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74435 |
dc.description.abstract | Emotion plays a significant role in our daily lives. When we perceive emotion, we rely not on a single modality but on several. Psychological studies show that the human senses pick up multiple signals from the environment and translate them into codes that are similar across people.
In this work, we formulate emotion recognition as an emotion translation task using sequence-to-sequence (Seq2seq) models, which are widely used in neural machine translation. We add an attention mechanism, since it helps the model retain information over long sequences. Motivated by Google's Neural Machine Translation (GNMT) system, we also add residual connections to counteract the drop in performance that occurs when models stack several hidden layers.
We train and evaluate our models on the CMU-MOSEI dataset. Experiments show that our proposed Seq2seq architecture outperforms the baseline model on the emotion translation task. Moreover, models that use several modalities achieve better performance than models that use only one. This observation demonstrates that multimodal representations improve emotion translation and recognition. | en
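The abstract combines three mechanisms: a Seq2seq encoder–decoder, an attention mechanism over encoder states, and residual connections around stacked layers. As an illustrative sketch only — this is not the thesis code, and the NumPy formulation, function names, and dimensions are our own assumptions — dot-product attention and a residual wrapper can be written as:

```python
import numpy as np

def dot_attention(decoder_state, encoder_states):
    """Dot-product attention over encoder states.

    decoder_state:  shape (d,)   -- current decoder hidden state
    encoder_states: shape (T, d) -- encoder hidden states over T input steps
    Returns the context vector (d,) and the attention weights (T,).
    """
    scores = encoder_states @ decoder_state        # (T,) alignment scores
    weights = np.exp(scores - scores.max())        # numerically stable softmax
    weights /= weights.sum()
    context = weights @ encoder_states             # weighted sum of encoder states
    return context, weights

def residual_layer(x, layer):
    """Residual connection: the layer learns only a correction to its input."""
    return x + layer(x)

# Toy usage: 5 encoder time steps, hidden size 8.
rng = np.random.default_rng(0)
enc = rng.standard_normal((5, 8))
dec = rng.standard_normal(8)
context, weights = dot_attention(dec, enc)
assert np.isclose(weights.sum(), 1.0) and context.shape == (8,)
```

In an architecture like the one the abstract describes, the context vector would be combined with the decoder state to predict the next output step, and residual connections would wrap each stacked recurrent layer so that deeper stacks do not degrade performance, as in GNMT.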
dc.description.provenance | Made available in DSpace on 2021-06-17T08:35:38Z (GMT). No. of bitstreams: 1. ntu-108-R06922146-1.pdf: 2603205 bytes, checksum: fcdb23dd3f1d1b51710798772fe6aa84 (MD5). Previous issue date: 2019 | en
dc.description.tableofcontents | Abstract i
List of Figures vi
List of Tables viii
Chapter 1 Introduction 1
1.1 Background and Motivation 1
1.2 Objective 4
1.3 Thesis Structure 4
Chapter 2 Related Work 5
2.1 Emotion Recognition 5
2.2 Multimodal Fusion 7
2.3 Sequence-to-sequence 9
2.4 Sequence-to-sequence with Attention Mechanism 10
2.5 Sequence-to-sequence with Residual Connection 12
Chapter 3 Multimodal Emotion Translation 14
3.1 Problem Definition 14
3.2 Multimodal Feature Representation 16
3.3 Proposed Solution 18
3.3.1 Baseline Model 18
3.3.2 Sequence-to-sequence Model 18
3.3.3 Sequence-to-sequence with Attention Model 20
3.3.4 Sequence-to-sequence with Attention and Residual Connection Model 22
Chapter 4 Experiments 24
4.1 Dataset 24
4.2 Experimental Setup 28
4.3 Experimental Results 29
4.3.1 Baseline Model 29
4.3.2 Sequence-to-sequence Model 30
4.3.3 Sequence-to-sequence with Attention Mechanism Model 30
4.3.4 Sequence-to-sequence with Attention and Residual Connection Model 33
4.4 Experimental Results Analysis 34
Chapter 5 Conclusion and Future Work 37
5.1 Conclusion 37
5.2 Future Work 38
Bibliography 39 |
dc.language.iso | en |
dc.subject | Emotion Recognition | zh_TW
dc.subject | Emotion Translation | zh_TW
dc.subject | Sequence-to-sequence | zh_TW
dc.subject | Attention Mechanism | zh_TW
dc.subject | Residual Connection | zh_TW
dc.title | 透過序列到序列模型翻譯多模態情感 | zh_TW
dc.title | Translating Multimodal Emotion through Sequence-to-sequence Model | en
dc.type | Thesis |
dc.date.schoolyear | 107-2 |
dc.description.degree | 碩士 (Master's) |
dc.contributor.oralexamcommittee | 蔡宗翰 (Richard Tzong-Han Tsai), 古倫維 (Lun-Wei Ku) |
dc.subject.keyword | Emotion Recognition, Emotion Translation, Sequence-to-sequence, Attention Mechanism, Residual Connection | zh_TW
dc.relation.page | 43 |
dc.identifier.doi | 10.6342/NTU201902572 |
dc.rights.note | 有償授權 (paid authorization) |
dc.date.accepted | 2019-08-12 |
dc.contributor.author-college | College of Electrical Engineering and Computer Science (電機資訊學院) | zh_TW
dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所) | zh_TW
Appears in collections: Department of Computer Science and Information Engineering (資訊工程學系)

Files in this item:
File | Size | Format
ntu-108-1.pdf (restricted, not available for public access) | 2.54 MB | Adobe PDF


Except where otherwise indicated, all items in this system are protected by copyright, with all rights reserved.

© NTU Library All Rights Reserved