NTU Theses and Dissertations Repository > College of Electrical Engineering and Computer Science > Department of Computer Science and Information Engineering
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88562
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 陳縕儂 | zh_TW
dc.contributor.advisor | Yun-Nung Chen | en
dc.contributor.author | 徐新凱 | zh_TW
dc.contributor.author | Hsin-Kai Hsu | en
dc.date.accessioned | 2023-08-15T16:50:54Z | -
dc.date.available | 2023-11-09 | -
dc.date.copyright | 2023-08-15 | -
dc.date.issued | 2023 | -
dc.date.submitted | 2023-08-01 | -
dc.identifier.citationA. Agrawal, S. Singh, L. Schneider, and M. Samuels. On the role of corpus ordering in language modeling. In Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing, pages 142–154, Virtual, Nov. 2021. Association for Computational Linguistics.
J.-B. Alayrac, J. Donahue, P. Luc, A. Miech, I. Barr, Y. Hasson, K. Lenc, A. Mensch, K. Millican, M. Reynolds, R. Ring, E. Rutherford, S. Cabi, T. Han, Z. Gong, S. Samangooei, M. Monteiro, J. L. Menick, S. Borgeaud, A. Brock, A. Nematzadeh, S. Sharifzadeh, M. a. Bińkowski, R. Barreira, O. Vinyals, A. Zisserman, and K. Simonyan. Flamingo: a visual language model for few-shot learning. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 23716–23736. Curran Associates, Inc., 2022.
L. J. Ba, J. R. Kiros, and G. E. Hinton. Layer normalization. CoRR, abs/1607.06450, 2016.
P. Budzianowski and I. Vulić. Hello, it's GPT-2 - how can I help you? towards the use of pretrained language models for task-oriented dialogue systems. In Proceedings of the 3rd Workshop on Neural Generation and Translation, pages 15–22, Hong Kong, Nov. 2019. Association for Computational Linguistics.
P. Budzianowski, T.-H. Wen, B.-H. Tseng, I. Casanueva, S. Ultes, O. Ramadan, and M. Gašić. MultiWOZ - a large-scale multi-domain Wizard of Oz dataset for task-oriented dialogue modelling. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 5016–5026, Brussels, Belgium, Oct.-Nov. 2018. Association for Computational Linguistics.
P. A. Crook, S. Poddar, A. De, S. Shafi, D. Whitney, A. Geramifard, and R. Subba. Simmc: Situated interactive multi-modal conversational data collection and evaluation platform. arXiv preprint arXiv:1911.02690, 2019.
A. Das, S. Kottur, K. Gupta, A. Singh, D. Yadav, J. M. F. Moura, D. Parikh, and D. Batra. Visual dialog. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics.
M. Eric, R. Goel, S. Paul, A. Sethi, S. Agarwal, S. Gao, A. Kumar, A. Goyal, P. Ku, and D. Hakkani-Tur. MultiWOZ 2.1: A consolidated multi-domain dialogue dataset with state corrections and state tracking baselines. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 422–428, Marseille, France, May 2020. European Language Resources Association.
R. C. Gunasekara, S. Kim, L. F. D'Haro, A. Rastogi, Y. Chen, M. Eric, B. Hedayatnia, K. Gopalakrishnan, Y. Liu, C. Huang, D. Hakkani-Tür, J. Li, Q. Zhu, L. Luo, L. Liden, K. Huang, S. Shayandeh, R. Liang, B. Peng, Z. Zhang, S. Shukla, M. Huang, J. Gao, S. Mehri, Y. Feng, C. Gordon, S. H. Alavi, D. R. Traum, M. Eskénazi, A. Beirami, E. Cho, P. A. Crook, A. De, A. Geramifard, S. Kottur, S. Moon, S. Poddar, and R. Subba. Overview of the ninth dialog system technology challenge: DSTC9. CoRR, abs/2011.06486, 2020.
D. Ham, J.-G. Lee, Y. Jang, and K.-E. Kim. End-to-end neural pipeline for goal-oriented dialogue systems using GPT-2. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 583–592, Online, July 2020. Association for Computational Linguistics.
T. Han, X. Liu, R. Takanobu, Y. Lian, C. Huang, W. Peng, and M. Huang. Multiwoz 2.3: A multi-domain task-oriented dataset enhanced with annotation corrections and co-reference annotation. CoRR, abs/2010.05594, 2020.
M. Henderson, B. Thomson, and J. D. Williams. The second dialog state tracking challenge. In Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pages 263–272, Philadelphia, PA, USA, June 2014. Association for Computational Linguistics.
A. Holtzman, J. Buys, L. Du, M. Forbes, and Y. Choi. The curious case of neural text degeneration. In International Conference on Learning Representations, 2020.
E. Hosseini-Asl, B. McCann, C.-S. Wu, S. Yavuz, and R. Socher. A simple language model for task-oriented dialogue, 2020.
N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly. Parameter-efficient transfer learning for NLP. In K. Chaudhuri and R. Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 2790–2799. PMLR, 09–15 Jun 2019.
S. Kottur, S. Moon, A. Geramifard, and B. Damavandi. SIMMC 2.0: A task-oriented dialog dataset for immersive multimodal conversations. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 4903–4912, Online and Punta Cana, Dominican Republic, Nov. 2021. Association for Computational Linguistics.
P.-N. Kung, C.-C. Chang, T.-H. Yang, H.-K. Hsu, Y.-J. Liou, and Y.-N. Chen. Multi-task learning for situated multi-domain end-to-end dialogue systems, 2021.
K. Lee, D. Ippolito, A. Nystrom, C. Zhang, D. Eck, C. Callison-Burch, and N. Carlini. Deduplicating training data makes language models better. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8424–8445, Dublin, Ireland, May 2022. Association for Computational Linguistics.
M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, Online, July 2020. Association for Computational Linguistics.
K. Margatina, G. Vernikos, L. Barrault, and N. Aletras. Active learning by acquiring contrastive examples. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 650–663, Online and Punta Cana, Dominican Republic, Nov. 2021. Association for Computational Linguistics.
S. Moon, S. Kottur, P. A. Crook, A. De, S. Poddar, T. Levin, D. Whitney, D. Difranco, A. Beirami, E. Cho, R. Subba, and A. Geramifard. Situated and interactive multimodal conversations. arXiv preprint arXiv:2006.01460, 2020.
V. Murahari, D. Batra, D. Parikh, and A. Das. Large-scale pretraining for visual dialog: A simple state-of-the-art baseline. CoRR, abs/1912.02379, 2019.
A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever. Improving language understanding by generative pre-training. 2018.
A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever. Language models are unsupervised multitask learners. 2019.
C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67, 2020.
M. Treviso, J.-U. Lee, T. Ji, B. van Aken, Q. Cao, M. R. Ciosici, M. Hassid, K. Heafield, S. Hooker, C. Raffel, P. H. Martins, A. F. T. Martins, J. Z. Forde, P. Milder, E. Simpson, N. Slonim, J. Dodge, E. Strubell, N. Balasubramanian, L. Derczynski, I. Gurevych, and R. Schwartz. Efficient methods for natural language processing: A survey, 2023.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. u. Kaiser, and I. Polosukhin. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
F. Ye, J. Manotumruksa, and E. Yilmaz. MultiWOZ 2.4: A multi-domain task-oriented dialogue dataset with additional annotation corrections to improve state tracking evaluation. In Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 351–360, Edinburgh, UK, Sept. 2022. Association for Computational Linguistics.
K. Yoshino, Y.-N. Chen, P. Crook, S. Kottur, J. Li, B. Hedayatnia, S. Moon, Z. Fei, Z. Li, J. Zhang, Y. Feng, J. Zhou, S. Kim, Y. Liu, D. Jin, A. Papangelis, K. Gopalakrishnan, D. Hakkani-Tur, B. Damavandi, A. Geramifard, C. Hori, A. Shah, C. Zhang, H. Li, J. Sedoc, L. F. D'Haro, R. Banchs, and A. Rudnicky. Overview of the tenth dialog system technology challenge: Dstc10. IEEE/ACM Transactions on Audio, Speech, and Language Processing, pages 1–14, 2023.
X. Zang, A. Rastogi, S. Sunkara, R. Gupta, J. Zhang, and J. Chen. MultiWOZ 2.2: A dialogue dataset with additional annotation corrections and state tracking baselines. In Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI, pages 109–117, Online, July 2020. Association for Computational Linguistics.
-
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88562 | -
dc.description.abstract (en):
In multi-modal task-oriented dialogues, agents need the ability to comprehend the multi-modal context perceived by the user. For this purpose, Simmc and Simmc2 were introduced as multi-modal task-oriented dialogue datasets designed for shopping scenarios. Simmc uses a carousel view in the furniture domain, and real-world images captured from the user's perspective in the fashion domain, as its multi-modal contexts. In contrast, Simmc2 uses photo-realistic scenes, which makes data collection more difficult because of its more elaborate setup. In real-world applications, data like Simmc2 may be rare, whereas data like Simmc is comparatively easy to collect.

This thesis explores the transferability of multi-modal task-oriented dialogue through a two-step process: pre-training on the Simmc dataset, followed by fine-tuning on the lower-resource Simmc2 dataset. The results show that pre-training on Simmc significantly improves joint accuracy, and that every intent benefits from pre-training.
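The two-step process in the abstract (pre-train on Simmc, then fine-tune on a small fraction of Simmc2) can be sketched with two toy helpers. This is a hypothetical illustration, not the thesis's actual code: the function names, the separator tokens (`<sep>`, `<belief>`, `<eob>`), and the slot=value linearization are all assumptions.

```python
import random

def subsample(dialogues, proportion, seed=0):
    """Draw a fixed random subset to simulate a low-resource split.

    `proportion` mirrors the fine-tuning fractions studied in the
    thesis (e.g. 0.01 of the Simmc2 training set). Seeding makes the
    split reproducible across runs.
    """
    rng = random.Random(seed)
    k = max(1, int(len(dialogues) * proportion))
    return rng.sample(dialogues, k)

def serialize_turn(history, belief_state):
    """Linearize one dialogue turn into a single GPT-2 training string.

    An assumed flattening scheme: utterances joined by a separator
    token, followed by the target belief state as slot=value pairs.
    """
    context = " <sep> ".join(history)
    state = ", ".join(f"{slot}={value}"
                      for slot, value in sorted(belief_state.items()))
    return f"{context} <belief> {state} <eob>"
```

With helpers like these, the transfer experiment reduces to training a GPT-2 language model on serialized Simmc turns, then continuing training on `subsample(simmc2_train, 0.01)` for the low-resource setting.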
dc.description.provenance (en):
Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-15T16:50:54Z. No. of bitstreams: 0

dc.description.provenance (en):
Made available in DSpace on 2023-08-15T16:50:54Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Thesis Committee Certification i
Acknowledgements iii
Chinese Abstract v
Abstract vii
Contents ix
List of Figures xiii
List of Tables xv
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Task Introduction 2
1.3 Main Contributions 2
1.4 Thesis Structure 3
Chapter 2 Background 5
2.1 Transformer 5
2.1.1 Attention 5
2.1.2 Multi-Head Attention 6
2.1.3 Difference of Attention between Encoder and Decoder 6
2.1.4 Input of Transformer Model 7
2.1.5 An Example of a Transformer-based Model: GPT-2 7
2.2 Training and Inference of Generative Language Model 8
2.2.1 Training of Generative Language Model 8
2.2.2 Inference of Generative Language Model 9
Chapter 3 Related Work 11
3.1 Task-Oriented Dialogue 11
3.2 Multi-Modal Dialogue 12
3.3 Dialogue State Tracking Datasets 12
3.4 Transfer Learning in Natural Language Processing 14
3.5 Data Efficiency in Natural Language Processing 15
Chapter 4 Proposed Method 17
4.1 Input Representation of Simmc2 for GPT-2 17
Chapter 5 Experiments 21
5.1 Experiments of Low-Resource DST 21
5.1.1 Experimental Setup 21
5.1.1.1 Training Details 22
5.1.2 Results of Low-Resource DST 23
5.1.2.1 Fine-tuning with 0.01 Proportion 23
5.1.2.2 Summary of Fine-tuning with Different Proportions of Data 24
5.2 Experiments with Sufficient Data 25
5.2.1 Experimental Setup 25
5.2.1.1 Training Details 25
5.2.2 Results of Sufficient Data 25
Chapter 6 Error Analysis 27
6.1 Error Analysis of Low-Resource DST 27
6.2 Error Analysis of Training with Sufficient Data 27
Chapter 7 Conclusion 31
References 33
Appendices 39
Chapter A Results of Low-Resource DST 41
dc.language.iso | en | -
dc.subject | Task-Oriented Dialogue | en
dc.subject | Multimodal Dialogue | en
dc.subject | Dialogue State Tracking | en
dc.subject | Transfer Learning | en
dc.subject | Dialogue System | en
dc.title | Transferability of Task-Oriented Multimodal Dialogue State Tracking | en
dc.type | Thesis | -
dc.date.schoolyear | 111-2 | -
dc.description.degree | Master | -
dc.contributor.oralexamcommittee | 李宏毅;陳尚澤;馬偉雲 | zh_TW
dc.contributor.oralexamcommittee | Hung-Yi Lee;Shang-Tse Chen;Wei-Yun Ma | en
dc.subject.keyword | Transfer Learning, Task-Oriented Dialogue, Multimodal Dialogue, Dialogue State Tracking, Dialogue System | en
dc.relation.page | 42 | -
dc.identifier.doi | 10.6342/NTU202302109 | -
dc.rights.note | Restricted (not authorized for public release) | -
dc.date.accepted | 2023-08-04 | -
dc.contributor.author-college | College of Electrical Engineering and Computer Science | -
dc.contributor.author-dept | Department of Computer Science and Information Engineering | -
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
File | Size | Format
ntu-111-2.pdf (Restricted Access) | 1.81 MB | Adobe PDF


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
