NTU Theses and Dissertations Repository > College of Electrical Engineering and Computer Science > Department of Computer Science and Information Engineering
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88562
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 陳縕儂 | zh_TW
dc.contributor.advisor | Yun-Nung Chen | en
dc.contributor.author | 徐新凱 | zh_TW
dc.contributor.author | Hsin-Kai Hsu | en
dc.date.accessioned | 2023-08-15T16:50:54Z | -
dc.date.available | 2023-11-09 | -
dc.date.copyright | 2023-08-15 | -
dc.date.issued | 2023 | -
dc.date.submitted | 2023-08-01 | -
dc.identifier.citationA. Agrawal, S. Singh, L. Schneider, and M. Samuels. On the role of corpus ordering in language modeling. In Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing, pages 142–154, Virtual, Nov. 2021. Association for Computational Linguistics.
J.-B. Alayrac, J. Donahue, P. Luc, A. Miech, I. Barr, Y. Hasson, K. Lenc, A. Mensch, K. Millican, M. Reynolds, R. Ring, E. Rutherford, S. Cabi, T. Han, Z. Gong, S. Samangooei, M. Monteiro, J. L. Menick, S. Borgeaud, A. Brock, A. Nematzadeh, S. Sharifzadeh, M. a. Bińkowski, R. Barreira, O. Vinyals, A. Zisserman, and K. Simonyan. Flamingo: a visual language model for few-shot learning. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 23716–23736. Curran Associates, Inc., 2022.
L. J. Ba, J. R. Kiros, and G. E. Hinton. Layer normalization. CoRR, abs/1607.06450, 2016.
P. Budzianowski and I. Vulić. Hello, it's GPT-2 - how can I help you? towards the use of pretrained language models for task-oriented dialogue systems. In Proceedings of the 3rd Workshop on Neural Generation and Translation, pages 15–22, Hong Kong, Nov. 2019. Association for Computational Linguistics.
P. Budzianowski, T.-H. Wen, B.-H. Tseng, I. Casanueva, S. Ultes, O. Ramadan, and M. Gašić. MultiWOZ - a large-scale multi-domain Wizard of Oz dataset for task-oriented dialogue modelling. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 5016–5026, Brussels, Belgium, Oct.-Nov. 2018. Association for Computational Linguistics.
P. A. Crook, S. Poddar, A. De, S. Shafi, D. Whitney, A. Geramifard, and R. Subba. Simmc: Situated interactive multi-modal conversational data collection and evaluation platform. arXiv preprint arXiv:1911.02690, 2019.
A. Das, S. Kottur, K. Gupta, A. Singh, D. Yadav, J. M. F. Moura, D. Parikh, and D. Batra. Visual dialog. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics.
M. Eric, R. Goel, S. Paul, A. Sethi, S. Agarwal, S. Gao, A. Kumar, A. Goyal, P. Ku, and D. Hakkani-Tur. MultiWOZ 2.1: A consolidated multi-domain dialogue dataset with state corrections and state tracking baselines. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 422–428, Marseille, France, May 2020. European Language Resources Association.
R. C. Gunasekara, S. Kim, L. F. D'Haro, A. Rastogi, Y. Chen, M. Eric, B. Hedayatnia, K. Gopalakrishnan, Y. Liu, C. Huang, D. Hakkani-Tür, J. Li, Q. Zhu, L. Luo, L. Liden, K. Huang, S. Shayandeh, R. Liang, B. Peng, Z. Zhang, S. Shukla, M. Huang, J. Gao, S. Mehri, Y. Feng, C. Gordon, S. H. Alavi, D. R. Traum, M. Eskénazi, A. Beirami, E. Cho, P. A. Crook, A. De, A. Geramifard, S. Kottur, S. Moon, S. Poddar, and R. Subba. Overview of the ninth dialog system technology challenge: DSTC9. CoRR, abs/2011.06486, 2020.
D. Ham, J.-G. Lee, Y. Jang, and K.-E. Kim. End-to-end neural pipeline for goal-oriented dialogue systems using GPT-2. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 583–592, Online, July 2020. Association for Computational Linguistics.
T. Han, X. Liu, R. Takanobu, Y. Lian, C. Huang, W. Peng, and M. Huang. Multiwoz 2.3: A multi-domain task-oriented dataset enhanced with annotation corrections and co-reference annotation. CoRR, abs/2010.05594, 2020.
M. Henderson, B. Thomson, and J. D. Williams. The second dialog state tracking challenge. In Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pages 263–272, Philadelphia, PA, USA, June 2014. Association for Computational Linguistics.
A. Holtzman, J. Buys, L. Du, M. Forbes, and Y. Choi. The curious case of neural text degeneration. In International Conference on Learning Representations, 2020.
E. Hosseini-Asl, B. McCann, C.-S. Wu, S. Yavuz, and R. Socher. A simple language model for task-oriented dialogue, 2020.
N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly. Parameter-efficient transfer learning for NLP. In K. Chaudhuri and R. Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 2790–2799. PMLR, 09–15 Jun 2019.
S. Kottur, S. Moon, A. Geramifard, and B. Damavandi. SIMMC 2.0: A task-oriented dialog dataset for immersive multimodal conversations. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 4903–4912, Online and Punta Cana, Dominican Republic, Nov. 2021. Association for Computational Linguistics.
P.-N. Kung, C.-C. Chang, T.-H. Yang, H.-K. Hsu, Y.-J. Liou, and Y.-N. Chen. Multi-task learning for situated multi-domain end-to-end dialogue systems, 2021.
K. Lee, D. Ippolito, A. Nystrom, C. Zhang, D. Eck, C. Callison-Burch, and N. Carlini. Deduplicating training data makes language models better. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8424–8445, Dublin, Ireland, May 2022. Association for Computational Linguistics.
M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, Online, July 2020. Association for Computational Linguistics.
K. Margatina, G. Vernikos, L. Barrault, and N. Aletras. Active learning by acquiring contrastive examples. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 650–663, Online and Punta Cana, Dominican Republic, Nov. 2021. Association for Computational Linguistics.
S. Moon, S. Kottur, P. A. Crook, A. De, S. Poddar, T. Levin, D. Whitney, D. Difranco, A. Beirami, E. Cho, R. Subba, and A. Geramifard. Situated and interactive multimodal conversations. arXiv preprint arXiv:2006.01460, 2020.
V. Murahari, D. Batra, D. Parikh, and A. Das. Large-scale pretraining for visual dialog: A simple state-of-the-art baseline. CoRR, abs/1912.02379, 2019.
A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever. Improving language understanding by generative pre-training. 2018.
A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever. Language models are unsupervised multitask learners. 2019.
C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67, 2020.
M. Treviso, J.-U. Lee, T. Ji, B. van Aken, Q. Cao, M. R. Ciosici, M. Hassid, K. Heafield, S. Hooker, C. Raffel, P. H. Martins, A. F. T. Martins, J. Z. Forde, P. Milder, E. Simpson, N. Slonim, J. Dodge, E. Strubell, N. Balasubramanian, L. Derczynski, I. Gurevych, and R. Schwartz. Efficient methods for natural language processing: A survey, 2023.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. u. Kaiser, and I. Polosukhin. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
F. Ye, J. Manotumruksa, and E. Yilmaz. MultiWOZ 2.4: A multi-domain task-oriented dialogue dataset with additional annotation corrections to improve state tracking evaluation. In Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 351–360, Edinburgh, UK, Sept. 2022. Association for Computational Linguistics.
K. Yoshino, Y.-N. Chen, P. Crook, S. Kottur, J. Li, B. Hedayatnia, S. Moon, Z. Fei, Z. Li, J. Zhang, Y. Feng, J. Zhou, S. Kim, Y. Liu, D. Jin, A. Papangelis, K. Gopalakrishnan, D. Hakkani-Tur, B. Damavandi, A. Geramifard, C. Hori, A. Shah, C. Zhang, H. Li, J. Sedoc, L. F. D'Haro, R. Banchs, and A. Rudnicky. Overview of the tenth dialog system technology challenge: Dstc10. IEEE/ACM Transactions on Audio, Speech, and Language Processing, pages 1–14, 2023.
X. Zang, A. Rastogi, S. Sunkara, R. Gupta, J. Zhang, and J. Chen. MultiWOZ 2.2: A dialogue dataset with additional annotation corrections and state tracking baselines. In Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI, pages 109–117, Online, July 2020. Association for Computational Linguistics.
-
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88562 | -
dc.description.abstract (en):
In multi-modal task-oriented dialogues, agents need the ability to comprehend the multi-modal context perceived by the user. For this purpose, Simmc and Simmc2 were introduced as multi-modal task-oriented dialogue datasets designed for shopping scenarios. Simmc uses a carousel view in the furniture domain, and real-world images captured from the user's perspective in the fashion domain, as its multi-modal contexts. In contrast, Simmc2 uses photo-realistic scenes, which makes data collection more difficult because of its more elaborate setup. In real-world applications, data like Simmc2 may be rare, whereas data like Simmc is comparatively easy to collect.

This thesis explores the transferability of multi-modal task-oriented dialogue through a two-step process: pre-training on the Simmc dataset, followed by fine-tuning on the lower-resource Simmc2 dataset. The results show that pre-training on Simmc significantly improves joint accuracy, and that every intent benefits from pre-training.
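The two-step process in the abstract (pre-train on Simmc, then fine-tune on a small fraction of Simmc2) can be sketched with two toy helpers. This is a hypothetical illustration, not the thesis's actual code: the function names, the separator tokens (`<sep>`, `<belief>`, `<eob>`), and the slot=value linearization are all assumptions.

```python
import random

def subsample(dialogues, proportion, seed=0):
    """Draw a fixed random subset to simulate a low-resource split.

    `proportion` mirrors the fine-tuning fractions studied in the
    thesis (e.g. 0.01 of the Simmc2 training set). Seeding makes the
    split reproducible across runs.
    """
    rng = random.Random(seed)
    k = max(1, int(len(dialogues) * proportion))
    return rng.sample(dialogues, k)

def serialize_turn(history, belief_state):
    """Linearize one dialogue turn into a single GPT-2 training string.

    An assumed flattening scheme: utterances joined by a separator
    token, followed by the target belief state as slot=value pairs.
    """
    context = " <sep> ".join(history)
    state = ", ".join(f"{slot}={value}"
                      for slot, value in sorted(belief_state.items()))
    return f"{context} <belief> {state} <eob>"
```

With helpers like these, the transfer experiment reduces to training a GPT-2 language model on serialized Simmc turns, then continuing training on `subsample(simmc2_train, 0.01)` for the low-resource setting.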
dc.description.provenance (en):
Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-15T16:50:54Z. No. of bitstreams: 0

dc.description.provenance (en):
Made available in DSpace on 2023-08-15T16:50:54Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Thesis Committee Certification i
Acknowledgements iii
Chinese Abstract v
Abstract vii
Contents ix
List of Figures xiii
List of Tables xv
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Task Introduction 2
1.3 Main Contributions 2
1.4 Thesis Structure 3
Chapter 2 Background 5
2.1 Transformer 5
2.1.1 Attention 5
2.1.2 Multi-Head Attention 6
2.1.3 Difference of Attention between Encoder and Decoder 6
2.1.4 Input of Transformer Model 7
2.1.5 An Example of a Transformer-based Model: GPT-2 7
2.2 Training and Inference of Generative Language Model 8
2.2.1 Training of Generative Language Model 8
2.2.2 Inference of Generative Language Model 9
Chapter 3 Related Work 11
3.1 Task-Oriented Dialogue 11
3.2 Multi-Modal Dialogue 12
3.3 Dialogue State Tracking Datasets 12
3.4 Transfer Learning in Natural Language Processing 14
3.5 Data Efficiency in Natural Language Processing 15
Chapter 4 Proposed Method 17
4.1 Input Representation of Simmc2 for GPT-2 17
Chapter 5 Experiments 21
5.1 Experiments of Low-Resource DST 21
5.1.1 Experimental Setup 21
5.1.1.1 Training Details 22
5.1.2 Results of Low-Resource DST 23
5.1.2.1 Fine-tuning with 0.01 Proportion 23
5.1.2.2 Summary of Fine-tuning with Different Proportions of Data 24
5.2 Experiments with Sufficient Data 25
5.2.1 Experimental Setup 25
5.2.1.1 Training Details 25
5.2.2 Results of Sufficient Data 25
Chapter 6 Error Analysis 27
6.1 Error Analysis of Low-Resource DST 27
6.2 Error Analysis of Training with Sufficient Data 27
Chapter 7 Conclusion 31
References 33
Appendices 39
Chapter A Results of Low-Resource DST 41
dc.language.iso | en | -
dc.subject | Task-Oriented Dialogue | en
dc.subject | Multimodal Dialogue | en
dc.subject | Dialogue State Tracking | en
dc.subject | Transfer Learning | en
dc.subject | Dialogue System | en
dc.title | Transferability of Task-Oriented Multimodal Dialogue State Tracking | en
dc.type | Thesis | -
dc.date.schoolyear | 111-2 | -
dc.description.degree | Master | -
dc.contributor.oralexamcommittee | 李宏毅;陳尚澤;馬偉雲 | zh_TW
dc.contributor.oralexamcommittee | Hung-Yi Lee;Shang-Tse Chen;Wei-Yun Ma | en
dc.subject.keyword | Transfer Learning, Task-Oriented Dialogue, Multimodal Dialogue, Dialogue State Tracking, Dialogue System | en
dc.relation.page | 42 | -
dc.identifier.doi | 10.6342/NTU202302109 | -
dc.rights.note | Restricted (not authorized for public release) | -
dc.date.accepted | 2023-08-04 | -
dc.contributor.author-college | College of Electrical Engineering and Computer Science | -
dc.contributor.author-dept | Department of Computer Science and Information Engineering | -
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
File | Size | Format
ntu-111-2.pdf (Restricted Access) | 1.81 MB | Adobe PDF


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
