  1. NTU Theses and Dissertations Repository
  2. College of Electrical Engineering and Computer Science
  3. Department of Computer Science and Information Engineering
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88562
Title: Transferability of Task-Oriented Multimodal Dialogue State Tracking
Authors: Hsin-Kai Hsu
Advisor: Yun-Nung Chen
Keywords: Transfer Learning, Task-Oriented Dialogue, Multimodal Dialogue, Dialogue State Tracking, Dialogue System
Publication Year: 2023
Degree: Master's
Abstract:
In multimodal task-oriented dialogue, an agent needs the ability to comprehend the multimodal context perceived by the user. For this purpose, SIMMC and SIMMC 2.0 were introduced as multimodal task-oriented dialogue datasets designed for shopping scenarios. As its multimodal context, SIMMC uses a carousel view in the furniture domain and real-world images captured from the user's perspective in the fashion domain. In contrast, SIMMC 2.0 uses photo-realistic scenes, which makes its data collection more challenging due to the more elaborate setup. In real-world scenarios, data like SIMMC 2.0's may be scarce, whereas data akin to SIMMC's is comparatively easy to collect.

This thesis explores the transferability of multimodal task-oriented dialogue through a two-step process: pre-training on the SIMMC dataset, followed by fine-tuning on the lower-resource SIMMC 2.0 dataset. The results indicate that pre-training on SIMMC significantly improves joint accuracy, and every intent benefits from the pre-training process.
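The two-stage recipe described above — pre-train on the plentiful dataset, then fine-tune on the scarce one — can be sketched as follows. This is a minimal illustration using a toy softmax classifier on synthetic data; the features, labels, and "tracker" here are hypothetical stand-ins, not the thesis's actual model or the real SIMMC data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a dialogue state tracker: a linear softmax
# classifier mapping an utterance embedding to one of n intents.
def init_params(dim=16, n_intents=4, seed=0):
    r = np.random.default_rng(seed)
    return {"W": r.normal(0.0, 0.1, (dim, n_intents)),
            "b": np.zeros(n_intents)}

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(params, X, y, epochs=50, lr=0.5):
    """Gradient descent on cross-entropy; returns the final mean loss."""
    n = len(y)
    for _ in range(epochs):
        p = softmax(X @ params["W"] + params["b"])
        p[np.arange(n), y] -= 1.0          # gradient of CE w.r.t. logits
        params["W"] -= lr * X.T @ p / n
        params["b"] -= lr * p.mean(axis=0)
    probs = softmax(X @ params["W"] + params["b"])
    return float(-np.log(probs[np.arange(n), y]).mean())

# Stage 1: "pre-train" on plentiful SIMMC-like data (synthetic here).
X1, y1 = rng.normal(size=(256, 16)), rng.integers(0, 4, 256)
pre = init_params()
train(pre, X1, y1)

# Stage 2: fine-tune on scarce SIMMC 2.0-like data, starting from the
# pre-trained weights instead of a fresh initialization.
X2, y2 = rng.normal(size=(32, 16)), rng.integers(0, 4, 32)
fine = {"W": pre["W"].copy(), "b": pre["b"].copy()}
loss_transfer = train(fine, X2, y2, epochs=20)

# Baseline: train on the scarce data from scratch. Comparing the two
# losses gauges the benefit of transfer (with synthetic random labels,
# as here, no benefit is guaranteed).
loss_scratch = train(init_params(seed=1), X2, y2, epochs=20)
```

The key design point mirrored from the thesis is that fine-tuning starts from the pre-trained parameters rather than a random initialization; everything else about the toy setup is illustrative.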
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88562
DOI: 10.6342/NTU202302109
Fulltext Rights: Not authorized
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
File: ntu-111-2.pdf (Restricted Access)
Size: 1.81 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
