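  1. NTU Theses and Dissertations Repository
  2. College of Electrical Engineering and Computer Science (電機資訊學院)
  3. Department of Computer Science and Information Engineering (資訊工程學系)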
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93448
Title: Text-centric Alignment for Multi-Modality Supervised Learning (文本中心對齊的多模態監督學習)
Authors: Ting-Yu Yen (顏廷聿)
Advisor: Shou-De Lin (林守德)
Keywords: Large Language Model, Deep Learning, Cross-Modal Information Extraction, Cross-Modal Content Generation
Publication Year: 2024
Degree: Master's (碩士)
Abstract: This work addresses modality mismatch in multimodal supervised learning, where the modalities available at inference differ from those available during training. We propose TAMML (Text-centric Alignment for Multi-Modality Supervised Learning), a method that uses Large Language Models with in-context learning, together with foundation models, to improve the generalizability of multimodal systems under these conditions. By treating text as a unified semantic space, the method shows significant improvements in handling unseen, diverse, and unpredictable modality combinations. It adapts to varying modalities while maintaining robust performance, which demonstrates the potential of foundation models to overcome the limitations that traditional fixed-modality frameworks face in embedding representations. The study offers a flexible and effective approach for real-world applications in which modality availability is dynamic and uncertain.
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93448
DOI: 10.6342/NTU202402322
Fulltext Rights: Authorization granted (access restricted to campus)
Appears in Collections: Department of Computer Science and Information Engineering (資訊工程學系)

Files in This Item:
File: ntu-112-2.pdf (access limited to the NTU IP range)
Size: 6.37 MB
Format: Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
