Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95626
Title: 基於多模態大型語言模型之個性化影像編輯技術
Personalized Image Editing Based on Multimodal Large Language Model
Authors: 謝欣玉
Hsin-Yu Hsieh
Advisor: 陳祝嵩
Chu-Song Chen
Keyword: Personalization, Image Editing, Multimodal Large Language Model, Deep Learning
Publication Year: 2024
Degree: Master's
Abstract: In this study, we design an automated pipeline to generate a high-quality personalized image editing dataset. We then fine-tune a multimodal large language model on this dataset, resulting in SEED-PIE, the first large language model capable of performing personalized image editing tasks. Our method is more computationally efficient than previous approaches, achieving nearly a tenfold improvement in inference speed. One advantage of our method is that it eliminates the need for additional personalization training for new subjects, enabling direct personalized image editing using reference images of new personalized subjects (zero-shot learning). SEED-PIE allows any user to easily perform high-speed personalized image editing tasks. Our model demonstrates satisfactory performance on the public benchmark DreamEditBench, indicating its ability to generate images that remain faithful to the personalized subject in the reference image while maintaining consistency with the background of the source image.
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95626
DOI: 10.6342/NTU202404267
Fulltext Rights: Authorized (open access worldwide)
Embargo Lift Date: 2029-08-13
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
File            Size      Format
ntu-112-2.pdf   2.53 MB   Adobe PDF
  Until 2029-08-13
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
