Personalized Image Editing Based on Multimodal Large Language Model (基於多模態大型語言模型之個性化影像編輯技術)

謝欣玉; Hsin-Yu Hsieh

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95626

Title:	基於多模態大型語言模型之個性化影像編輯技術 Personalized Image Editing Based on Multimodal Large Language Model
Authors:	謝欣玉 Hsin-Yu Hsieh
Advisor:	陳祝嵩 Chu-Song Chen
Keyword:	個性化內容生成, 影像編輯, 多模態大型語言模型, 深度學習; Personalization, Image Editing, Multimodal Large Language Model, Deep Learning
Publication Year:	2024
Degree:	Master's
Abstract:	In this study, we design an automated pipeline to generate a high-quality personalized image editing dataset. We then fine-tune a multimodal large language model on this dataset, resulting in SEED-PIE, the first large language model capable of performing personalized image editing tasks. Our method is more computationally efficient than previous approaches, achieving nearly a tenfold improvement in inference speed. One advantage of our method is that it eliminates the need for additional personalization training for new subjects, enabling personalized image editing directly from reference images of new subjects (zero-shot learning). SEED-PIE allows any user to easily perform high-speed personalized image editing. Our model demonstrates satisfactory performance on the public benchmark DreamEditBench, indicating its ability to generate images that remain faithful to the personalized subject in the reference image while maintaining consistency with the background of the source image.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95626
DOI:	10.6342/NTU202404267
Fulltext Rights:	Authorization granted (open access worldwide)
Embargo Lift Date:	2029-08-13
Appears in Collections:	Department of Computer Science and Information Engineering (資訊工程學系)

Files in This Item:

File	Size	Format
ntu-112-2.pdf (restricted until 2029-08-13)	2.53 MB	Adobe PDF
