Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95626

| Title: | 基於多模態大型語言模型之個性化影像編輯技術 Personalized Image Editing Based on Multimodal Large Language Model |
| Authors: | 謝欣玉 Hsin-Yu Hsieh |
| Advisor: | 陳祝嵩 Chu-Song Chen |
| Keyword: | Personalization, Image Editing, Multimodal Large Language Model, Deep Learning |
| Publication Year: | 2024 |
| Degree: | Master's |
| Abstract: | In this study, we design an automated pipeline that generates a high-quality personalized image editing dataset. We then fine-tune a multimodal large language model on this dataset, resulting in SEED-PIE, the first large language model capable of performing personalized image editing. Our method is faster at inference than all previous personalized image editing methods, achieving nearly a tenfold speedup. A further advantage is that it requires no additional personalization training for new subjects: given reference images of a new subject, it performs personalized image editing directly (zero-shot learning). SEED-PIE thus allows any user to perform personalized image editing easily and quickly. Our model achieves satisfactory performance on the public benchmark DreamEditBench, indicating that its generated images remain faithful to the personalized subject in the reference image while staying consistent with the background of the source image. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95626 |
| DOI: | 10.6342/NTU202404267 |
| Fulltext Rights: | Authorization granted (open to the public worldwide) |
| Embargo Lift Date: | 2029-08-13 |
| Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-112-2.pdf (until 2029-08-13) | 2.53 MB | Adobe PDF |
