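  1. NTU Theses and Dissertations Repository
  2. College of Electrical Engineering and Computer Science
  3. Department of Computer Science and Information Engineering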
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89905
Title: 可控制風格的場景文字編輯
Style Controllable Scene Text Editing
Authors: 吳玉辰
Yu-Chen Wu
Advisor: 陳祝嵩
Chu-Song Chen
Keyword: 場景文字, 場景文字編輯, 擴散模型
Scene Text, Scene Text Editing, Diffusion Model
Publication Year : 2023
Degree: Master's
Abstract: Scene text editing has made notable progress in recent years, enabling text in real-world images to be rewritten with specified content and its style to be transformed. Previous works mainly relied on Generative Adversarial Networks (GANs) and focused on cropping the target text region from the image to guide the editing process. As the generation quality of diffusion models has improved, scene text editing has also been implemented with diffusion models. Unlike most GAN-based works, diffusion models usually inpaint using the entire scene and take global information into account, which makes the inpainted region more realistic. However, previous works offer limited control over how the style of the generated text relates to the input and reference images. In this work, we emphasize style controllability in scene text editing. Our goal is to develop a method that allows users to manipulate text styles while swapping texts between real images. Our method builds on DiffSTE, a recent diffusion-based model, and exploits its ability to specify styles in instructions. We introduce a framework that integrates style classification and pretrained text recognition to guide DiffSTE in generating text with the desired styles in real-world scenes. Our main contributions include achieving realistic scene text swapping, fine-grained control over text appearance, and the ability to customize font styles and colors. The approach enhances the rewriting and presentation of extracted text according to user preferences and specific application requirements.
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89905
DOI: 10.6342/NTU202301954
Fulltext Rights: Authorization granted (open access on campus only)
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
File                                               Size       Format
ntu-111-2.pdf (access limited to NTU IP range)     2.95 MB    Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
