Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89905
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳祝嵩 | zh_TW |
dc.contributor.advisor | Chu-Song Chen | en |
dc.contributor.author | 吳玉辰 | zh_TW |
dc.contributor.author | Yu-Chen Wu | en |
dc.date.accessioned | 2023-09-22T16:37:21Z | - |
dc.date.available | 2023-11-09 | - |
dc.date.copyright | 2023-09-22 | - |
dc.date.issued | 2023 | - |
dc.date.submitted | 2023-08-09 | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89905 | - |
dc.description.abstract | 場景文字編輯近年來取得了顯著進展,讓我們能夠將現實世界中的文字轉換成指定的文本內容。過去的研究主要依賴生成對抗網絡(GANs),並著重於從圖像中裁剪目標文字區域來引導編輯過程。隨著擴散模型生成品質的提升與進展,使得場景文字編輯也可使用擴散模型來實現。與大部分 GAN 研究不同,擴散模型通常使用整個場景進行填補,並考慮全局資訊,使填補區域得以更加真實。然而過去的研究比較無法控制所生成的文字風格與輸入及參考影像間的關係。在本研究中,我們著重於提升場景文字編輯的風格可控性。我們開發一個方法,讓用戶在交換真實圖像中的文字時能夠操縱文字風格。我們的方法基於近期的擴散模型DiffSTE 模型。利用 DiffSTE 可在指令中指定風格的特性,我們提出了一個集成風格分類和預訓練文本識別的框架,以引導 DiffSTE 在現實場景中生成帶有所需風格的文字。我們的主要貢獻包括實現真實場景的文字交換,以及對文字外觀的精細控制以及定制字體風格和顏色的能力。所開發的方法與技術可以根據用戶的偏好和具體應用需求增強提取文字的呈現效果。 | zh_TW |
dc.description.abstract | Scene text editing aims to enable the rewriting and style transformation of text in real-world images. Previous works mainly relied on Generative Adversarial Networks (GANs) and focused on cropping target text regions for guidance. With the improved generation quality of diffusion models, scene text editing has also been implemented with diffusion models. In this work, we emphasize style controllability in scene text editing. Our goal is to develop a system that allows users to manipulate text styles while swapping texts between real images. Our work builds on DiffSTE, a diffusion-based method in which styles can be specified as instructions. We introduce an approach that integrates style classification and pre-trained text recognition to guide DiffSTE in generating text with the desired styles in real-world scenes. Our main contributions include achieving realistic scene text swapping, fine-grained control over text appearance, and the ability to customize font styles and colors. This approach enhances the rewriting of extracted text according to user preferences and specific application requirements. | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-09-22T16:37:21Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2023-09-22T16:37:21Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i Acknowledgements ii 摘要 iii Abstract iv Contents vi List of Figures viii List of Tables xi Chapter 1 Introduction 1 1.1 Introduction of Scene Text Editing 1 Chapter 2 Related Works 5 2.1 Image Editing 5 2.2 Scene Text Editing 7 2.2.1 GAN-Based STE 7 2.2.2 Diffusion-Based STE 9 Chapter 3 Proposed Method 10 3.1 Framework Structure 11 3.2 Methodology 12 3.2.1 DiffSTE 12 3.2.2 Style Classification 13 3.2.3 Text Recognition 16 3.3 Training and Inference 16 3.3.1 Generate Training Data 17 3.3.2 Implementation Details 19 3.3.3 Inference 19 Chapter 4 Experiments 20 4.1 Text Swapping 20 4.1.1 Datasets and Baselines 21 4.1.2 Experiments Results 22 4.2 Style Classification 23 4.2.1 Different Backbone 23 4.2.2 User Study 24 4.2.2.1 Font Analysis 25 4.2.2.2 Color Analysis 26 4.3 Extensions 28 4.3.1 Multi-Reference Manipulation 28 Chapter 5 Conclusion 33 References 34 | - |
dc.language.iso | en | - |
dc.title | 可控制風格的場景文字編輯 | zh_TW |
dc.title | Style Controllable Scene Text Editing | en |
dc.type | Thesis | - |
dc.date.schoolyear | 111-2 | - |
dc.description.degree | Master | - |
dc.contributor.oralexamcommittee | 楊惠芳;黃文良 | zh_TW |
dc.contributor.oralexamcommittee | Huei-Fang Yang;Wen-Liang Hwang | en |
dc.subject.keyword | 場景文字,場景文字編輯,擴散模型 | zh_TW |
dc.subject.keyword | Scene Text, Scene Text Editing, Diffusion Model | en |
dc.relation.page | 37 | - |
dc.identifier.doi | 10.6342/NTU202301954 | - |
dc.rights.note | Authorization granted (access restricted to campus network) | - |
dc.date.accepted | 2023-08-11 | - |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
dc.contributor.author-dept | Department of Computer Science and Information Engineering | - |
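
The English abstract above describes guiding DiffSTE with a style classifier and a pre-trained text recognizer. As a rough illustration of how such classifier-style guidance is commonly wired into a diffusion sampling loop, the sketch below combines a style-classification loss and a text-recognition loss into a guidance gradient that nudges each denoising step. This is a minimal, hypothetical example: `StyleClassifier`, `TextRecognizer`, `denoise_step`, and all weights and shapes are illustrative placeholders, not the thesis implementation.

```python
# Illustrative sketch of generic classifier guidance for a diffusion-based
# scene text editor. All module names and numbers are placeholders.
import torch
import torch.nn as nn

class StyleClassifier(nn.Module):
    """Toy stand-in for a font/color style classifier."""
    def __init__(self, num_styles: int = 10):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, num_styles))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class TextRecognizer(nn.Module):
    """Toy stand-in for a pre-trained scene text recognizer (per-character logits)."""
    def __init__(self, vocab: int = 37, max_len: int = 8):
        super().__init__()
        self.max_len = max_len
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, vocab * max_len))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).view(x.size(0), self.max_len, -1)

def guidance_grad(x, style_id, target_chars, style_clf, recognizer,
                  w_style: float = 1.0, w_text: float = 1.0) -> torch.Tensor:
    """Gradient of the combined style + recognition loss w.r.t. the current image."""
    x = x.detach().requires_grad_(True)
    ce = nn.CrossEntropyLoss()
    style_loss = ce(style_clf(x), style_id)            # push toward the target style class
    rec_logits = recognizer(x)                          # (B, L, vocab)
    text_loss = ce(rec_logits.flatten(0, 1), target_chars.flatten())  # keep text legible/correct
    loss = w_style * style_loss + w_text * text_loss
    return torch.autograd.grad(loss, x)[0]

def denoise_step(x: torch.Tensor, t: int) -> torch.Tensor:
    """Placeholder for one reverse-diffusion step of the underlying editor."""
    return x - 0.01 * torch.randn_like(x)

# Guided sampling loop: after each denoising step, move against the guidance gradient.
x = torch.randn(1, 3, 64, 64)
style_clf, recognizer = StyleClassifier(), TextRecognizer()
style_id = torch.tensor([3])                 # desired font/color class (example value)
target_chars = torch.randint(0, 37, (1, 8))  # desired character sequence (example value)
for t in reversed(range(50)):
    x = denoise_step(x, t)
    x = x - 0.1 * guidance_grad(x, style_id, target_chars, style_clf, recognizer)
```

In a real system the two auxiliary networks would score the decoded or rendered text region rather than raw noise, and the weights `w_style` and `w_text` would trade off style fidelity against legibility of the target string; those details are assumptions here, not taken from the thesis.
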
Appears in Collections: | Computer Science and Information Engineering
Files in This Item:
File | Size | Format |
---|---|---|
ntu-111-2.pdf (access restricted to NTU campus IP addresses; use the VPN service from off campus) | 2.95 MB | Adobe PDF |