NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98493
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 陳炳宇 | zh_TW
dc.contributor.advisor | Bing-Yu Chen | en
dc.contributor.author | 秦孝媛 | zh_TW
dc.contributor.author | Hsiao Yuan Chin | en
dc.date.accessioned | 2025-08-14T16:19:51Z | -
dc.date.available | 2025-08-15 | -
dc.date.copyright | 2025-08-14 | -
dc.date.issued | 2025 | -
dc.date.submitted | 2025-08-01 | -
dc.identifier.citation:
[1] I. Berger, A. Shamir, M. Mahler, E. Carter, and J. Hodgins. Style and abstraction in portrait sketching. ACM Transactions on Graphics (TOG), 32(4):1–12, 2013.
[2] M. Cai, Z. Huang, Y. Li, U. Ojha, H. Wang, and Y. J. Lee. Leveraging large language models for scalable vector graphics-driven image understanding. arXiv preprint arXiv:2306.06094, 2023.
[3] M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 9650–9660, 2021.
[4] M. Eitz, J. Hays, and M. Alexa. How do humans sketch objects? ACM Trans. Graph. (Proc. SIGGRAPH), 31(4):44:1–44:10, 2012.
[5] K. Frans, L. Soros, and O. Witkowski. CLIPDraw: Exploring text-to-drawing synthesis through language-image encoders. Advances in Neural Information Processing Systems, 35:5207–5218, 2022.
[6] S. Fu, N. Tamir, S. Sundaram, L. Chai, R. Zhang, T. Dekel, and P. Isola. DreamSim: Learning new dimensions of human visual similarity using synthetic data. In Advances in Neural Information Processing Systems, volume 36, pages 50742–50768, 2023.
[7] R. Gal, Y. Vinker, Y. Alaluf, A. Bermano, D. Cohen-Or, A. Shamir, and G. Chechik. Breathing life into sketches using text-to-video priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.
[8] Y. Gryaditskaya, M. Sypesteyn, J. W. Hoftijzer, S. Pont, F. Durand, and A. Bousseau. OpenSketch: A richly-annotated dataset of product design sketches. ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia), 38(6):232, 2019.
[9] D. Ha and D. Eck. A neural representation of sketch drawings. In Proc. ICLR, 2018.
[10] A. Hurst, A. Lerer, A. P. Goucher, A. Perelman, A. Ramesh, A. Clark, A. Ostrow, A. Welihinda, A. Hayes, A. Radford, et al. GPT-4o system card. arXiv preprint arXiv:2410.21276, 2024.
[11] A. Jain, A. Xie, and P. Abbeel. VectorFusion: Text-to-SVG by abstracting pixel-based diffusion models. In Proc. CVPR, pages 1911–1920, 2023.
[12] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In Proc. ICLR, 2015.
[13] H. Li, H. Zhang, Y. Wang, J. Cao, A. Shamir, and D. Cohen-Or. Curve style analysis in a set of shapes. In Computer Graphics Forum, volume 32, pages 77–88. Wiley Online Library, 2013.
[14] H. Lin, Y. Fu, X. Xue, and Y.-G. Jiang. Sketch-BERT: Learning sketch bidirectional encoder representation from transformers by self-supervised learning of sketch gestalt. In Proc. CVPR, pages 6758–6767, 2020.
[15] Z. Lin, D. Pathak, B. Li, J. Li, X. Xia, G. Neubig, P. Zhang, and D. Ramanan. Evaluating text-to-visual generation with image-to-text generation. arXiv preprint arXiv:2404.01291, 2024.
[16] K. Nishina and Y. Matsui. SVGEditBench: A benchmark dataset for quantitative assessment of LLMs' SVG editing capabilities. arXiv preprint arXiv:2404.13710, 2024.
[17] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019.
[18] Z. Qu, T. Xiang, and Y.-Z. Song. Sketchdreamer: Interactive text-augmented creative sketch ideation. In Proc. BMVC, 2023.
[19] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (ICML), pages 8748–8763. PMLR, 2021.
[20] L. S. F. Ribeiro, T. Bui, J. Collomosse, and M. Ponti. Sketchformer: Transformer-based representation for sketched structure. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14153–14162, 2020.
[21] J. A. Rodriguez, A. Puri, S. Agarwal, I. H. Laradji, S. Rajeswar, D. Vazquez, C. Pal, and M. Pedersoli. StarVector: Generating scalable vector graphics code from images and text. In Proc. AAAI, volume 39, pages 29691–29693, 2025.
[22] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022.
[23] P. Sangkloy, N. Burnell, C. Ham, and J. Hays. The sketchy database: Learning to retrieve badly drawn bunnies. ACM Transactions on Graphics (Proceedings of SIGGRAPH), 2016.
[24] Z. Tang, C. Wu, Z. Zhang, M. Ni, S. Yin, Y. Liu, Z. Yang, L. Wang, Z. Liu, J. Li, and N. Duan. StrokeNUWA: Tokenizing strokes for vector graphic synthesis. arXiv preprint arXiv:2401.17093, 2024.
[25] G. Team, R. Anil, S. Borgeaud, J.-B. Alayrac, J. Yu, R. Soricut, J. Schalkwyk, A. M. Dai, A. Hauth, K. Millican, et al. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023.
[26] Y. Vinker, Y. Alaluf, D. Cohen-Or, and A. Shamir. CLIPascene: Scene sketching with different types and levels of abstraction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4146–4156, 2023.
[27] Y. Vinker, E. Pajouheshgar, J. Y. Bo, R. C. Bachmann, A. H. Bermano, D. Cohen-Or, A. Zamir, and A. Shamir. CLIPasso: Semantically-aware object sketching. ACM Trans. Graph., 41(4), July 2022.
[28] Y. Vinker, T. R. Shaham, K. Zheng, A. Zhao, J. E. Fan, and A. Torralba. SketchAgent: Language-driven sequential sketch generation. arXiv preprint arXiv:2411.17673, 2024.
[29] J. Wang, H. Yuan, D. Chen, Y. Zhang, X. Wang, and S. Zhang. Modelscope text-to-video technical report. arXiv preprint arXiv:2308.06571, 2023.
[30] R. Wu, W. Su, and J. Liao. Chat2SVG: Vector graphics generation with large language models and image diffusion models. arXiv preprint arXiv:2411.16602, 2024.
[31] R. Wu, W. Su, K. Ma, and J. Liao. IconShop: Text-guided vector icon synthesis with autoregressive transformers. ACM Transactions on Graphics (TOG), 42(6):1–14, 2023.
[32] X. Xing, J. Hu, G. Liang, J. Zhang, D. Xu, and Q. Yu. Empowering llms to understand and generate complex vector graphics. arXiv preprint arXiv:2412.11102, 2024.
[33] X. Xing, C. Wang, H. Zhou, J. Zhang, Q. Yu, and D. Xu. DiffSketcher: Text-guided vector sketch synthesis through latent diffusion models. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
[34] L. Zhang, A. Rao, and M. Agrawala. Adding conditional control to text-to-image diffusion models. In Proc. ICCV, pages 3836–3847, 2023.
[35] T. Zhou, C. Fang, Z. Wang, J. Yang, B. Kim, Z. Chen, J. Brandt, and D. Terzopoulos. Learning to sketch with deep q networks and demonstrated strokes. arXiv preprint arXiv:1810.05977, 2018.
[36] B. Zou, M. Cai, J. Zhang, and Y. J. Lee. VGBench: Evaluating large language models on vector graphics understanding and generation. arXiv preprint arXiv:2407.10972, 2024.
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98493 | -
dc.description.abstract | 草圖是重要的表達媒介,近年來已有眾多研究致力於自動草圖生成。其中一項對業餘使用者極具實用性的功能,是根據文字描述自動補全部分草圖以生成複雜場景,同時保留原始草圖的風格。現有方法僅著重於產出符合輸入提示內容、且具預設風格的草圖,而忽略了輸入部分草圖中的風格特徵,例如整體的抽象程度與局部筆劃風格等。為解決此挑戰,我們提出 AutoSketch,一種能適應多樣化草圖風格並支援多輪補全的風格感知向量草圖補全方法。AutoSketch 透過兩階段的流程,風格一致地補全輸入草圖。在第一階段,我們首先優化筆劃以符合一組輸入提示,該提示由原始文字描述擴充而來,擴充內容包含由視覺語言模型(VLM)所提取的風格描述。這些風格描述進一步產生非寫實的引導圖像,藉此引導補全更多內容筆劃。在第二階段,我們利用 VLM 將第一階段生成的筆劃調整為與輸入草圖風格一致,並透過一個迭代風格調整機制實現此目標。在每次迭代中,VLM 辨識輸入草圖與前一階段筆劃之間的風格差異,並將這些差異轉換為調整碼,用以更新筆劃。我們在各種草圖風格與文字提示下,將本方法與現有技術進行比較,並進行廣泛的消融研究、質性與量化評估,證實 AutoSketch 能支援多樣化的草圖創作情境。 | zh_TW
dc.description.abstract | Sketches are an important medium of expression, and many recent works concentrate on automatic sketch creation. One capability that is particularly useful for amateurs is text-based completion of a partial sketch to create a complex scene while preserving the style of the partial sketch. Existing methods focus solely on generating sketches that match the content of the input prompt in a predefined style, ignoring the style of the input partial sketch, e.g., its global abstraction level and local stroke styles. To address this challenge, we introduce AutoSketch, a style-aware vector sketch completion method that accommodates diverse sketch styles and supports iterative sketch completion. AutoSketch completes the input sketch in a style-consistent manner using a two-stage method. In the first stage, we optimize the strokes to match an input prompt augmented by style descriptions extracted by a vision-language model (VLM). These style descriptions lead to non-photorealistic guidance images, which enable more content to be depicted by the new strokes. In the second stage, we use the VLM to adjust the strokes from the first stage to adhere to the style of the input partial sketch through an iterative style-adjustment process. In each iteration, the VLM identifies a list of style differences between the input sketch and the strokes generated in the previous iteration and translates these differences into adjustment codes that modify the strokes. We compare our method with existing methods across various sketch styles and prompts, perform extensive ablation studies and qualitative and quantitative evaluations, and demonstrate that AutoSketch supports diverse sketching scenarios. | en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-14T16:19:51Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2025-08-14T16:19:51Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents:
Acknowledgements
摘要
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
2.1 Vector Sketch Generation
2.2 Sketch Styles
2.3 LLM-based Sketch and SVG Editing
Chapter 3 Overview
Chapter 4 Content-centric Sketch Completion
4.1 Prompt Augmentation
4.2 Stroke Optimization for Completion
Chapter 5 VLM-based Sketch Style Adjustment
Chapter 6 Experiment
6.1 Implementation Details
6.2 Comparison with Existing Methods
6.3 Diverse Sketch Scenario
6.4 Ablation Study
6.4.1 The effectiveness of the style adjustment stage
6.4.2 The effectiveness of adaptive prompt augmentation
6.4.3 The effectiveness of using style adjustment code
6.4.4 Generalization of VLMs
Chapter 7 Limitations and Future Work
Chapter 8 Conclusion
References
Appendix A: VLM Preamble Detail
A.1 Style Difference Detection Preamble
A.2 Adjustment Code Generation Preamble
Appendix B: Detailed Case Examples
B.1 Case: Dogs Playing
B.1.1 Style Difference
B.1.2 Adjustment Code
B.2 Case: Girl Walking in the Park
B.2.1 Style Difference
B.2.2 Adjustment Code
dc.language.iso | zh_TW | -
dc.subject | 草圖補全 | zh_TW
dc.subject | 向量草圖 | zh_TW
dc.subject | 貝茲曲線 | zh_TW
dc.subject | 場景補全 | zh_TW
dc.subject | 風格感知 | zh_TW
dc.subject | Style-Aware | en
dc.subject | Scene Completion | en
dc.subject | Bézier Curves | en
dc.subject | Sketch Completion | en
dc.subject | Vector Sketches | en
dc.title | 視覺語言模型輔助之風格感知向量草圖補全 | zh_TW
dc.title | AutoSketch: VLM-Assisted Style-Aware Vector Sketch Completion | en
dc.type | Thesis | -
dc.date.schoolyear | 113-2 | -
dc.description.degree | 碩士 (Master) | -
dc.contributor.oralexamcommittee | 林文杰;王昱舜;朱宏國 | zh_TW
dc.contributor.oralexamcommittee | Wen-Chieh Lin;Yu-Shuen Wang;Hung-Kuo Chu | en
dc.subject.keyword | 向量草圖, 草圖補全, 風格感知, 場景補全, 貝茲曲線 | zh_TW
dc.subject.keyword | Vector Sketches, Sketch Completion, Style-Aware, Scene Completion, Bézier Curves | en
dc.relation.page | 52 | -
dc.identifier.doi | 10.6342/NTU202502832 | -
dc.rights.note | 未授權 (not authorized) | -
dc.date.accepted | 2025-08-06 | -
dc.contributor.author-college | 管理學院 (College of Management) | -
dc.contributor.author-dept | 資訊管理學系 (Department of Information Management) | -
dc.date.embargo-lift | N/A | -
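For orientation, the abstract above describes a two-stage algorithm: (1) content-centric stroke optimization against a prompt augmented with a VLM-extracted style description, and (2) iterative VLM-driven style adjustment via "adjustment codes". The following Python sketch illustrates that control flow only, under stated assumptions: every function here (vlm_describe_style, optimize_strokes, vlm_style_differences, apply_adjustment_code) is a hypothetical stub invented for this illustration, not the thesis's actual implementation or API.

```python
# Minimal, hypothetical sketch of the two-stage AutoSketch pipeline
# described in the abstract. All names are illustrative placeholders.
from dataclasses import dataclass, field


@dataclass
class Sketch:
    """A vector sketch, abstracted as a list of stroke records
    (e.g., Bezier-curve control points)."""
    strokes: list = field(default_factory=list)


def vlm_describe_style(partial: Sketch) -> str:
    """Stage 1 (prompt augmentation): ask a vision-language model for a
    textual style description (global abstraction level, local stroke
    character) of the partial input. Stubbed with a fixed answer."""
    return "loose, highly abstract, single-line contours"


def optimize_strokes(partial: Sketch, prompt: str) -> Sketch:
    """Stage 1 (completion): optimize new content strokes against the
    style-augmented prompt; per the abstract, the style description
    yields non-photorealistic guidance images. Stubbed."""
    return Sketch(strokes=partial.strokes + ["<new content strokes>"])


def vlm_style_differences(partial: Sketch, draft: Sketch) -> list[str]:
    """Stage 2: the VLM lists style differences between the input sketch
    and the current draft. Stubbed: reports no differences."""
    return []


def apply_adjustment_code(draft: Sketch, diffs: list[str]) -> Sketch:
    """Stage 2: the VLM translates each difference into an adjustment
    code (an edit over stroke parameters) applied to the draft. Stubbed."""
    return draft


def autosketch(partial: Sketch, prompt: str, max_iters: int = 5) -> Sketch:
    # Stage 1: content-centric completion with a style-augmented prompt.
    style = vlm_describe_style(partial)
    draft = optimize_strokes(partial, f"{prompt}, drawn as: {style}")

    # Stage 2: iterate until the VLM reports no remaining style differences.
    for _ in range(max_iters):
        diffs = vlm_style_differences(partial, draft)
        if not diffs:
            break
        draft = apply_adjustment_code(draft, diffs)
    return draft


if __name__ == "__main__":
    completed = autosketch(Sketch(strokes=["<partial strokes>"]), "dogs playing")
    print(completed)
```

The early exit mirrors the abstract's description of stage 2: adjustment repeats only while the VLM still detects style differences between the input sketch and the draft.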
Appears in collections: 資訊管理學系 (Department of Information Management)

Files in this item:
File | Size | Format
ntu-113-2.pdf | 31.6 MB | Adobe PDF (restricted: not authorized for public access)