NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100950

Full metadata record (listed as "DC field: value (language)"):
dc.contributor.advisor: 陳炳宇 (zh_TW)
dc.contributor.advisor: Bing-Yu Chen (en)
dc.contributor.author: 朱育萱 (zh_TW)
dc.contributor.author: Yu-Hsuan Chu (en)
dc.date.accessioned: 2025-11-26T16:13:25Z
dc.date.available: 2025-11-27
dc.date.copyright: 2025-11-26
dc.date.issued: 2025
dc.date.submitted: 2025-10-28
dc.identifier.citation:
[1] Meta AI. Llama 3.2: Large language model Meta AI. https://ai.meta.com/llama, 2024. Accessed: 2024-11-25.
[2] Google. Gemini API: Gemini-1.5 Pro. https://ai.google.dev/gemini-api, 2024. Accessed: 2024-06-06.
[3] H. Laurençon, L. Tronchon, and V. Sanh. Unlocking the conversion of web screenshots into HTML code with the WebSight dataset. arXiv preprint arXiv:2403.09029, 2024.
[4] J. Lin, J. Guo, S. Sun, Z. Yang, J.-G. Lou, and D. Zhang. LayoutPrompter: Awaken the design ability of large language models. Advances in Neural Information Processing Systems, 36, 2024.
[5] A. Lozhkov, R. Li, L. B. Allal, F. Cassano, J. Lamy-Poirier, N. Tazi, A. Tang, D. Pykhtar, J. Liu, Y. Wei, et al. StarCoder 2 and The Stack v2: The next generation. arXiv preprint arXiv:2402.19173, 2024.
[6] OpenAI. GPT-4o: Advanced multimodal language model. https://openai.com, 2024. Accessed: 2024-10-06.
[7] PaddleOCR Contributors. PaddleOCR: Multi-language, awesome OCR toolkits based on PaddlePaddle. https://github.com/PaddlePaddle/PaddleOCR, 2023.
[8] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever. Learning transferable visual models from natural language supervision. arXiv preprint arXiv:2103.00020, 2021.
[9] C. Si, Y. Zhang, Z. Yang, R. Liu, and D. Yang. Design2Code: How far are we from automating front-end engineering? arXiv preprint arXiv:2403.03163, 2024.
[10] Y. Wan, C. Wang, Y. Dong, W. Wang, S. Li, Y. Huo, and M. R. Lyu. Automatically generating UI code from screenshot: A divide-and-conquer-based approach. arXiv preprint arXiv:2406.16386, 2024.
[11] Q. Ye, H. Xu, G. Xu, J. Ye, M. Yan, Y. Zhou, J. Wang, A. Hu, P. Shi, Y. Shi, et al. mPLUG-Owl: Modularization empowers large language models with multimodality. arXiv preprint arXiv:2304.14178, 2023.
[12] P. Zhang, X. Dong, Y. Zang, Y. Cao, R. Qian, L. Chen, Q. Guo, H. Duan, B. Wang, L. Ouyang, et al. InternLM-XComposer-2.5: A versatile large vision language model supporting long-contextual input and output. arXiv preprint arXiv:2407.03320, 2024.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100950
dc.description.abstract (zh_TW): 目前實務上,將使用者介面設計草稿轉換為前端程式碼的過程通常需要以人工方式進行,這不僅繁瑣且耗時,對於非技術領域的從業人員而言更是如此。然而,隨著多模態大型語言模型的迅速發展,將網頁截圖轉換為 HTML/CSS 程式碼的相關研究已取得一些突破。儘管如此,現有大多數方法仍依賴大量訓練資料集與龐大的運算資源來進行模型微調(fine-tuning),這凸顯出當前多模態大型語言模型在有效擷取網頁設計關鍵資訊方面仍有侷限。此外,即使在提示中嵌入對網頁的純文字描述來輔助前端程式碼的生成,結果仍不盡理想。
因此,本研究提出一種創新方法,將網頁設計中的關鍵元素(例如主要顏色、文字內容與邊界定位資訊)直接融入模型的提示(prompt)中,以改進多模態大型語言模型的生成效能。此方法不僅能避免耗時且資源密集的模型微調過程,還能提高模型生成前端程式碼的準確性與一致性。實驗結果證明,將顏色與文字座標資訊結合於提示中,可顯著提升模型重現網頁截圖的精準度;相較於以往需要微調的模型,我們的方法計算複雜度較低,卻能達到更高的效能與效率。
這項研究的貢獻在於,無需繁瑣的微調過程,我們的方法即能有效捕捉網頁設計中最關鍵的視覺元素,並高效地生成對應的 HTML/CSS 程式碼,為未來網頁設計自動化及前端開發工具的創新奠定基礎。
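The abstract above mentions extracting dominant colors from the webpage screenshot before prompting the model. The thesis's implementation is not included in this record; the following Python sketch shows one plausible way to obtain such colors, assuming k-means clustering of downsampled RGB pixels (Pillow and scikit-learn). The function name extract_dominant_colors, the cluster count k, and the hex output format are illustrative choices, not the authors' code.

# Minimal sketch of dominant-color extraction from a webpage screenshot.
# Assumes k-means clustering in RGB space; all parameters are illustrative.
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

def extract_dominant_colors(screenshot_path: str, k: int = 5) -> list[str]:
    """Return the k cluster centers as hex color codes, most frequent first."""
    img = Image.open(screenshot_path).convert("RGB")
    img.thumbnail((256, 256))                      # downsample to keep clustering cheap
    pixels = np.asarray(img).reshape(-1, 3)

    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    counts = np.bincount(km.labels_, minlength=k)

    # Order cluster centers by how many pixels they cover (most dominant first).
    order = np.argsort(counts)[::-1]
    return ["#%02x%02x%02x" % tuple(c) for c in km.cluster_centers_[order].astype(int)]

if __name__ == "__main__":
    print(extract_dominant_colors("screenshot.png", k=5))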
dc.description.abstract (en): Manually converting visual user interface designs into code is both complex and time-consuming, particularly for non-experts. With the rapid development of multi-modal large language models (MLLMs), there has been significant progress in generating accurate HTML/CSS code from website screenshots. However, most existing approaches depend on extensive training datasets and costly model fine-tuning, underscoring the limitations of current models in capturing key web design features. Even when plain-text descriptions of the given website screenshot are incorporated into the prompts, the generated code remains significantly different from the ground truth. To address these challenges, this paper introduces a novel method that leverages MLLMs by embedding dominant color codes and text margins extracted from website screenshots directly into the input prompt. Our approach aims to reduce reliance on large-scale training and fine-tuning while enhancing the extraction of web design elements critical for code generation. Experimental results demonstrate that incorporating color and positional information of text elements significantly improves the consistency and accuracy of translating website screenshots into functional code. Despite its low computational complexity, our method outperforms state-of-the-art methods on code generation tasks, achieving superior results without requiring additional model training or fine-tuning.
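The abstract also describes embedding text margins into the prompt. As a rough sketch only, the snippet below derives per-element margins from OCR bounding boxes and splices them, together with the dominant colors from the previous sketch, into a prompt string. It assumes the PaddleOCR 2.x ocr() interface (reference [7] above), which returns one list of (bounding box, (text, score)) pairs per page; the margin definition, the build_prompt helper, and the prompt wording are illustrative and are not taken from the thesis, whose actual prompts appear in Appendix A (see the table of contents below).

# Minimal sketch: derive left/top margins of text elements from OCR boxes and
# assemble a prompt containing both colors and text positions. Names, margin
# definition, and prompt wording are illustrative assumptions.
from paddleocr import PaddleOCR

def extract_text_margins(screenshot_path):
    # One detector instance with the English model; heavier options omitted.
    ocr = PaddleOCR(lang="en")
    result = ocr.ocr(screenshot_path)
    lines = result[0] if result and result[0] else []
    elements = []
    for box, (text, _score) in lines:
        xs = [point[0] for point in box]
        ys = [point[1] for point in box]
        elements.append({
            "text": text,
            "margin_left": int(min(xs)),   # px from the left edge of the screenshot
            "margin_top": int(min(ys)),    # px from the top edge of the screenshot
            "width": int(max(xs) - min(xs)),
            "height": int(max(ys) - min(ys)),
        })
    return elements

def build_prompt(dominant_colors, elements):
    color_part = "Dominant colors, most frequent first: " + ", ".join(dominant_colors)
    text_part = "\n".join(
        f'- "{e["text"]}" at left={e["margin_left"]}px, top={e["margin_top"]}px, '
        f'size={e["width"]}x{e["height"]}px'
        for e in elements
    )
    return (
        "Reproduce the attached webpage screenshot as a single self-contained HTML "
        "file with inline CSS.\n"
        + color_part + "\n"
        + "Text elements and their positions:\n" + text_part
    )

The resulting prompt string would then be supplied to the backbone MLLM alongside the screenshot itself.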
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-11-26T16:13:25Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2025-11-26T16:13:25Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
摘要 i
Abstract iii
Contents v
List of Figures vii
List of Tables ix
Chapter 1 Introduction 1
Chapter 2 Related Work 3
2.1 Multi-modal Large Language Model 3
2.2 MLLMs for Webpage Generation 4
2.3 Prompt Engineering 5
Chapter 3 Methodology 7
3.1 From UI Design into Front-end Code 8
3.1.1 Screen Size 8
3.1.2 Color Usage 8
3.1.3 Spacing Rules 9
3.2 Key Feature Extraction: Dominant Colors and Text Margins 9
3.2.1 Dominant Color Extraction 9
3.2.2 Extraction of Text Element Margins 10
Chapter 4 Experiments 13
4.1 Datasets 13
4.2 Backbone Models 14
4.3 Evaluation Metrics 14
4.4 Quantitative Results 15
4.5 Qualitative Results 16
4.6 User Study 16
Chapter 5 Discussion 21
5.1 Closed-source vs. Open-source Models 21
5.2 Correlation Analysis: Enhancing Performance in Challenging Webpage Metrics 22
5.3 Effect of the Number of Dominant Colors 23
Chapter 6 Conclusion 27
6.1 Contribution 27
6.2 Limitation 29
6.3 Future Work 29
References 31
Appendix A Prompting Details 33
A.1 Direct Asking 33
A.2 Our Method Prompting 33
Appendix B User Study Details 35
dc.language.iso: en
dc.subject: 多模態大型語言模型
dc.subject: 網頁程式碼生成
dc.subject: 電腦視覺
dc.subject: Webpage Code Generation
dc.subject: MLLM
dc.subject: Visual-to-Code
dc.title: 基於網頁截圖生成前端程式碼:結合主要配色與文字邊距擷取資訊 (zh_TW)
dc.title: Webpage Code Generation from Screenshots with Dominant Color and Text Margin Extraction (en)
dc.type: Thesis
dc.date.schoolyear: 114-1
dc.description.degree: 碩士
dc.contributor.oralexamcommittee: 朱宏國;王昱舜 (zh_TW)
dc.contributor.oralexamcommittee: Hung-Kuo Chu;Yu-Shuen Wang (en)
dc.subject.keyword: 多模態大型語言模型, 網頁程式碼生成, 電腦視覺 (zh_TW)
dc.subject.keyword: Webpage Code Generation, MLLM, Visual-to-Code (en)
dc.relation.page: 36
dc.identifier.doi: 10.6342/NTU202504621
dc.rights.note: 同意授權(全球公開)
dc.date.accepted: 2025-10-29
dc.contributor.author-college: 管理學院
dc.contributor.author-dept: 資訊管理學系
dc.date.embargo-lift: 2025-11-27
Appears in Collections: 資訊管理學系 (Department of Information Management)

Files in This Item:
File: ntu-114-1.pdf | Size: 1.4 MB | Format: Adobe PDF