Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100950
Title: 基於網頁截圖生成前端程式碼:結合主要配色與文字邊距擷取資訊
Webpage Code Generation from Screenshots with Dominant Color and Text Margin Extraction
Authors: 朱育萱
Yu-Hsuan Chu
Advisor: 陳炳宇
Bing-Yu Chen
Keyword: Multimodal Large Language Model, Webpage Code Generation, Computer Vision
Webpage Code Generation, MLLM, Visual-to-Code
Publication Year: 2025
Degree: Master's
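Abstract: In current practice, converting user interface design drafts into front-end code is usually done by hand. This is a tedious and time-consuming task, especially for practitioners without a technical background. With the rapid development of multimodal large language models (MLLMs), research on converting webpage screenshots into HTML/CSS code has made some breakthroughs. Even so, most existing methods still rely on large training datasets and substantial computing resources for model fine-tuning, which highlights the limitations of current MLLMs in effectively capturing the key information in a web design. Moreover, even when plain-text descriptions of the webpage are embedded to assist front-end code generation, the results remain unsatisfactory.
This study therefore proposes a novel method that improves the generation performance of MLLMs by incorporating key elements of the web design (such as dominant colors, text, and boundary position information) directly into the model's prompt. The method avoids the time-consuming and resource-intensive fine-tuning process while improving the accuracy and consistency of the generated front-end code. Our experimental results show that combining color and text coordinate information in the prompt significantly improves the fidelity with which the model reproduces webpage screenshots. Although our method has lower computational complexity than earlier models that require fine-tuning, it achieves higher performance and efficiency.
The contribution of this research is that, without a tedious fine-tuning process, our model effectively captures the most critical visual elements of a web design and efficiently generates the corresponding HTML/CSS code, laying a foundation for future web design automation and for innovation in front-end development tools.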
Manually converting visual user interface designs into code is both complex and time-consuming, particularly for non-experts. With the rapid development of multimodal large language models (MLLMs), there has been significant progress in generating accurate HTML/CSS code from website screenshots. However, most existing approaches depend on extensive training datasets and costly model fine-tuning, underscoring the limitations of current models in capturing key web design features. Even when text descriptions of the website screenshot are incorporated into the prompts, the generated code still differs significantly from the ground truth. To address these challenges, this paper introduces a novel method that embeds dominant color codes and text margins extracted from website screenshots directly into the MLLM input prompt. Our approach aims to reduce reliance on large-scale training and fine-tuning while enhancing the extraction of web design elements critical for code generation. Experimental results demonstrate that incorporating color and positional information of text elements significantly improves the consistency and accuracy of translating website screenshots into functional code. Despite its low computational complexity, our method outperforms state-of-the-art methods on code generation tasks, achieving superior results without requiring additional model training or fine-tuning.
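The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of placing dominant color codes and text margins into a prompt. Everything in it is an assumption made for illustration: the function names (`dominant_colors`, `text_margins`, `build_prompt`), the coarse color quantization, the definition of a margin as the distance from a text box to the page edges, and the prompt wording are hypothetical and are not taken from the thesis.

```python
from collections import Counter

def dominant_colors(pixels, k=3, step=32):
    """Return the k most frequent colors as hex codes (hypothetical helper).

    Each RGB channel is snapped to the centre of a `step`-wide bucket so that
    near-identical shades are counted together. This is a simple stand-in for
    whatever color extraction the thesis actually uses.
    """
    def bucket(c):
        return min(255, (c // step) * step + step // 2)

    counts = Counter((bucket(r), bucket(g), bucket(b)) for r, g, b in pixels)
    return ["#{:02x}{:02x}{:02x}".format(*rgb) for rgb, _ in counts.most_common(k)]

def text_margins(box, page_w, page_h):
    """Distances in pixels from a text box to the page edges (assumed definition).

    `box` is (left, top, right, bottom), e.g. as produced by an OCR tool.
    """
    left, top, right, bottom = box
    return {"left": left, "top": top,
            "right": page_w - right, "bottom": page_h - bottom}

def build_prompt(pixels, text_boxes, page_w, page_h):
    """Assemble a text prompt carrying the extracted hints (illustrative wording)."""
    lines = [
        "Generate HTML/CSS that reproduces the attached screenshot.",
        "Dominant colors: " + ", ".join(dominant_colors(pixels)),
        "Text elements (margins in px):",
    ]
    for text, box in text_boxes:
        m = text_margins(box, page_w, page_h)
        lines.append(
            f'- "{text}": left={m["left"]}, top={m["top"]}, '
            f'right={m["right"]}, bottom={m["bottom"]}'
        )
    return "\n".join(lines)

# Toy input: a mostly white page with some black and a little red.
pixels = [(255, 255, 255)] * 6 + [(0, 0, 0)] * 3 + [(250, 10, 10)]
prompt = build_prompt(pixels, [("Welcome", (100, 40, 700, 80))], 800, 600)
print(prompt)
```

In a real pipeline the pixel list would come from the screenshot and the text boxes from an OCR step, and the resulting string would be sent to the MLLM together with the image. No fine-tuning is involved, which matches the training-free setting the abstract describes.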
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100950
DOI: 10.6342/NTU202504621
Fulltext Rights: Authorized (open access worldwide)
Embargo Lift Date: 2025-11-27
Appears in Collections: Department of Information Management (資訊管理學系)

Files in This Item:
File            Size     Format
ntu-114-1.pdf   1.4 MB   Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
