Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94575
Title: | 使用「前圖像描述」作為圖像生成的提示詞工程架構 (Using "Pre-Iconographical Descriptions" as the Prompt Engineering Framework in Artificial Intelligence-Generated Images) |
Author: | 陳冠邑 Guan-Yi Chen |
Advisor: | 陳達仁 Dar-Zen Chen |
Keywords: | Image Semantic Description Framework, Computer Vision, Prompt Fine-Tuning, Prompt Learning, DALL·E 3, Visual Analysis, Generative Artificial Intelligence, Generative AI, GAI |
Year of publication: | 2024 |
Degree: | Master's |
Abstract: | Prompt engineering is an emerging discipline in natural language processing (NLP) and a method for fine-tuning pre-trained models. In NLP, this strategy relies on the model's ability to understand and generate images from the relationships between words in context. Image data, however, inherently depends on spatial semantic information, unlike the linear, sequential nature of text. General users often cannot judge how to organize a complete description of an image, so generated images frequently fall short of expectations and more time is spent adjusting prompts.
This study therefore takes the pre-iconographical description methodology from art appreciation and standardizes it into a prompt framework that users can follow step by step, so that an artificial intelligence model consistently generates images resembling the picture they have in mind. The method structures pre-iconographical descriptions into three levels: factual elements, representational meaning, and relational meaning. To validate the method, an experiment divided participants into an experimental group and a control group, with each prompt used to generate multiple images. The experimental group wrote prompts following the pre-iconographical description framework, while the control group wrote prompts based on intuition. Each prompt was then given to the "DALL·E 3" model to generate four images. The results were evaluated by similarity matching and verified by third parties through a survey with 140 responses. Across three sets of experiments, the pre-iconographical description prompt framework was found to: (1) produce more stable and similar features across multiple images generated from the same prompt; (2) enable the generative model to understand the prompt more completely, yielding images that match the prompt content; and (3) consistently generate multiple images that closely resemble the "imagined hypothetical" image. This research provides a new perspective and methodology for the cross-disciplinary application of prompt engineering. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94575 |
DOI: | 10.6342/NTU202404068 |
Full-text authorization: | Authorized (open access worldwide) |
Appears in collections: | Institute of Industrial Engineering |
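
The three-level structure named in the abstract can be pictured as a small data structure that assembles a prompt in a fixed order. The following is only an illustrative sketch, not the thesis's implementation: the class name `PreIconographicPrompt`, its field names, the `render` method, the reading of each level given in the comments, and the example phrases are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class PreIconographicPrompt:
    """Hypothetical container for the three levels named in the abstract."""

    # Assumed reading: what is depicted (objects, figures, setting).
    factual_elements: list[str]
    # Assumed reading: how it appears (mood, expression, lighting).
    representational_meaning: list[str]
    # Assumed reading: how the elements relate (placement, interaction).
    relational_meaning: list[str]

    def render(self) -> str:
        """Join the three levels, in order, into a single prompt string."""
        levels = [
            self.factual_elements,
            self.representational_meaning,
            self.relational_meaning,
        ]
        # Require every level to be filled in, so no layer is skipped.
        if any(len(level) == 0 for level in levels):
            raise ValueError("each of the three levels needs at least one phrase")
        # One sentence per level; phrases within a level are comma-separated.
        return ". ".join(", ".join(level) for level in levels) + "."


# Example phrases are invented for illustration only.
prompt = PreIconographicPrompt(
    factual_elements=["an elderly fisherman", "a small wooden boat", "a calm lake at dawn"],
    representational_meaning=["quiet and contemplative mood", "soft golden light"],
    relational_meaning=[
        "the fisherman sits at the stern of the boat",
        "the boat floats in the lower third of the frame",
    ],
)
print(prompt.render())
```

The resulting string would then be passed to an image model such as DALL·E 3. Requiring all three levels before rendering reflects the idea of a structured, repeatable procedure, as opposed to a prompt written purely by intuition.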
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-112-2.pdf | 2.64 MB | Adobe PDF | View/Open |
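Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.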