Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94575

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 陳達仁 | zh_TW |
| dc.contributor.advisor | Dar-Zen Chen | en |
| dc.contributor.author | 陳冠邑 | zh_TW |
| dc.contributor.author | Guan-Yi Chen | en |
| dc.date.accessioned | 2024-08-16T16:49:35Z | - |
| dc.date.available | 2024-08-17 | - |
| dc.date.copyright | 2024-08-16 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-08-14 | - |
| dc.identifier.citation | 1. Journal articles (arranged alphabetically and by stroke count):
Bohnsack, R. (2009). The Interpretation of Pictures and the Documentary Method. Historical Social Research / Historische Sozialforschung, 34(2(128)), 296–321. http://www.jstor.org/stable/20762367
Dong, Q., Li, L., Dai, D., Zheng, C., Wu, Z., Chang, B., Sun, X., Xu, J., Li, L., & Sui, Z. (2022, December 31). A survey on in-context learning. Manuscript submitted for publication.
Emanuele, L. M., Petrov, A., Frieder, S., Weinhuber, C., Burnell, R., Nazar, R., Cohn, A. G., Shadbolt, N., & Wooldridge, M. (2023, September 28). Language Models as a Service: Overview of a New Paradigm and its Challenges. Manuscript submitted for publication.
Markey, K. (1988). Access to Iconographical Research Collections. Library Trends, 37(2), 154–174. https://eric.ed.gov/?id=EJ382474
Karki, S., & Baral, R. K. (2023). Masquerading in the name of world peace: An analysis of Sunil Sigdel's painting "Peace Owners II." Cogent Arts & Humanities, 10(1). https://doi.org/10.1080/23311983.2023.2219489
Kim, H. S. A. (2010). Iconological Interpretation of Makeup depicted in Alexander McQueen's Collection. Journal of the Korean Society of Costume, 60(10), 118–132. https://koreascience.kr/article/JAKO201014435571024.page
Mitot, K., Kuan, V., & Sanusi, K. (2016). Sarawak Culture: A study on the cosmic influence on the bidayuh traditional woodcarving ("Rasang"). Advanced Science Letters, 22(5), 1156–1159. https://doi.org/10.1166/asl.2016.6629
Oppenlaender, J. (2023). A taxonomy of prompt modifiers for text-to-image generation. Behaviour & Information Technology, 1–14. https://doi.org/10.1080/0144929x.2023.2286532
Sartini, B. (2024, May). IICONGRAPH: Improved Iconographic and Iconological Statements in Knowledge Graphs. In European Semantic Web Conference (pp. 57–74). Cham: Springer Nature Switzerland.
Syahid, S., Wardani, W. G. W., & Wulandari, W. (2021). The Image of the Gunung Padang Site as a Cultural Heritage in the Perspective of Pre-Iconographical. Cultural Syndrome, 3(2), 83–99.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
Wang, J., Liu, Z., Zhao, L., Wu, Z., Ma, C., Yu, S., ... & Zhang, S. (2023). Review of large vision models and visual prompt engineering. Meta-Radiology, 100047.
White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., ... & Schmidt, D. C. (2023). A prompt pattern catalog to enhance prompt engineering with ChatGPT. arXiv preprint arXiv:2302.11382.
Zhong, W., Gao, Y., Ding, N., Qin, Y., Liu, Z., Zhou, M., ... & Duan, N. (2022). ProQA: Structural prompt-based pre-training for unified question answering. arXiv preprint arXiv:2205.04040.
2. Conference papers and proceedings (arranged alphabetically and by stroke count):
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021, July 18–24). Learning transferable visual models from natural language supervision. Proceedings of the 38th International Conference on Machine Learning (virtual). http://proceedings.mlr.press/v139/radford21a
Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E. L., Ghasemipour, K., Gontijo Lopes, R., Karagol Ayan, B., Salimans, T., Ho, J., Fleet, D. J., & Norouzi, M. (2022, November 28–December 9). Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. NeurIPS 2022, New Orleans, United States. https://proceedings.neurips.cc/paper_files/paper/2022/hash/ec795aeadae0b7d230fa35cbaf04c041-Abstract-Conference.html
Hao, Y., Chi, Z., Dong, L., & Wei, F. (2023). Optimizing prompts for text-to-image generation. NeurIPS Proceedings. https://proceedings.neurips.cc/paper_files/paper/2023/file/d346d91999074dd8d6073d4c3b13733b-Paper-Conference.pdf
Sun, H., Li, X., Xu, Y., Homma, Y., Cao, Q., Wu, M., Jiao, J., & Charles, D. (2023, July 13). AutoHint: Automatic Prompt Optimization with Hint Generation [Workshop presentation]. The 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, United States.
Karami, E., Prasad, S., & Shehata, M. (2015). Image matching using SIFT, SURF, BRIEF and ORB. 2015 Newfoundland Electrical and Computer Engineering Conference, St. John's, Canada.
Lee, D., Song, S., Suh, J., Choi, J., Lee, S., & Kim, H. J. (2023). Read-only Prompt Optimization for Vision-Language Few-shot Learning. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France. https://doi.org/10.1109/iccv51070.2023.00135
Liu, V., & Chilton, L. B. (2022). Design guidelines for prompt engineering text-to-image generative models. CHI '22: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, New Orleans, United States. https://doi.org/10.1145/3491102.3501825
Mansimov, E., Parisotto, E., Ba, J. L., & Salakhutdinov, R. (2016). Generating Images from Captions with Attention. ICLR 2016, San Juan, Puerto Rico. https://doi.org/10.48550/arXiv.1511.02793
Min, S., Lewis, M., Zettlemoyer, L., & Hajishirzi, H. (2022). MetaICL: Learning to Learn in Context. 2022 Conference of the North American Chapter of the Association for Computational Linguistics, Seattle, United States. https://doi.org/10.18653/v1/2022.naacl-main.201
Nichol, A., Dhariwal, P., Ramesh, A., Shyam, P., Mishkin, P., McGrew, B., Sutskever, I., & Chen, M. (2022). GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models. Proceedings of Machine Learning Research, 162 (virtual). https://proceedings.mlr.press/v162/nichol22a/nichol22a.pdf
Raquel, P., Huerta, P., Doyle, P., Ip, E., Liberto, G., Higgins, D., McDonnell, R., Branigan, H., Gustafson, J., McMillan, D., Moore, R., & Cowan, B. (2023). Programming without a programming language: Challenges and opportunities for designing developer tools for prompt programming (pp. 1–7). CHI EA '23: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, New York, United States.
Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. (2016, May 17). Generative adversarial text to image synthesis. Proceedings of The 33rd International Conference on Machine Learning (virtual). https://proceedings.mlr.press/v48/reed16.html
Sun, Z., Shen, Y., Zhou, Q., Zhang, H., Chen, Z., Cox, D., Yang, Y., & Gan, C. (2023, May 4). Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision [Poster]. NeurIPS 2023 (virtual). https://neurips.cc/virtual/2023/poster/70433
3. Books:
Panofsky, E. (1972). Studies in Iconology: Humanistic Themes in the Art of the Renaissance. Westview Press.
4. Online resources (arranged alphabetically and by stroke count):
Akshay K. (2024, March 13). Prompt Engineering: What it is and 15 techniques for effective AI prompting + tips. Hostinger Tutorials.
OpenAI. Six strategies for getting better results. OpenAI Platform. https://platform.openai.com/docs/guides/prompt-engineering
Rebelo, M. (2023, May 25). How to write effective AI art prompts. Zapier. https://zapier.com/blog/ai-art-prompts/ | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94575 | - |
| dc.description.abstract | Prompt engineering is an emerging discipline in natural language processing and a method for fine-tuning pre-trained models. In natural language processing, this strategy draws on a model's ability to understand and generate content from the lexical relationships in its context. Image data, however, inherently depends on spatial semantic information, unlike the linear, sequential character of text; general users often cannot judge how to organize complete image information, so generated images frequently fall short of expectations and extra time must instead be spent adjusting prompts.
This study therefore takes the pre-iconographical description methodology from art appreciation and standardizes it into a prompt framework, helping users follow structured steps to refine their prompt engineering so that an artificial intelligence model can reliably turn the picture imagined in the user's mind into similar generated images. The research method structures pre-iconographical descriptions into three levels: factual elements, representational meaning, and relational meaning, so that users can carry out standardized prompt engineering accordingly. To validate the method, an experimental analysis was designed with an experimental group and a control group, each generating multiple sets of images from the same set of prompts: the experimental group wrote prompts following the pre-iconographical description framework, while the control group wrote prompts intuitively. Both sets of prompts were fed into the "DALL·E 3" model to generate four images each; the images were then compared through similarity matching, and the results were objectively verified by a third-party survey of 140 questionnaires. Three rounds of experiments confirmed that the pre-iconographical description prompt framework (1) yields better stability and similarity of features across multiple images generated from the same prompts, (2) lets the generative model understand prompts more completely so that the generated images match the prompt content, and (3) consistently produces multiple images that closely resemble the "imagined hypothesis" image. This offers a new perspective and method for cross-disciplinary methodological application. | zh_TW |
| dc.description.abstract | Prompt engineering is an emerging discipline in the field of natural language processing (NLP) that involves fine-tuning pre-trained models. In NLP, this strategy leverages the model's ability to understand and generate content based on the relationships between words in context. However, image data inherently relies on spatial semantic information, which differs from the linear, sequential characteristics of text. General users often struggle to organize complete image information, resulting in generated images that deviate from expectations and requiring more time to adjust prompts.
Therefore, this study adopts the pre-iconographical description methodology from art appreciation, standardizing it into a prompt framework. This framework assists users in following a structured approach to effectively refine prompt engineering, enabling artificial intelligence models to consistently generate images that closely match the envisioned concepts. Consequently, the research method structures pre-iconographical descriptions into three levels: factual elements, representational meaning, and relational meaning, allowing users to conduct standardized prompt engineering accordingly. To validate the effectiveness of this method, an experimental analysis was designed, dividing participants into experimental and control groups. Both groups generated multiple images from the same set of prompts: the experimental group wrote prompts following the pre-iconographical description framework, while the control group wrote prompts based on intuition. These prompts were then input into the "DALL·E 3" model to generate four images each. Finally, the generated images were compared through similarity matching, and the results were objectively verified by a third-party survey with 140 responses. Three sets of experimental tests confirmed that the pre-iconographical description prompt framework: (1) produces more stable and similar features across multiple images generated from the same set of prompts; (2) enables the generative model to better understand the prompts, resulting in images that accurately reflect the prompt content; and (3) consistently generates multiple images that closely resemble the "imagined hypothetical" image. This research provides a new perspective and methodology for the cross-disciplinary application of prompt engineering. (Hedged illustrative sketches of the prompt framework and the similarity matching appear after the metadata table below.) | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-16T16:49:35Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-08-16T16:49:35Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Oral Examination Committee Certification i
Acknowledgements ii
Abstract (Chinese) iii
Abstract iv
Table of Contents vi
List of Figures vii
List of Tables ix
Chapter 1 Introduction 1
Section 1 Research Motivation 1
Section 2 Research Objectives 2
Chapter 2 Literature Review 4
Section 1 Applied Research on the Pre-Iconographical Description Methodology 4
Section 2 Discussion and Guidelines on Prompt Engineering for Generative Language Models 6
Chapter 3 Research Methods 9
Section 1 Prompt Engineering Applying the Pre-Iconographical Description Structure 9
Section 2 Experimental Design for Prompt Engineering 12
Chapter 4 Experimental Results and Analysis 16
Section 1 Experimental Procedure for Prompt Engineering 16
Section 2 Similarity Comparison of Images Generated by the Experimental and Control Groups 21
Section 3 Objective Validation of the Research Method 31
Chapter 5 Conclusion 33
References 40
Appendix: Questionnaire for Objective Validation 46 | - |
| dc.language.iso | zh_TW | - |
| dc.subject | Visual Analysis | zh_TW |
| dc.subject | DALL·E 3 | zh_TW |
| dc.subject | Image Semantic Description Framework | zh_TW |
| dc.subject | Computer Vision | zh_TW |
| dc.subject | Generative Artificial Intelligence | zh_TW |
| dc.subject | Prompt Fine-Tuning | zh_TW |
| dc.subject | Prompt Learning | zh_TW |
| dc.subject | Prompt Learning | en |
| dc.subject | DALL·E 3 | en |
| dc.subject | Visual Analysis | en |
| dc.subject | Generative artificial intelligence | en |
| dc.subject | Generative AI | en |
| dc.subject | GAI | en |
| dc.subject | Prompt Fine-Tuning | en |
| dc.subject | Computer Vision | en |
| dc.subject | Image Semantic Description Framework | en |
| dc.title | Using "Pre-Iconographical Descriptions" as a Prompt Engineering Framework for Image Generation | zh_TW |
| dc.title | Using "Pre-Iconographical Descriptions" as the Prompt Engineering Framework in Artificial Intelligence-Generated Images | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-2 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 洪一薰;黃奎隆;梁容輝 | zh_TW |
| dc.contributor.oralexamcommittee | I-Hsuan Hong;Kwei-Long Huang;Rung-Huei Liang | en |
| dc.subject.keyword | Image Semantic Description Framework, Computer Vision, Prompt Fine-Tuning, Prompt Learning, DALL·E 3, Visual Analysis, Generative Artificial Intelligence | zh_TW |
| dc.subject.keyword | Image Semantic Description Framework, Computer Vision, Prompt Fine-Tuning, Prompt Learning, DALL·E 3, Visual Analysis, Generative artificial intelligence, Generative AI, GAI | en |
| dc.relation.page | 51 | - |
| dc.identifier.doi | 10.6342/NTU202404068 | - |
| dc.rights.note | Consent granted (publicly available worldwide) | - |
| dc.date.accepted | 2024-08-14 | - |
| dc.contributor.author-college | College of Engineering | - |
| dc.contributor.author-dept | Institute of Industrial Engineering | - |
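As a concrete illustration of the three-level framework described in the abstract, below is a minimal sketch of assembling a pre-iconographical prompt and sending it to DALL·E 3. It assumes the OpenAI Python SDK; the function name `build_pre_iconographical_prompt`, the field names, and the sample content are illustrative assumptions, not the thesis's exact wording.

```python
# A minimal sketch, assuming the OpenAI Python SDK (pip install openai) and an
# OPENAI_API_KEY in the environment. The three levels mirror the framework in
# the abstract; the field names and sample content are illustrative only.
from openai import OpenAI

client = OpenAI()

def build_pre_iconographical_prompt(factual_elements, representational_meaning,
                                    relational_meaning):
    """Concatenate the three description levels into a single prompt string."""
    return (
        f"Factual elements: {factual_elements}. "
        f"Representational meaning: {representational_meaning}. "
        f"Relational meaning: {relational_meaning}."
    )

prompt = build_pre_iconographical_prompt(
    factual_elements="a woman in a red coat, an umbrella, a rainy city street at dusk",
    representational_meaning="the woman appears calm amid the downpour",
    relational_meaning="she stands apart from the hurrying crowd around her",
)

# The DALL-E 3 endpoint accepts only n=1 per request, so four single-image
# calls reproduce the four-images-per-prompt design reported in the abstract.
image_urls = []
for _ in range(4):
    response = client.images.generate(model="dall-e-3", prompt=prompt,
                                      n=1, size="1024x1024")
    image_urls.append(response.data[0].url)
print(image_urls)
```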
Appears in Collections: Institute of Industrial Engineering
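The abstract states that the generated images were compared through similarity matching, but the record does not name the technique. The sketch below scores pairwise similarity with ORB feature matching (cf. Karami, Prasad, & Shehata, 2015, cited in this record); the distance threshold and the normalization are illustrative assumptions, not the thesis's procedure.

```python
# A hedged sketch of pairwise image similarity via ORB feature matching.
# Requires: pip install opencv-python
import cv2

def orb_similarity(path_a: str, path_b: str) -> float:
    """Return a rough similarity score in [0, 1] between two images."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=1000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0.0
    # Hamming distance suits ORB's binary descriptors; cross-check keeps
    # only mutually best matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_a, des_b)
    # Count "good" matches under an illustrative distance threshold and
    # normalize by the smaller keypoint count.
    good = [m for m in matches if m.distance < 50]
    return len(good) / max(1, min(len(kp_a), len(kp_b)))

# Example: average pairwise similarity across four generated images.
paths = ["gen_1.png", "gen_2.png", "gen_3.png", "gen_4.png"]
scores = [orb_similarity(a, b) for i, a in enumerate(paths)
          for b in paths[i + 1:]]
print(sum(scores) / len(scores))
```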
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-112-2.pdf | 2.64 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
