NTU Theses and Dissertations Repository (DSpace)
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94700
Full metadata record
dc.contributor.advisor: 張智星 (zh_TW)
dc.contributor.advisor: Jyh-Shing Roger Jang (en)
dc.contributor.author: 林育辰 (zh_TW)
dc.contributor.author: Yu-Chen Lin (en)
dc.date.accessioned: 2024-08-16T17:36:06Z
dc.date.available: 2024-08-22
dc.date.copyright: 2024-08-16
dc.date.issued: 2024
dc.date.submitted: 2024-08-06
dc.identifier.citation:
[1] A. Balaguer, V. Benara, R. L. de Freitas Cunha, R. de M. Estevão Filho, T. Hendry, D. Holstein, et al. Rag vs fine-tuning: Pipelines, tradeoffs, and a case study on agriculture. arXiv preprint arXiv:2401.08406, 2024.
[2] R. A. Bradley and M. E. Terry. Rank analysis of incomplete block designs: I. the method of paired comparisons. Biometrika, 39(3/4):324–345, 1952.
[3] W.-L. Chiang, L. Zheng, Y. Sheng, A. N. Angelopoulos, T. Li, D. Li, et al. Chatbot arena: An open platform for evaluating llms by human preference. arXiv preprint arXiv:2403.04132, 2024.
[4] J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.
[5] D. Dua, S. Gupta, S. Singh, and M. Gardner. Successive prompting for decomposing complex questions. arXiv preprint arXiv:2212.04092, 2022.
[6] A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.
[7] D. Edge, H. Trinh, N. Cheng, J. Bradley, A. Cha, A. Mody, et al. From local to global: A graph rag approach to query-focused summarization. arXiv preprint arXiv:2404.16130, 2024.
[8] B. Efron. Bootstrap methods: another look at the jackknife. In Breakthroughs in statistics: Methodology and distribution, pages 569–593. Springer, 1992.
[9] A. Eliseev and D. Mazur. Fast inference of mixture-of-experts language models with offloading. arXiv preprint arXiv:2312.17238, 2023.
[10] A. Elo. The Rating of Chessplayers: Past and Present. Ishi Press International, 2008.
[11] Y. Gao, Y. Xiong, X. Gao, K. Jia, J. Pan, Y. Bi, et al. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997, 2024.
[12] M. E. Glickman and A. C. Jones. Rating the chess rating system. CHANCE-BERLIN THEN NEW YORK-, 12:21–28, 1999.
[13] A. Gu and T. Dao. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023.
[14] Z. He, H. Wu, X. Zhang, X. Yao, S. Zheng, H. Zheng, et al. Chateda: A large language model powered autonomous agent for eda. In 2023 ACM/IEEE 5th Workshop on Machine Learning for CAD (MLCAD), pages 1–6. IEEE, 2023.
[15] C.-Y. Hsieh, C.-L. Li, C.-K. Yeh, H. Nakhost, Y. Fujii, A. Ratner, et al. Distilling step-by-step! outperforming larger language models with less training data and smaller model sizes. arXiv preprint arXiv:2305.02301, 2023.
[16] D. Huang, Q. Bu, J. M. Zhang, M. Luck, and H. Cui. Agentcoder: Multi-agent code generation with iterative testing and optimization. arXiv preprint arXiv:2312.13000, 2024.
[17] A. Q. Jiang, A. Sablayrolles, A. Roux, A. Mensch, B. Savary, C. Bamford, et al. Mixtral of experts. arXiv preprint arXiv:2401.04088, 2024.
[18] X. Jiang, Y. Dong, L. Wang, Z. Fang, Q. Shang, G. Li, et al. Self-planning code generation with large language models. arXiv preprint arXiv:2303.06689, 2023.
[19] Z. Jiang, F. F. Xu, L. Gao, Z. Sun, Q. Liu, J. Dwivedi-Yu, et al. Active retrieval augmented generation. arXiv preprint arXiv:2305.06983, 2023.
[20] T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, and Y. Iwasawa. Large language models are zero-shot reasoners. arXiv preprint arXiv:2205.11916, 2023.
[21] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. arXiv preprint arXiv:2005.11401, 2021.
[22] C. Li, J. Liang, A. Zeng, X. Chen, K. Hausman, D. Sadigh, et al. Chain of code: Reasoning with a language model-augmented code emulator. arXiv preprint arXiv:2312.04474, 2023.
[23] C. Li, J. Wang, Y. Zhang, K. Zhu, W. Hou, J. Lian, et al. Large language models understand and can be enhanced by emotional stimuli. arXiv preprint arXiv:2307.11760, 2023.
[24] Y.-C. Lin, A. Kumar, N. Chang, W. Zhang, M. Zakir, R. Apte, et al. Novel pre-processing technique for data embedding in engineering code generation using large language model. In 1st IEEE International Workshop on LLM-Aided Design, 2024.
[25] Y.-C. Lin, A. Kumar, N. Chang, W. Zhang, M. Zakir, R. Apte, et al. Novel pre-processing technique for data embedding in engineering code generation using large language model. arXiv preprint arXiv:2311.16267, 2024.
[26] M. Liu, N. Pinckney, B. Khailany, and H. Ren. Verilogeval: Evaluating large language models for verilog code generation. arXiv preprint arXiv:2309.07544, 2023.
[27] S. Liu, W. Fang, Y. Lu, Q. Zhang, H. Zhang, and Z. Xie. Rtlcoder: Outperforming gpt-3.5 in design rtl generation with our open-source dataset and lightweight solution. In 1st IEEE International Workshop on LLM-Aided Design, 2024.
[28] S. Minaee, T. Mikolov, N. Nikzad, M. Chenaghlu, R. Socher, X. Amatriain, et al. Large language models: A survey. arXiv preprint arXiv:2402.06196, 2024.
[29] A. Mitra, L. D. Corro, S. Mahajan, A. Codas, C. Simoes, S. Agarwal, et al. Orca 2: Teaching small language models how to reason. arXiv preprint arXiv:2311.11045, 2023.
[30] E. Nijkamp, B. Pang, H. Hayashi, L. Tu, H. Wang, Y. Zhou, et al. Codegen: An open large language model for code with multi-turn program synthesis. arXiv preprint arXiv:2203.13474, 2023.
[31] OpenAI, J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, et al. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
[32] B. Peng, C. Li, P. He, M. Galley, and J. Gao. Instruction tuning with gpt-4. arXiv preprint arXiv:2304.03277, 2023.
[33] M. Schäfer, S. Nadi, A. Eghbali, and F. Tip. An empirical evaluation of using large language models for automated unit test generation. arXiv preprint arXiv:2302.06527, 2023.
[34] G. Team, M. Riviere, S. Pathak, P. G. Sessa, C. Hardin, S. Bhupatiraju, et al. Gemma 2: Improving open language models at a practical size. arXiv preprint arXiv:2408.00118, 2024.
[35] S. Thakur, B. Ahmad, H. Pearce, B. Tan, B. Dolan-Gavitt, R. Karri, et al. Verigen: A large language model for verilog code generation. arXiv preprint arXiv:2308.00708, 2023.
[36] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
[37] Y.-D. Tsai, M. Liu, and H. Ren. Rtlfixer: Automatically fixing rtl syntax errors with large language models. In Proceedings of the 61st ACM/IEEE Design Automation Conference (DAC'24), 2024.
[38] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, et al. Attention is all you need. Advances in neural information processing systems, 30, 2017.
[39] G. Xiao, Y. Tian, B. Chen, S. Han, and M. Lewis. Efficient streaming language models with attention sinks. arXiv preprint arXiv:2309.17453, 2023.
[40] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, et al. React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629, 2023.
[41] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, et al. A survey of large language models. arXiv preprint arXiv:2303.18223, 2023.
[42] H. S. Zheng, S. Mishra, X. Chen, H.-T. Cheng, E. H. Chi, Q. V. Le, et al. Take a step back: Evoking reasoning via abstraction in large language models. arXiv preprint arXiv:2310.06117, 2023.
[43] L. Zheng, W.-L. Chiang, Y. Sheng, S. Zhuang, Z. Wu, Y. Zhuang, et al. Judging llm-as-a-judge with mt-bench and chatbot arena. arXiv preprint arXiv:2306.05685, 2023.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94700
dc.description.abstract (zh_TW): 本文提出五項基於大型語言模型的程式碼生成組件,用於特定領域的腳本生成,並評估其有效性。貢獻:(i) 基於大型語言模型的語義分割 (Semantic Splitter) 以及資料翻新 (Data Renovation) 組件,以改進資料語義的表示;(ii) 運用大型語言模型重構以產生高品質程式碼的組件 Script Augmentation;(iii) 提出提示技術隱形知識擴展與思考 (Implicit Knowledge Expansion and Contemplation, IKEC) 組件;(iv) 提出程式碼生成的流程,以五項組件漸進式生成工程模擬軟體 RedHawk-SC 的程式碼;(v) 評估不同參考資料型態之於程式碼生成的有效性。零樣本連鎖思維 (Zero-shot Chain-of-Thought, ZCoT) 為有效的提示技術,包括在五項建設性組件中,以利評估其餘組件之有效性。我們邀請 28 位領域專家透過競技場式評估蒐集 187 份成對比較結果以驗證前述組件之有效性,其中最佳組件於工程軟體 RedHawk-SC 上 MapReduce 程式碼生成表現達到 21.26% 的勝率提升,相較零樣本連鎖思維 6.68% 勝率提升顯著許多。
dc.description.abstract (en): We propose five constructive components based on Large Language Models (LLMs) for domain-specific code generation and evaluate their effectiveness. The contributions are (i) Semantic splitter and data renovation for improved data semantic representation; (ii) Script augmentation for enhanced code quality; (iii) Implicit Knowledge Expansion and Contemplation (IKEC) prompting technique; (iv) A workflow using hierarchical generation for scripts in the engineering software RedHawk-SC; (v) An evaluation of different reference data types for code generation. We invited 28 domain experts to conduct an arena-style evaluation, collecting 187 paired comparisons to validate the effectiveness of those components. The best component achieved a 21.26% win rate improvement in MapReduce code generation performance for RedHawk-SC, significantly outperforming the 6.68% win rate improvement of the Zero-shot Chain-of-Thought (ZCoT).
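The arena-style evaluation described in the abstract reduces pairwise expert votes to win rates and ratings (the table of contents also lists Elo and Bradley-Terry analyses). The sketch below is illustrative only, not the thesis code: the variant names and the tiny vote log are hypothetical, and it merely shows how a win rate and a standard sequential Elo update can be computed from (winner, loser) pairs.

```python
from collections import defaultdict

def elo_ratings(matches, k=32, base=1000.0):
    """Sequential Elo from an ordered list of (winner, loser) pairs.

    Expected score uses the standard logistic curve on a 400-point
    scale; each match moves both ratings by k * (actual - expected).
    """
    ratings = defaultdict(lambda: base)
    for winner, loser in matches:
        rw, rl = ratings[winner], ratings[loser]
        expected_w = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
        ratings[winner] = rw + k * (1.0 - expected_w)
        ratings[loser] = rl - k * (1.0 - expected_w)
    return dict(ratings)

def win_rates(matches):
    """Fraction of its comparisons that each variant wins."""
    wins, games = defaultdict(int), defaultdict(int)
    for winner, loser in matches:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    return {p: wins[p] / games[p] for p in games}

# Hypothetical vote log: each tuple is (preferred variant, other variant).
votes = [("IKEC", "ZCoT"), ("IKEC", "baseline"), ("ZCoT", "baseline"),
         ("IKEC", "ZCoT"), ("baseline", "ZCoT")]
print(win_rates(votes))
print(elo_ratings(votes))
```

With equal k for both sides, the total rating is conserved across updates, so the ratings express only relative strength, as in Chatbot Arena-style leaderboards.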
dc.description.provenance (en): Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-16T17:36:06Z. No. of bitstreams: 0
dc.description.provenance (en): Made available in DSpace on 2024-08-16T17:36:06Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee (p. i)
Acknowledgements (p. iii)
摘要 (p. v)
Abstract (p. vii)
Contents (p. ix)
List of Figures (p. xv)
List of Tables (p. xix)
Chapter 1 Introduction (p. 1)
  1.1 Research Topic (p. 1)
  1.2 Motivation (p. 2)
  1.3 Challenges (p. 3)
  1.4 Contributions (p. 5)
  1.5 Methodology Overview (p. 6)
  1.6 Overview of Engineering Tools Applicable to the Code Generator (p. 8)
    1.6.1 RedHawk-SC (p. 8)
      1.6.1.1 RedHawk vs RedHawk-SC (p. 8)
      1.6.1.2 Technical Architecture (p. 9)
    1.6.2 MapReduce (p. 10)
  1.7 Chapter Overview (p. 11)
Chapter 2 Literature Review (p. 13)
  2.1 Large Language Models (LLMs) (p. 13)
    2.1.1 Various Large Language Models (p. 14)
    2.1.2 Retrieval-Augmented Generation (RAG) (p. 15)
    2.1.3 RAG vs Fine-tuning (p. 16)
    2.1.4 Enhancing Input Tokens (p. 17)
    2.1.5 Prompt Techniques and Mechanisms (p. 17)
    2.1.6 Distillation (p. 18)
  2.2 Related Work in Code Generation (p. 18)
  2.3 Evaluation (p. 20)
    2.3.1 Chatbot Arena (p. 20)
    2.3.2 Elo (p. 21)
    2.3.3 Bradley-Terry Model (p. 22)
    2.3.4 Bootstrap (p. 23)
Chapter 3 Methodology (p. 25)
  3.1 Problem Definition (p. 25)
    3.1.1 Technical Bottlenecks and Issues in Existing RAG Technologies (p. 26)
    3.1.2 Proposed Methods (p. 27)
  3.2 Method Description (p. 28)
    3.2.1 Semantic Splitter (p. 29)
    3.2.2 Data Renovation (p. 32)
    3.2.3 Script Augmentation (p. 33)
    3.2.4 Implicit Knowledge Expansion and Contemplation (IKEC) (p. 35)
    3.2.5 Zero-shot Chain-of-Thought (ZCoT) (p. 35)
    3.2.6 Code Generation Pipeline (p. 36)
      3.2.6.1 Task Planning (p. 36)
      3.2.6.2 Script Generation (p. 36)
  3.3 Experimental Methods (p. 40)
    3.3.1 Component and Script Generation (p. 40)
    3.3.2 Combinations (p. 41)
    3.3.3 Selection Strategy for Combinations (p. 43)
    3.3.4 Arena-style Evaluation (p. 44)
      3.3.4.1 Explanation and Examples (p. 45)
  3.4 Methodology Review (p. 46)
Chapter 4 Datasets and Experimental Setup (p. 47)
  4.1 Application Scenarios (p. 47)
  4.2 Dataset (p. 48)
    4.2.1 RAG Reference (p. 48)
    4.2.2 Ansys-RHSC-20 (p. 49)
  4.3 Evaluation Metrics (p. 50)
  4.4 Machine Environment (p. 50)
  4.5 Experimental Environment (p. 51)
  4.6 Experimental Parameters (p. 52)
  4.7 Roadmap of Experiments (p. 53)
Chapter 5 Experiments and Analysis (p. 55)
  5.1 Experiment 1: Arena-style Evaluation of Component Effectiveness in Code Generation (p. 55)
    5.1.1 Arena Information (p. 56)
    5.1.2 Pairwise Voting (p. 58)
    5.1.3 Pairwise Voting - Even Sample (p. 61)
    5.1.4 Elo Rating (p. 62)
    5.1.5 Elo Rating - Even Sample (p. 62)
    5.1.6 Bradley-Terry Model (p. 63)
    5.1.7 Bradley-Terry Model - Even Sample (p. 64)
    5.1.8 Ablation Study (p. 69)
      5.1.8.1 Semantic Splitter (p. 69)
      5.1.8.2 Data Renovation (p. 70)
      5.1.8.3 Script Augmentation (p. 71)
      5.1.8.4 Implicit Knowledge Expansion and Contemplation (IKEC) (p. 72)
      5.1.8.5 Zero-shot Chain-of-Thought (ZCoT) (p. 73)
    5.1.9 Comprehensive Explanation (p. 74)
  5.2 Experiment 2: Comparison of RAG Data Source Proportions in Different Component Combinations (p. 76)
    5.2.1 Data Preprocessing (p. 76)
    5.2.2 Prompt Techniques (p. 79)
Chapter 6 Conclusions (p. 81)
  6.1 Summary of Findings (p. 82)
  6.2 Future Prospects (p. 82)
References (p. 85)
dc.language.iso: en
dc.subject: 自然語言處理 (zh_TW)
dc.subject: 大型語言模型 (zh_TW)
dc.subject: 程式碼生成 (zh_TW)
dc.subject: 資料嵌入 (zh_TW)
dc.subject: 資料前處理 (zh_TW)
dc.subject: 語義分割 (zh_TW)
dc.subject: 資料翻新 (zh_TW)
dc.subject: 腳本擴增 (zh_TW)
dc.subject: 提示技術 (zh_TW)
dc.subject: Data Renovation (en)
dc.subject: Natural Language Processing (NLP) (en)
dc.subject: Semantic Splitter (en)
dc.subject: Large Language Models (LLMs) (en)
dc.subject: Code Generation (en)
dc.subject: Data Embedding (en)
dc.subject: Data Preprocessing (en)
dc.subject: Prompt Techniques (en)
dc.subject: Script Augmentation (en)
dc.title: 基於大型語言模型的五項程式碼生成組件及其評估 (zh_TW)
dc.title: Proposition and Evaluation of Five Constructive Components for Code Generation via Large Language Models (en)
dc.type: Thesis
dc.date.schoolyear: 112-2
dc.description.degree: 碩士
dc.contributor.oralexamcommittee: 李宏毅;張鴻嘉 (zh_TW)
dc.contributor.oralexamcommittee: Hung-yi Lee;Norman Chang (en)
dc.subject.keyword: 自然語言處理,大型語言模型,程式碼生成,資料嵌入,資料前處理,語義分割,資料翻新,腳本擴增,提示技術 (zh_TW)
dc.subject.keyword: Natural Language Processing (NLP),Large Language Models (LLMs),Code Generation,Data Embedding,Data Preprocessing,Semantic Splitter,Data Renovation,Script Augmentation,Prompt Techniques (en)
dc.relation.page: 90
dc.identifier.doi: 10.6342/NTU202402417
dc.rights.note: 同意授權(全球公開)
dc.date.accepted: 2024-08-09
dc.contributor.author-college: 電機資訊學院
dc.contributor.author-dept: 資訊工程學系
Appears in Collections: 資訊工程學系

Files in This Item:
File: ntu-112-2.pdf | Size: 3.89 MB | Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
