NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100601
Full metadata record
DC field: value [language]
dc.contributor.advisor: 莊裕澤 [zh_TW]
dc.contributor.advisor: Yuh-Jzer Joung [en]
dc.contributor.author: 沈明彤 [zh_TW]
dc.contributor.author: Ming-Tung Shen [en]
dc.date.accessioned: 2025-10-08T16:04:42Z
dc.date.available: 2025-10-09
dc.date.copyright: 2025-10-08
dc.date.issued: 2025
dc.date.submitted: 2025-08-08
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100601
dc.description.abstract: 程式碼生成任務被視為最具工程應用潛力的代理落地場域,對語意理解、邏輯推理與錯誤修正能力皆具高度要求。儘管大型語言模型(LLMs)在自然語言處理領域表現卓越,然而在面對結構複雜、需多階段推理的任務時,仍顯能力不足。為突破此侷限,過去研究陸續發展多代理架構,試圖透過多個代理間的互動協作來處理複雜推理任務,但往往受限於流程僵化、錯誤修正成本高與缺乏經驗累積等問題。為此,本研究提出 TALM(Tree-Structured Multi-Agent Framework with Long-Term Memory),一套結合結構化任務拆解、區域性重推理與長期記憶模組的多代理框架。TALM 採用可拓展的樹狀協作結構,配合 divide and conquer 的策略,提升應對不同任務規模時的推理彈性;並進一步利用樹狀架構明確的父子關係,在推理錯誤發生時,僅需針對特定子樹進行局部重推理,顯著提升錯誤修正效率。同時,本研究亦設計以向量資料庫為基礎的長期記憶模組,支援知識的語意查詢與整合,實現隱性的自我強化學習機制。實驗結果顯示,TALM 在 HumanEval、BigCodeBench 與 ClassEval 等程式碼生成基準資料集上皆表現優異,展現穩定的推理效能與良好的詞元效率,證實本研究所提框架在處理高複雜度任務時的實用性與推論價值。 [zh_TW]
dc.description.abstract: Code generation presents a promising domain for deploying agents in real-world engineering, requiring strong semantic understanding, reasoning, and error correction. While LLMs excel in natural language tasks, they struggle with complex, multi-step reasoning. Prior multi-agent frameworks aim to address this via collaboration but often suffer from rigid workflows, high correction costs, and lack of memory accumulation. To address these challenges, we propose TALM (Tree-Structured Multi-Agent Framework with Long-Term Memory), a framework that integrates structured task decomposition, localized re-reasoning, and long-term memory. TALM adopts an extensible tree-based collaboration structure, which, combined with a divide-and-conquer strategy, enhances reasoning flexibility across varying task scopes. The clear parent-child relationships in the tree enable efficient subtree-level error correction when reasoning flaws occur. In addition, a long-term memory module built on a vector database supports semantic querying and integration of prior knowledge, enabling implicit self-improvement through experience reuse. Experimental results on HumanEval, BigCodeBench, and ClassEval benchmarks show that TALM consistently achieves strong reasoning performance and high token efficiency, demonstrating its practical value and robustness in solving complex code generation tasks. [en]
(An illustrative sketch of the workflow described in this abstract follows the metadata record below.)
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-10-08T16:04:42Z. No. of bitstreams: 0 [en]
dc.description.provenance: Made available in DSpace on 2025-10-08T16:04:42Z (GMT). No. of bitstreams: 0 [en]
dc.description.tableofcontents:
Acknowledgements i
Abstract (Chinese) ii
Abstract iii
Table of Contents v
List of Figures viii
List of Tables ix
Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objectives 4
1.3 Thesis Organization 5
Chapter 2 Literature Review 6
2.1 Large Language Models and Code Generation 6
2.1.1 Large Language Models 6
2.1.2 Pre-training and Fine-tuning 8
2.1.3 Language Models Trained for Code Generation 9
2.2 Prompting Frameworks 11
2.2.1 Prompting Frameworks for Enhancing Reasoning 11
2.2.2 Prompting Frameworks for Enhancing Coding Ability 13
2.3 Multi-Agent Collaboration Frameworks 14
2.3.1 General-Purpose Multi-Agent Frameworks 15
2.3.2 Multi-Agent Frameworks for Code Generation 16
2.4 Summary 18
Chapter 3 Methodology 19
3.1 Framework Overview 19
3.2 TALM Workflow 21
3.3 Partial Re-reasoning Mechanism 25
3.4 Long-Term Memory Module 29
3.5 Summary 31
Chapter 4 Experimental Results 33
4.1 Experimental Design 33
4.1.1 Datasets 33
4.1.2 Baselines 34
4.1.3 Experimental Setup 35
4.2 Experimental Results 36
4.2.1 Performance Evaluation 37
4.2.2 Ablation Studies 41
4.2.2.1 Performance Before and After Adding the Long-Term Memory Module 41
4.2.2.2 Performance Before and After Adding the Re-reasoning Mechanism 42
4.2.2.3 Performance Across Different Tree Heights m 43
4.2.2.4 Performance Across Different Branching Factors n 44
4.3 Summary 45
Chapter 5 Conclusion and Future Work 47
5.1 Conclusion 47
5.2 Future Work 48
5.3 Research Limitations 49
References 50
Appendix A: Examples and Comparison of Code Generated by TALM 58
dc.language.iso: zh_TW
dc.subject: 大型語言模型, 人工智慧代理, 多代理框架, 樹狀結構協作 [zh_TW]
dc.subject: Large Language Model, AI Agent, Multi-Agent Framework, Tree-Structured Collaboration [en]
dc.title: TALM:應用於可擴展程式碼生成的具長期記憶之樹狀多代理協作框架 [zh_TW]
dc.title: TALM: Tree-Structured Multi-Agent Framework with Long-Term Memory for Scalable Code Generation [en]
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 魏志平;陳建錦;林俊叡;楊立偉 [zh_TW]
dc.contributor.oralexamcommittee: Chih-Ping Wei; Chien-Chin Chen; June-Ray Lin; Li-wei Yang [en]
dc.relation.page: 63
dc.identifier.doi: 10.6342/NTU202504045
dc.rights.note: 未授權 (not authorized for public access)
dc.date.accepted: 2025-08-13
dc.contributor.author-college: 管理學院 (College of Management)
dc.contributor.author-dept: 資訊管理學系 (Department of Information Management)
dc.date.embargo-lift: N/A
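
The abstract above describes three mechanisms: divide-and-conquer task decomposition over a tree of agents, localized re-reasoning that regenerates only a failed subtree, and a long-term memory backed by a vector database. The Python sketch below is a minimal illustration of that control flow only, not the thesis's code: the LLM call, the embedding function, and every name in it (TaskNode, Memory, decompose, solve, re_reason, fake_embed) are hypothetical stand-ins for components the thesis implements with a real model and a real vector database.

# Minimal, hypothetical sketch of the workflow described in the abstract above.
# The LLM, the embedding model, and the vector database are replaced by trivial
# stubs so that the divide-and-conquer decomposition, subtree-level re-reasoning,
# and memory reuse can be read end to end.
import math
from dataclasses import dataclass, field
from typing import List, Optional


def fake_embed(text: str, dim: int = 32) -> List[float]:
    """Stand-in for a sentence-embedding model: a hashed bag-of-words vector."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class Memory:
    """Stand-in for the vector-database long-term memory: stores (task, solution)
    pairs and retrieves the most semantically similar past solution."""

    def __init__(self) -> None:
        self.entries: List[tuple] = []  # (embedding, task, solution)

    def add(self, task: str, solution: str) -> None:
        self.entries.append((fake_embed(task), task, solution))

    def query(self, task: str) -> Optional[str]:
        if not self.entries:
            return None
        q = fake_embed(task)
        best = max(self.entries, key=lambda e: sum(a * b for a, b in zip(q, e[0])))
        return best[2]


@dataclass
class TaskNode:
    """One node of the task tree; parent-child links make each subtree addressable."""
    task: str
    children: List["TaskNode"] = field(default_factory=list)
    solution: Optional[str] = None


def llm(prompt: str) -> str:
    """Placeholder for an LLM call; returns a canned string."""
    return f"<code for: {prompt[:40]}>"


def decompose(task: str, depth: int, max_depth: int = 2, branching: int = 2) -> TaskNode:
    """Divide and conquer: split a task into subtasks until max_depth is reached."""
    node = TaskNode(task)
    if depth < max_depth:
        node.children = [
            decompose(f"{task} / part {i + 1}", depth + 1, max_depth, branching)
            for i in range(branching)
        ]
    return node


def solve(node: TaskNode, memory: Memory) -> str:
    """Post-order solve: children first, then compose; reuse memory hits as hints."""
    hint = memory.query(node.task)
    parts = [solve(child, memory) for child in node.children]
    prompt = node.task + (f" [hint: {hint}]" if hint else "") + " ".join(parts)
    node.solution = llm(prompt)
    memory.add(node.task, node.solution)
    return node.solution


def re_reason(node: TaskNode, failed: TaskNode, memory: Memory) -> str:
    """Localized re-reasoning: regenerate only the failed subtree, then recompose
    solutions upward along the path to the root instead of rerunning everything."""
    if node is failed:
        return solve(node, memory)
    for child in node.children:
        if re_reason(child, failed, memory):
            node.solution = llm(node.task + " ".join(c.solution or "" for c in node.children))
            return node.solution
    return ""


if __name__ == "__main__":
    memory = Memory()
    root = decompose("implement a task scheduler", depth=0)
    print(solve(root, memory))
    # Suppose tests reveal a flaw in the first child's subtree: only that
    # subtree and its ancestors are re-reasoned, not the whole tree.
    print(re_reason(root, root.children[0], memory))

Running the script first solves the whole task tree and then simulates a defect in one child subtree, repairing only that subtree and its ancestors; the stubs mark exactly where a real LLM client and vector store would plug in.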
Appears in Collections: Department of Information Management

Files in This Item:
File | Size | Format
ntu-113-2.pdf (not authorized for public access) | 1.41 MB | Adobe PDF


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
