NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100601
Full metadata record
DC field: value [language]
dc.contributor.advisor: 莊裕澤 [zh_TW]
dc.contributor.advisor: Yuh-Jzer Joung [en]
dc.contributor.author: 沈明彤 [zh_TW]
dc.contributor.author: Ming-Tung Shen [en]
dc.date.accessioned: 2025-10-08T16:04:42Z
dc.date.available: 2025-10-09
dc.date.copyright: 2025-10-08
dc.date.issued: 2025
dc.date.submitted: 2025-08-08
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100601
dc.description.abstract: 程式碼生成任務被視為最具工程應用潛力的代理落地場域,對語意理解、邏輯推理與錯誤修正能力皆具高度要求。儘管大型語言模型(LLMs)在自然語言處理領域表現卓越,然而在面對結構複雜、需多階段推理的任務時,仍顯能力不足。為突破此侷限,過去研究陸續發展多代理架構,試圖透過多個代理間的互動協作來處理複雜推理任務,但往往受限於流程僵化、錯誤修正成本高與缺乏經驗累積等問題。為此,本研究提出 TALM(Tree-Structured Multi-Agent Framework with Long-Term Memory),一套結合結構化任務拆解、區域性重推理與長期記憶模組的多代理框架。TALM 採用可拓展的樹狀協作結構,配合 divide and conquer 的策略,提升應對不同任務規模時的推理彈性;並進一步利用樹狀架構明確的父子關係,在推理錯誤發生時,僅需針對特定子樹進行局部重推理,顯著提升錯誤修正效率。同時,本研究亦設計以向量資料庫為基礎的長期記憶模組,支援知識的語意查詢與整合,實現隱性的自我強化學習機制。實驗結果顯示,TALM 在 HumanEval、BigCodeBench 與 ClassEval 等程式碼生成基準資料集上皆表現優異,展現穩定的推理效能與良好的詞元效率,證實本研究所提框架在處理高複雜度任務時的實用性與推論價值。 [zh_TW]
dc.description.abstract: Code generation presents a promising domain for deploying agents in real-world engineering, requiring strong semantic understanding, reasoning, and error correction. While LLMs excel in natural language tasks, they struggle with complex, multi-step reasoning. Prior multi-agent frameworks aim to address this via collaboration but often suffer from rigid workflows, high correction costs, and lack of memory accumulation. To address these challenges, we propose TALM (Tree-Structured Multi-Agent Framework with Long-Term Memory), a framework that integrates structured task decomposition, localized re-reasoning, and long-term memory. TALM adopts an extensible tree-based collaboration structure, which, combined with a divide-and-conquer strategy, enhances reasoning flexibility across varying task scopes. The clear parent-child relationships in the tree enable efficient subtree-level error correction when reasoning flaws occur. In addition, a long-term memory module built on a vector database supports semantic querying and integration of prior knowledge, enabling implicit self-improvement through experience reuse. Experimental results on HumanEval, BigCodeBench, and ClassEval benchmarks show that TALM consistently achieves strong reasoning performance and high token efficiency, demonstrating its practical value and robustness in solving complex code generation tasks. [en]
(An illustrative sketch of the workflow described in this abstract follows the metadata record below.)
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-10-08T16:04:42Z. No. of bitstreams: 0 [en]
dc.description.provenance: Made available in DSpace on 2025-10-08T16:04:42Z (GMT). No. of bitstreams: 0 [en]
dc.description.tableofcontents:
Acknowledgements i
Abstract (Chinese) ii
Abstract iii
Table of Contents v
List of Figures viii
List of Tables ix
Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objectives 4
1.3 Thesis Organization 5
Chapter 2 Literature Review 6
2.1 Large Language Models and Code Generation 6
2.1.1 Large Language Models 6
2.1.2 Pre-training and Fine-tuning 8
2.1.3 Language Models Trained for Code Generation 9
2.2 Prompting Frameworks 11
2.2.1 Prompting Frameworks for Enhancing Reasoning 11
2.2.2 Prompting Frameworks for Enhancing Coding Ability 13
2.3 Multi-Agent Collaboration Frameworks 14
2.3.1 General-Purpose Multi-Agent Frameworks 15
2.3.2 Multi-Agent Frameworks for Code Generation 16
2.4 Summary 18
Chapter 3 Methodology 19
3.1 Framework Overview 19
3.2 TALM Workflow 21
3.3 Partial Re-reasoning Mechanism 25
3.4 Long-Term Memory Module 29
3.5 Summary 31
Chapter 4 Experimental Results 33
4.1 Experimental Design 33
4.1.1 Datasets 33
4.1.2 Baselines 34
4.1.3 Experimental Setup 35
4.2 Experimental Results 36
4.2.1 Performance Evaluation 37
4.2.2 Ablation Studies 41
4.2.2.1 Performance Before and After Adding the Long-Term Memory Module 41
4.2.2.2 Performance Before and After Adding the Re-reasoning Mechanism 42
4.2.2.3 Performance Across Different Tree Heights m 43
4.2.2.4 Performance Across Different Branching Factors n 44
4.3 Summary 45
Chapter 5 Conclusion and Future Work 47
5.1 Conclusion 47
5.2 Future Work 48
5.3 Research Limitations 49
References 50
Appendix A: Examples and Comparison of Code Generated by TALM 58
dc.language.iso: zh_TW
dc.subject: 大型語言模型, 人工智慧代理, 多代理框架, 樹狀結構協作 [zh_TW]
dc.subject: Large Language Model, AI Agent, Multi-Agent Framework, Tree-Structured Collaboration [en]
dc.title: TALM:應用於可擴展程式碼生成的具長期記憶之樹狀多代理協作框架 [zh_TW]
dc.title: TALM: Tree-Structured Multi-Agent Framework with Long-Term Memory for Scalable Code Generation [en]
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 魏志平;陳建錦;林俊叡;楊立偉 [zh_TW]
dc.contributor.oralexamcommittee: Chih-Ping Wei; Chien-Chin Chen; June-Ray Lin; Li-wei Yang [en]
dc.relation.page: 63
dc.identifier.doi: 10.6342/NTU202504045
dc.rights.note: 未授權 (not authorized for public access)
dc.date.accepted: 2025-08-13
dc.contributor.author-college: 管理學院 (College of Management)
dc.contributor.author-dept: 資訊管理學系 (Department of Information Management)
dc.date.embargo-lift: N/A
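
The abstract above describes three mechanisms: divide-and-conquer task decomposition over a tree of agents, localized re-reasoning that regenerates only a failed subtree, and a long-term memory backed by a vector database. The Python sketch below is a minimal illustration of that control flow only, not the thesis's code: the LLM call, the embedding function, and every name in it (TaskNode, Memory, decompose, solve, re_reason, fake_embed) are hypothetical stand-ins for components the thesis implements with a real model and a real vector database.

# Minimal, hypothetical sketch of the workflow described in the abstract above.
# The LLM, the embedding model, and the vector database are replaced by trivial
# stubs so that the divide-and-conquer decomposition, subtree-level re-reasoning,
# and memory reuse can be read end to end.
import math
from dataclasses import dataclass, field
from typing import List, Optional


def fake_embed(text: str, dim: int = 32) -> List[float]:
    """Stand-in for a sentence-embedding model: a hashed bag-of-words vector."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class Memory:
    """Stand-in for the vector-database long-term memory: stores (task, solution)
    pairs and retrieves the most semantically similar past solution."""

    def __init__(self) -> None:
        self.entries: List[tuple] = []  # (embedding, task, solution)

    def add(self, task: str, solution: str) -> None:
        self.entries.append((fake_embed(task), task, solution))

    def query(self, task: str) -> Optional[str]:
        if not self.entries:
            return None
        q = fake_embed(task)
        best = max(self.entries, key=lambda e: sum(a * b for a, b in zip(q, e[0])))
        return best[2]


@dataclass
class TaskNode:
    """One node of the task tree; parent-child links make each subtree addressable."""
    task: str
    children: List["TaskNode"] = field(default_factory=list)
    solution: Optional[str] = None


def llm(prompt: str) -> str:
    """Placeholder for an LLM call; returns a canned string."""
    return f"<code for: {prompt[:40]}>"


def decompose(task: str, depth: int, max_depth: int = 2, branching: int = 2) -> TaskNode:
    """Divide and conquer: split a task into subtasks until max_depth is reached."""
    node = TaskNode(task)
    if depth < max_depth:
        node.children = [
            decompose(f"{task} / part {i + 1}", depth + 1, max_depth, branching)
            for i in range(branching)
        ]
    return node


def solve(node: TaskNode, memory: Memory) -> str:
    """Post-order solve: children first, then compose; reuse memory hits as hints."""
    hint = memory.query(node.task)
    parts = [solve(child, memory) for child in node.children]
    prompt = node.task + (f" [hint: {hint}]" if hint else "") + " ".join(parts)
    node.solution = llm(prompt)
    memory.add(node.task, node.solution)
    return node.solution


def re_reason(node: TaskNode, failed: TaskNode, memory: Memory) -> str:
    """Localized re-reasoning: regenerate only the failed subtree, then recompose
    solutions upward along the path to the root instead of rerunning everything."""
    if node is failed:
        return solve(node, memory)
    for child in node.children:
        if re_reason(child, failed, memory):
            node.solution = llm(node.task + " ".join(c.solution or "" for c in node.children))
            return node.solution
    return ""


if __name__ == "__main__":
    memory = Memory()
    root = decompose("implement a task scheduler", depth=0)
    print(solve(root, memory))
    # Suppose tests reveal a flaw in the first child's subtree: only that
    # subtree and its ancestors are re-reasoned, not the whole tree.
    print(re_reason(root, root.children[0], memory))

Running the script first solves the whole task tree and then simulates a defect in one child subtree, repairing only that subtree and its ancestors; the stubs mark exactly where a real LLM client and vector store would plug in.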
Appears in Collections: Department of Information Management

Files in This Item:
File | Size | Format
ntu-113-2.pdf (not authorized for public access) | 1.41 MB | Adobe PDF


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
