Continual Adaptation of Large Language Models through Environment-Driven Learning with Dynamic Policy Generation

張庭維; Ting-Wei Chang

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98316

Title:	Continual Adaptation of Large Language Models through Environment-Driven Learning with Dynamic Policy Generation
Author:	張庭維 Ting-Wei Chang
Advisor:	陳信希 Hsin-Hsi Chen
Keywords:	Large Language Models, Continual Adaptation, Online Adaptation, Streaming
Year of publication:	2025
Degree:	Master's
Abstract:	Large Language Models (LLMs) have achieved remarkable progress across diverse domains, but continual adaptation to evolving tasks and environments remains a key challenge. Existing memory-augmented and feedback-driven approaches enable LLMs to improve over time, but are often limited by static policies or inefficient use of feedback. In this thesis, we propose Dynamic Retrieval-based Policy Generation (DRPG), a framework that integrates memory-based retrieval with a dynamic policy generator, leveraging both historical data and environment feedback to continually enhance LLM performance. We systematically evaluate DRPG on a range of benchmarks (text-to-SQL, multi-hop question answering, medical diagnosis, and Python programming) using LLMs from different providers. Experimental results show that DRPG outperforms strong baselines such as Self-StreamICL and Self-Refine on most datasets and models. Further analysis shows that the policies generated by DRPG are transferable, robust to model changes, and remain effective with fewer few-shot examples. Our findings highlight the potential of dynamic policy generation as a general mechanism for adaptive, self-improving language agents in real-world online settings.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98316
DOI:	10.6342/NTU202502590
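Full-text authorization:	Not authorized
Electronic full-text release date:	N/A
Appears in department:	Department of Computer Science and Information Engineering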
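
The abstract above describes DRPG only at a high level: retrieve from memory, generate a policy, act, and store environment feedback. The following is a minimal sketch of that kind of streaming loop, not the thesis's implementation. Every name in it (`Episode`, `Memory`, `token_overlap`, `run_stream`, `toy_policy`, `toy_solve`) is hypothetical, the token-overlap retriever is an assumed stand-in for whatever retrieval the thesis uses, and the policy generator and solver would be LLM calls in a real system rather than the toy functions shown here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Episode:
    """One past interaction: the query, the answer given, and the feedback."""
    query: str
    answer: str
    correct: bool


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace tokens (assumed retriever)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


@dataclass
class Memory:
    """Append-only store of past episodes with similarity-based retrieval."""
    episodes: List[Episode] = field(default_factory=list)

    def add(self, episode: Episode) -> None:
        self.episodes.append(episode)

    def retrieve(self, query: str, k: int = 2) -> List[Episode]:
        # Rank stored episodes by similarity to the new query, keep the top k.
        ranked = sorted(
            self.episodes,
            key=lambda e: token_overlap(query, e.query),
            reverse=True,
        )
        return ranked[:k]


def run_stream(
    stream: List[Tuple[str, Callable[[str], bool]]],
    generate_policy: Callable[[str, List[Episode]], str],
    solve: Callable[[str, str], str],
    k: int = 2,
) -> List[bool]:
    """Process queries one at a time, storing environment feedback in memory."""
    memory = Memory()
    outcomes: List[bool] = []
    for query, check in stream:
        retrieved = memory.retrieve(query, k)       # memory-based retrieval
        policy = generate_policy(query, retrieved)  # per-query policy
        answer = solve(query, policy)               # act under that policy
        correct = check(answer)                     # environment feedback
        memory.add(Episode(query, answer, correct))
        outcomes.append(correct)
    return outcomes


# Toy stand-ins for the two LLM calls (purely illustrative).
def toy_policy(query: str, retrieved: List[Episode]) -> str:
    # If a retrieved attempt failed, advise a different approach.
    return "careful" if any(not e.correct for e in retrieved) else "default"


def toy_solve(query: str, policy: str) -> str:
    return query.upper() if policy == "careful" else query
```

In this toy setup the policy changes only because an earlier failure was retrieved from memory, which is the general idea of letting feedback shape later decisions. How DRPG actually represents, generates, and applies policies is specified in the thesis itself, not here.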

Files in this item:

File	Size	Format
ntu-113-2.pdf (not authorized for public access)	1.12 MB	Adobe PDF


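Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.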
