Continual Adaptation of Large Language Models through Environment-Driven Learning with Dynamic Policy Generation

Ting-Wei Chang (張庭維)

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98316

Title:	Continual Adaptation of Large Language Models through Environment-Driven Learning with Dynamic Policy Generation (original title: 結合動態策略生成於環境驅動學習下大型語言模型持續適應能力之研究)
Authors:	Ting-Wei Chang (張庭維)
Advisor:	Hsin-Hsi Chen (陳信希)
Keywords:	Large Language Models, Continual Adaptation, Online Adaptation, Streaming
Publication Year:	2025
Degree:	Master's
Abstract:	Large Language Models (LLMs) have achieved remarkable progress across diverse domains, but continual adaptation to evolving tasks and environments remains a key challenge. Existing memory-augmented and feedback-driven approaches enable LLMs to improve over time, but they are often limited by static policies or inefficient use of feedback. In this thesis, we propose Dynamic Retrieval-based Policy Generation (DRPG), a novel framework that integrates memory-based retrieval with a dynamic policy generator, leveraging both historical data and environment feedback to continually enhance LLM performance. We systematically evaluate DRPG on a range of benchmarks (text-to-SQL, multi-hop question answering, medical diagnosis, and Python programming) using LLMs from different providers. Experimental results show that DRPG outperforms strong baselines such as Self-StreamICL and Self-Refine on most datasets and models. Further analysis shows that the policies generated by DRPG are transferable, robust to model changes, and effective even with fewer few-shot examples. Our findings highlight the potential of dynamic policy generation as a general mechanism for adaptive, self-improving language agents in real-world online settings.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98316
DOI:	10.6342/NTU202502590
Fulltext Rights:	Not authorized
Embargo Lift Date:	N/A
Appears in Collections:	Department of Computer Science and Information Engineering

Files in This Item:

File	Size	Format
ntu-113-2.pdf Restricted Access	1.12 MB	Adobe PDF
