NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98316

Full metadata record

DC Field | Value | Language
dc.contributor.advisor | 陳信希 | zh_TW
dc.contributor.advisor | Hsin-Hsi Chen | en
dc.contributor.author | 張庭維 | zh_TW
dc.contributor.author | Ting-Wei Chang | en
dc.date.accessioned | 2025-08-01T16:11:31Z | -
dc.date.available | 2025-08-02 | -
dc.date.copyright | 2025-08-01 | -
dc.date.issued | 2025 | -
dc.date.submitted | 2025-07-26 | -
dc.identifier.citation | (reference list below) | -

Abdelrahman Abouelenin, Atabak Ashfaq, Adam Atkinson, Hany Awadalla, Nguyen Bach, Jianmin Bao, Alon Benhaim, Martin Cai, Vishrav Chaudhary, Congcong Chen, et al. Phi-4-mini technical report: Compact yet powerful multimodal language models via mixture-of-LoRAs. arXiv preprint arXiv:2503.01743, 2025.
Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou, and Weizhu Chen. Learning from mistakes makes LLM better reasoner. arXiv preprint arXiv:2310.20689, 2023.
Po-Chun Chen, Sheng-Lun Wei, Hen-Hsen Huang, and Hsin-Hsi Chen. Induct-learn: Short phrase prompting with instruction induction. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 5204–5231, Miami, Florida, USA, November 2024. Association for Computational Linguistics. doi: 10.18653/v1/2024.emnlp-main.297. URL https://aclanthology.org/2024.emnlp-main.297/.
Shangheng Du, Jiabao Zhao, Jinxin Shi, Zhentao Xie, Xin Jiang, Yanhong Bai, and Liang He. A survey on the optimization of large language model-based agents. arXiv preprint arXiv:2503.12434, 2025.
Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad AlDahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.
Arsene Fansi Tchango, Rishab Goel, Zhi Wen, Julien Martel, and Joumana Ghosn. DDXPlus: A new dataset for automatic medical diagnosis. Advances in Neural Information Processing Systems, 35:31306–31318, 2022.
Xiang Gao and Kamalika Das. Customizing language model responses with contrastive in-context learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 18039–18046, 2024.
Google Gemini Team. Gemini: A family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023. URL https://arxiv.org/pdf/2312.11805.
Jianping Gou, Baosheng Yu, Stephen J Maybank, and Dacheng Tao. Knowledge distillation: A survey. International Journal of Computer Vision, 129(6):1789–1819, 2021.
Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
Steven CH Hoi, Doyen Sahoo, Jing Lu, and Peilin Zhao. Online learning: A comprehensive survey. Neurocomputing, 459:249–289, 2021.
Damjan Kalajdzievski. Scaling laws for forgetting when fine-tuning large language models. arXiv preprint arXiv:2401.05605, 2024.
Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems, 35:22199–22213, 2022.
Yuhang Lai, Chengxi Li, Yiming Wang, Tianyi Zhang, Ruiqi Zhong, Luke Zettlemoyer, Wen-tau Yih, Daniel Fried, Sida Wang, and Tao Yu. DS-1000: A natural and reliable benchmark for data science code generation. In International Conference on Machine Learning, pages 18319–18345. PMLR, 2023.
Hongyu Li, Liang Ding, Meng Fang, and Dacheng Tao. Revisiting catastrophic forgetting in large language model tuning. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors, Findings of the Association for Computational Linguistics: EMNLP 2024, pages 4297–4308, Miami, Florida, USA, November 2024. Association for Computational Linguistics. doi: 10.18653/v1/2024.findings-emnlp.249. URL https://aclanthology.org/2024.findings-emnlp.249/.
Jinyang Li, Binyuan Hui, Ge Qu, Jiaxi Yang, Binhua Li, Bowen Li, Bailin Wang, Bowen Qin, Ruiying Geng, Nan Huo, et al. Can LLM already serve as a database interface? A big bench for large-scale database grounded text-to-SQLs. Advances in Neural Information Processing Systems, 36:42330–42357, 2023.
Xuechen Liang, Yangfan He, Yinghui Xia, Xinyuan Song, Jianhui Wang, Meiling Tao, Li Sun, Xinhang Yuan, Jiayi Su, Keqin Li, et al. Self-evolving agents with reflective and memory-augmented abilities. arXiv preprint arXiv:2409.00872, 2024.
Aman Madaan, Niket Tandon, Peter Clark, and Yiming Yang. Memory-assisted prompt editing to improve GPT-3 after deployment. In Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang, editors, Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2833–2861, Abu Dhabi, United Arab Emirates, December 2022. Association for Computational Linguistics. doi: 10.18653/v1/2022.emnlp-main.183. URL https://aclanthology.org/2022.emnlp-main.183/.
Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, et al. Self-refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems, 36:46534–46594, 2023.
Meta AI Team. The llama 4 herd: The beginning of a new era of natively multimodal ai innovation, 2025. URL https://ai.meta.com/blog/llama-4-multimodal-intelligence.
Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. Rethinking the role of demonstrations: What makes in-context learning work? arXiv preprint arXiv:2202.12837, 2022.
Mistral AI Team. Medium is the new large, 2025a. URL https://mistral.ai/news/mistral-medium-3.
Mistral AI Team. Mistral small 3.1, 2025b. URL https://mistral.ai/news/mistral-small-3-1.
Venkatesh Balavadhani Parthasarathy, Ahtsham Zafar, Aafaq Khan, and Arsalan Shahid. The ultimate guide to fine-tuning llms from basics to breakthroughs: An exhaustive review of technologies, research, best practices, applied research challenges and opportunities. arXiv preprint arXiv:2408.13296, 2024.
Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36:53728–53741, 2023.
Amal Rannen-Triki, Jorg Bornschein, Razvan Pascanu, Marcus Hutter, Andras György, Alexandre Galashov, Yee Whye Teh, and Michalis K Titsias. Revisiting dynamic evaluation: Online adaptation for large language models. arXiv preprint arXiv:2403.01518, 2024.
Matthew Renze and Erhan Guven. Self-reflection in LLM agents: Effects on problem-solving performance. arXiv preprint arXiv:2405.06682, 2024.
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems, 36:8634–8652, 2023.
Shezheng Song, Hao Xu, Jun Ma, Shasha Li, Long Peng, Qian Wan, Xiaodong Liu, and Jie Yu. How to complete domain tuning while keeping general ability in llm: Adaptive layer-wise and element-wise regularization. arXiv preprint arXiv:2501.13669, 2025.
Mirac Suzgun, Mert Yuksekgonul, Federico Bianchi, Dan Jurafsky, and James Zou. Dynamic cheatsheet: Test-time learning with adaptive memory. arXiv preprint arXiv:2504.07952, 2025.
Gemma Team. Gemma 3 technical report, 2025. URL https://arxiv.org/abs/2503.19786.
Danqing Wang and Lei Li. Learn from mistakes through cooperative interaction with study assistant. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171, 2022a.
Yihan Wang, Si Si, Daliang Li, Michal Lukasik, Felix Yu, Cho-Jui Hsieh, Inderjit S Dhillon, and Sanjiv Kumar. Two-stage llm fine-tuning with less specialization and more generalization. arXiv preprint arXiv:2211.00635, 2022b.
Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, et al. Larger language models do in-context learning differently. arXiv preprint arXiv:2303.03846, 2023.
Cheng-Kuang Wu, Zhi Rui Tam, Chieh-Yen Lin, Yun-Nung Vivian Chen, and Hung-yi Lee. StreamBench: Towards benchmarking continuous improvement of language agents. Advances in Neural Information Processing Systems, 37:107039–107063, 2024.
Shitao Xiao, Zheng Liu, Peitian Zhang, and Niklas Muennighoff. C-Pack: Packaged resources to advance general Chinese embedding, 2023.
Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W Cohen, Ruslan Salakhutdinov, and Christopher D Manning. HotpotQA: A dataset for diverse, explainable multi-hop question answering. arXiv preprint arXiv:1809.09600, 2018.
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems, 36:11809–11822, 2023.
Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, et al. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task. arXiv preprint arXiv:1809.08887, 2018.
Tao Yu, Rui Zhang, He Yang Er, Suyi Li, Eric Xue, Bo Pang, Xi Victoria Lin, Yi Chern Tan, Tianze Shi, Zihan Li, et al. CoSQL: A conversational text-to-SQL challenge towards cross-domain natural language interfaces to databases. arXiv preprint arXiv:1909.05378, 2019.
Tianjun Zhang, Aman Madaan, Luyu Gao, Steven Zheng, Swaroop Mishra, Yiming Yang, Niket Tandon, and Uri Alon. In-context principle learning from mistakes. arXiv preprint arXiv:2402.05403, 2024a.
Wenqi Zhang, Ke Tang, Hai Wu, Mengna Wang, Yongliang Shen, Guiyang Hou, Zeqi Tan, Peng Li, Yueting Zhuang, and Weiming Lu. Agent-pro: Learning to evolve via policy-level reflection and optimization. arXiv preprint arXiv:2402.17574, 2024b.
Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, et al. BigCodeBench: Benchmarking code generation with diverse function calls and complex instructions. arXiv preprint arXiv:2406.15877, 2024.
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98316 | -
dc.description.abstract | 大型語言模型在多種領域皆展現出卓越表現,但如何讓模型持續適應不斷變化的任務與環境仍是一項關鍵挑戰。現有的記憶增強與回饋驅動方法雖可促使大型語言模型隨時間進步,但往往受限於靜態策略或回饋利用效率不足。針對此問題,本論文提出動態檢索式策略生成 Dynamic Retrieval-based Policy Generation (DRPG)架構,結合記憶檢索與動態策略生成器,能整合歷史資料與環境回饋,持續增強大型語言模型在各類任務下的適應與表現。我們在多個標準資料集(涵蓋 Text-to-SQL、多步驟問答、醫學診斷與 Python 程式設計)以及多種主流大型語言模型上,系統性驗證 DRPG 的成效。實驗結果顯示,DRPG 在大多數資料集和模型上皆顯著優於 Self-StreamICL、Self-Refine 等強力基線方法。進一步分析亦發現,DRPG 所生成之策略具備可轉移性,對模型更換具有強健性,且即使僅以少數範例也能維持高效能。本論文結果彰顯動態策略生成在真實線上情境下,成為語言智慧型代理人持續學習與自我改進的通用機制之潛力。 | zh_TW
dc.description.abstract | Large Language Models (LLMs) have achieved remarkable progress across diverse domains, but continual adaptation to evolving tasks and environments remains a key challenge. Existing memory-augmented and feedback-driven approaches enable LLMs to improve over time, but are often limited by static policies or inefficient usage of feedback. In this thesis, we propose Dynamic Retrieval-based Policy Generation (DRPG), a novel framework that integrates memory-based retrieval with a dynamic policy generator, leveraging both historical data and environment feedback to continually enhance LLM performance. We systematically evaluate DRPG on a wide range of benchmarks—including text-to-SQL, multi-hop question answering, medical diagnosis, and Python programming—using LLMs from different providers. Experimental results show that DRPG consistently outperforms strong baselines such as Self-StreamICL and Self-Refine across most datasets and models. Further analysis demonstrates that the policies generated by DRPG are transferable, robust to model changes, and effective even with fewer few-shot examples. Our findings highlight the potential of dynamic policy generation as a general mechanism for adaptive, self-improving language agents in real-world online settings. | en
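The abstract describes a closed online loop: retrieve similar past cases from memory, have a policy generator distill them into guidance, let the agent act, then store the environment's feedback. The following is a minimal Python sketch of such a loop, offered as an illustration only; every name in it (Memory, generate_policy, drpg_step, and the generic llm and env_feedback callables) is a hypothetical stand-in, not the thesis's actual implementation.

    # Minimal sketch of a retrieve -> generate-policy -> act -> store-feedback
    # loop as described in the abstract. All names are hypothetical stand-ins;
    # the thesis's real prompts, retriever, and agents may differ.
    from dataclasses import dataclass, field
    from typing import Callable, List, Tuple

    Entry = Tuple[str, str, str]  # (task_input, answer, environment_feedback)

    @dataclass
    class Memory:
        entries: List[Entry] = field(default_factory=list)

        def retrieve(self, query: str, k: int = 4) -> List[Entry]:
            # Crude lexical-overlap similarity as a placeholder; a real
            # system would retrieve with dense embeddings instead.
            q = set(query.lower().split())
            return sorted(self.entries,
                          key=lambda e: len(q & set(e[0].lower().split())),
                          reverse=True)[:k]

    def generate_policy(llm: Callable[[str], str], examples: List[Entry]) -> str:
        # Distill retrieved (input, answer, feedback) triples into a short
        # natural-language policy for the current task.
        shots = "\n\n".join(f"Input: {x}\nAnswer: {y}\nFeedback: {f}"
                            for x, y, f in examples)
        return llm(f"Summarize these past cases into task guidelines:\n\n{shots}")

    def drpg_step(llm: Callable[[str], str], memory: Memory, task_input: str,
                  env_feedback: Callable[[str, str], str]) -> str:
        examples = memory.retrieve(task_input)               # memory-based retrieval
        policy = generate_policy(llm, examples)              # dynamic policy generation
        answer = llm(f"Policy:\n{policy}\n\nTask:\n{task_input}")  # agent acts
        feedback = env_feedback(task_input, answer)          # environment feedback
        memory.entries.append((task_input, answer, feedback))  # continual update
        return answer

In the streaming setting the abstract targets, such a step would run once per incoming instance, so the policy is regenerated from the freshest retrieved memory each time rather than being fixed in advance.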
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-01T16:11:31Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2025-08-01T16:11:31Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | (table of contents below) | -

Acknowledgements i
摘要 iii
Abstract iv
Contents vi
List of Figures ix
List of Tables xi
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Thesis Organization 3
Chapter 2 Related Work 4
2.1 LLMs Improving in Online Settings 4
2.1.1 Parameter Update based Approaches 4
2.1.2 Memory-based Approaches 4
2.2 LLMs Improving in Offline or Non-Streaming Settings 6
2.2.1 Fine-tuning and RL-based Approaches 6
2.2.2 Concept Generation from Data 7
2.2.3 Self-Reflection and Iterative Feedback 8
Chapter 3 Methodology 10
3.1 Dynamic Retrieval-based Policy Generation (DRPG) Framework 10
3.2 Prompt Design 12
3.2.1 Prompt Templates for the Agent 13
3.2.2 Prompt Templates for Self-Refine 14
3.2.3 Prompt Templates for the Policy Generator 15
Chapter 4 Experimental Setup 17
4.1 Benchmarks 17
4.2 Evaluation Metrics 18
4.3 Compared Approaches 19
4.4 Models 20
4.5 Parameters 21
Chapter 5 Results and Discussion 22
5.1 Main Results 22
5.2 Ablation Studies on the Policy Generator 27
5.3 DRPG Compatibility with Agent Types 30
5.4 Teacher-Student Setting in Agent and Policy Generator 34
5.5 Memory Transferability across Models 35
Chapter 6 Conclusion, Limitations and Future Work 37
6.1 Conclusion 37
6.2 Limitations and Future Work 38
Appendix A — Prompt Design across Different Models 40
A.1 Prompt Templates for the Agent 40
A.2 Prompt Templates for the Policy Generator 42
References 47
dc.language.iso | en | -
dc.subject | 大型語言模型 | zh_TW
dc.subject | 持續適應 | zh_TW
dc.subject | 線上適應 | zh_TW
dc.subject | 串流學習 | zh_TW
dc.subject | Online Adaptation | en
dc.subject | Large Language Models | en
dc.subject | Streaming | en
dc.subject | Continual Adaptation | en
dc.title | 結合動態策略生成於環境驅動學習下大型語言模型持續適應能力之研究 | zh_TW
dc.title | Continual Adaptation of Large Language Models through Environment-Driven Learning with Dynamic Policy Generation | en
dc.type | Thesis | -
dc.date.schoolyear | 113-2 | -
dc.description.degree | 碩士 (Master's) | -
dc.contributor.oralexamcommittee | 鄭卜壬;陳建錦;古倫維 | zh_TW
dc.contributor.oralexamcommittee | Pu-Jen Cheng;Chien-Chin Chen;Lun-Wei Ku | en
dc.subject.keyword | 大型語言模型,持續適應,線上適應,串流學習 | zh_TW
dc.subject.keyword | Large Language Models,Continual Adaptation,Online Adaptation,Streaming | en
dc.relation.page | 53 | -
dc.identifier.doi | 10.6342/NTU202502590 | -
dc.rights.note | 未授權 (not authorized) | -
dc.date.accepted | 2025-07-28 | -
dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | -
dc.contributor.author-dept | 資訊工程學系 (Department of Computer Science and Information Engineering) | -
dc.date.embargo-lift | N/A | -
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
File | Size | Format
ntu-113-2.pdf (not authorized for public access) | 1.12 MB | Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
