Learning to Coordinate In-Context with Coordination Transformers

王懷志; Huai-Chih Wang

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98384

Title:	Learning to Coordinate In-Context with Coordination Transformers
Author:	Huai-Chih Wang (王懷志)
Advisor:	Shao-Hua Sun (孫紹華)
Keywords:	In-Context Learning, In-Context Reinforcement Learning, Multi-Agent Coordination, Human-AI Collaboration, Transformers for Decision Making
Publication Year:	2025
Degree:	Master's
Abstract:	Effective coordination among artificial agents in dynamic and uncertain environments remains a significant challenge in multi-agent systems. Existing approaches, such as self-play and population-based methods, either generalize poorly to unseen partners or require impractically extensive training. To overcome these limitations, we propose Coordination Transformers (Coot), a novel in-context coordination framework that uses recent interaction histories to rapidly adapt to unseen partners. Unlike previous approaches that primarily aim to increase the diversity of training partners, Coot explicitly focuses on adapting to new partner behaviors by predicting actions aligned with observed partner interactions. Trained on interaction trajectories collected from diverse pairs of agents with complementary behaviors, Coot quickly learns effective coordination strategies without additional supervision or fine-tuning. Evaluations on the Overcooked benchmark show that Coot significantly outperforms baseline methods in coordination tasks involving previously unseen partners. Human evaluations further confirm Coot as the most effective collaborative partner, while extensive ablations highlight its robustness, flexibility, and sensitivity to context in multi-agent scenarios.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98384
DOI:	10.6342/NTU202502134
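Full-Text License:	Authorized (campus access only)
Full-Text Release Date:	2030-07-21
Appears in Collections:	Graduate Institute of Communication Engineering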

Files in This Item:

File	Size	Format
ntu-113-2.pdf (not authorized for public access)	8.39 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.