Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98384

Full metadata record

DC Field | Value | Language
dc.contributor.advisor | 孫紹華 | zh_TW
dc.contributor.advisor | Shao-Hua Sun | en
dc.contributor.author | 王懷志 | zh_TW
dc.contributor.author | Huai-Chih Wang | en
dc.date.accessioned | 2025-08-05T16:09:29Z | -
dc.date.available | 2025-08-06 | -
dc.date.copyright | 2025-08-05 | -
dc.date.issued | 2025 | -
dc.date.submitted | 2025-07-28 | -
dc.identifier.citation[1] M. Bain and C. Sammut. A framework for behavioural cloning. In Machine Intelligence 15, 1995.
[2] D. S. Bernstein, R. Givan, N. Immerman, and S. Zilberstein. The complexity of decentralized control of markov decision processes. In Mathematics of operations research, 2002.
[3] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. In Neural Information Processing Systems, 2020.
[4] Y. Cao, W. Yu, W. Ren, and G. Chen. An overview of recent progress in the study of distributed multi-agent coordination. IEEE Transactions on Industrial informatics, 9, 2012.
[5] M. Carroll, R. Shah, M. K. Ho, T. Griffiths, S. Seshia, P. Abbeel, and A. Dra- gan. On the utility of learning about humans for human-ai coordination. In Neural Information Processing Systems, 2019.
[6] R. Charakorn, P. Manoonpong, and N. Dilokthanakul. Generating diverse cooperative agents by learning incompatible policies. In International Conference on Learning Representations, 2023.
[7] L. Chen, K. Lu, A. Rajeswaran, K. Lee, A. Grover, M. Laskin, P. Abbeel, A. Srinivas, and I. Mordatch. Decision transformer: Reinforcement learning via sequence modeling. In Neural Information Processing Systems, 2021.
[8] Z. Dai, F. Tomasi, and S. Ghiassian. In-context exploration-exploitation for reinforcement learning. In International Conference on Learning Representations, 2024.
[9] Y. Duan, J. Schulman, X. Chen, P. L. Bartlett, I. Sutskever, and P. Abbeel. Rl ^2: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779, 2016.
[10] C. Finn, P. Abbeel, and S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning, 2017.
[11] A. Gupta, C. Devin, Y. Liu, P. Abbeel, and S. Levine. Learning invariant feature spaces to transfer skills with reinforcement learning. International Conference on Learning Representations, 2017.

[12] H. Hu, A. Lerer, B. Cui, L. Pineda, N. Brown, and J. Foerster. Off-belief learning. In International Conference on Machine Learning, 2021.
[13] H. Hu, A. Lerer, A. Peysakhovich, and J. Foerster. “other-play" for zero-shot coordination. In International Conference on Machine Learning, 2020.
[14] M. Jaderberg, V. Dalibard, S. Osindero, W. M. Czarnecki, J. Donahue, A. Razavi, O. Vinyals, T. Green, I. Dunning, K. Simonyan, et al. Population-based training of neural networks. arXiv preprint arXiv:1711.09846, 2017.
[15] L. Kirsch, J. Harrison, C. Freeman, J. Sohl-Dickstein, and J. Schmidhuber. Towards general-purpose in-context learning agents. In NeurIPS 2023 Foundation Models for Decision Making Workshop, 2023.
[16] L. Kirsch, J. Harrison, C. D. Freeman, J. Sohl-Dickstein, and J. Schmidhuber. Towards general-purpose in-context learning agents. In NeurIPS 2023 Workshop on Distribution Shifts: New Frontiers with Foundation Models, 2024.
[17] A. Kulesza, B. Taskar, et al. Determinantal point processes for machine learning. Foundations and Trends® in Machine Learning, 2012.
[18] M. Laskin, L. Wang, J. Oh, E. Parisotto, S. Spencer, R. Steigerwald, D. Strouse, S. S. Hansen, A. Filos, E. Brooks, Maxime Gazeau, H. Sahni, S. Singh, and V. Mnih. In-context reinforcement learning with algorithm distillation. In International Conference on Learning Representations, 2023.
[19] N. Lauffer, A. Shah, M. Carroll, M. D. Dennis, and S. Russell. Who needs to know? minimal knowledge for optimal coordination. In International Conference on Machine Learning, 2023.
[20] J. Lee, A. Xie, A. Pacchiano, Y. Chandak, C. Finn, O. Nachum, and E. Brun- skill. Supervised pretraining can learn in-context reinforcement learning. In Neural Information Processing Systems, 2023.
[21] K.-H. Lee, O. Nachum, M. S. Yang, L. Lee, D. Freeman, S. Guadarrama, I. Fischer, W. Xu, E. Jang, H. Michalewski, et al. Multi-game decision transformers. In Neural Information Processing Systems, 2022.
[22] Y. Li, M. E. Ildiz, D. Papailiopoulos, and S. Oymak. Transformers as algorithms: Generalization and stability in in-context learning. In International Conference on Machine Learning, 2023.
[23] Y. Li, S. Zhang, J. Sun, Y. Du, Y. Wen, X. Wang, and W. Pan. Cooperative open-ended learning framework for zero-shot coordination. In International Conference on Machine Learning, 2023.
[24] Q. Long*, Z. Zhou*, A. Gupta, F. Fang, Y. Wu†, and X. Wang†. Evolutionary population curriculum for scaling multi-agent reinforcement learning. In International Conference on Learning Representations, 2020.
[25] X. Lou, J. Guo, J. Zhang, J. Wang, K. Huang, and Y. Du. Pecan: Leveraging policy ensemble for context-aware zero-shot human-ai coordination. In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, 2023.
[26] K. Lucas and R. E. Allen. Any-play: An intrinsic augmentation for zero-shot coordination. In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, 2022.
[27] A. Lupu, B. Cui, H. Hu, and J. Foerster. Trajectory diversity for zero-shot coordina- tion. In International Conference on Machine Learning, 2021.
[28] L. Matignon, G. J. Laurent, and N. Le Fort-Piat. Independent reinforcement learn- ers in cooperative markov games: a survey regarding coordination problems. The Knowledge Engineering Review, 27, 2012.
[29] H.Nekoei,X.Zhao,J.Rajendran,M.Liu,andS.Chandar.Towardsfew-shotcoordi- nation: Revisiting ad-hoc teamplay challenge in the game of hanabi. In Conference on Lifelong Learning Agents, 2023.
[30] K. Rakelly, A. Zhou, C. Finn, S. Levine, and D. Quillen. Efficient off-policy meta-reinforcement learning via probabilistic context variables. In International Conference on Machine Learning, 2019.
[31] J. Rothfuss, D. Lee, I. Clavera, T. Asfour, and P. Abbeel. ProMP: Proximal meta- policy search. In International Conference on Learning Representations, 2019.
[32] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
[33] P. Stone, G. Kaminka, S. Kraus, and J. Rosenschein. Ad hoc autonomous agent teams: Collaboration without pre-coordination. In Proceedings of the AAAI Conference on Artificial Intelligence, 2010.
[34] D. Strouse, K. McKee, M. Botvinick, E. Hughes, and R. Everett. Collaborating with humans without human data. In Neural Information Processing Systems, 2021.
[35] G. Tesauro. Td-gammon, a self-teaching backgammon program, achieves master- level play. Neural computation, 6, 1994.
[36] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel. Domain ran- domization for transferring deep neural networks from simulation to the real world. In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, 2017.
[37] J. Von Oswald, E. Niklasson, E. Randazzo, J. Sacramento, A. Mordvintsev, A. Zhmoginov, and M. Vladymyrov. Transformers learn in-context by gradient descent. In International Conference on Machine Learning, 2023.
[38] X.Wang, Z.Tian, Z.Wan, Y.Wen, J.Wang, and W.Zhang.Order matters: Agent-by-agent policy optimization. In International Conference on Learning Representations, 2023.
[39] X. Wang, S. Zhang, W. Zhang, W. Dong, J. Chen, Y. Wen, and W. Zhang. Zsc- eval: An evaluation toolkit and benchmark for multi-agent zero-shot. In Neural Information Processing Systems, 2024.
[40] J.Wei, X.Wang, D.Schuurmans, M.Bosma, F.Xia, E.Chi, Q.V.Le, D.Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. In Neural Information Processing Systems, 2022.
[41] Z.Xie, Z.Lin, D.Ye, Q.Fu, Y.Wei,andS.Li.Future-conditioned unsupervised pre-training for decision transformer. In International Conference on Machine Learning, 2023.
[42] J. Xing, T. Nagata, K. Chen, X. Zou, E. Neftci, and J. L. Krichmar. Domain adaptation in reinforcement learning via latent unified state representation. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021.
[43] K. Xue, Y. Wang, C. Guan, L. Yuan, H. Fu, Q. Fu, C. Qian, and Y. Yu. Hetero- geneous multi-agent zero-shot coordination by coevolution. IEEE Transactions on Evolutionary Computation, 2024.
[44] Z. Yan, N. Jouandeau, and A. A. Cherif. A survey and analysis of multi-robot coordination. International Journal of Advanced Robotic Systems, 10, 2013.
[45] C. Yu, J. Gao, W. Liu, B. Xu, H. Tang, J. Yang, Y. Wang, and Y. Wu. Learning zero-shot cooperation with humans, assuming humans are biased. In International Conference on Learning Representations, 2023.
[46] C. Yu, A. Velu, E. Vinitsky, J. Gao, Y. Wang, A. Bayen, and Y. Wu. The surpris- ing effectiveness of ppo in cooperative multi-agent games. In Neural Information Processing Systems, 2022.
[47] C. Yu, A. Velu, E. Vinitsky, J. Gao, Y. Wang, A. Bayen, and Y. Wu. Learning zero-shot cooperation with humans, assuming humans are biased. In International Conference on Learning Representations, 2023.
[48] J. Yu, Y. Xu, J. Y. Koh, T. Luong, G. Baid, Z. Wang, V. Vasudevan, A. Ku, Y. Yang, B. K. Ayan, et al. Scaling autoregressive models for content-rich text-to-image gen- eration. arXiv preprint arXiv:2206.10789, 2022.
[49] A. Zhang, R. McAllister, R. Calandra, Y. Gal, and S. Levine. Learning invariant representations for reinforcement learning without reconstruction. arXiv preprint arXiv:2006.10742, 2020.
[50] R. Zhao, J. Song, Y. Yuan, H. Hu, Y. Gao, Y. Wu, Z. Sun, and W. Yang. Max- imum entropy population-based training for zero-shot human-ai coordination. In Proceedings of the AAAI Conference on Artificial Intelligence, 2023.
[51] L. Zintgraf, S. Schulze, C. Lu, L. Feng, M. Igl, K. Shiarlis, Y. Gal, K. Hofmann, and S. Whiteson. Varibad: variational bayes-adaptive deep rl via meta-learning. J. Mach. Learn. Res., 22(1), Jan. 2021.
-
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98384 | -
dc.description.abstract | 在動態且具有不確定性的環境中實現人工智能體之間的有效協作,仍然是多智能體系統中的一大挑戰。現有方法,如自我對弈(self-play)與族群式訓練(population-based methods),要不是無法適應陌生合作對象的行為模式,就是需要極為大量的訓練次數。為了解決這些限制,我們提出了一種新的情境式協作框架——Coot(Coordination Transformers),該方法利用近期的互動歷史,能快速適應未見過的協作對象。與過往著重於增加訓練對象多樣性的做法不同,Coot 明確地聚焦於根據觀察到的夥伴行為進行動作預測,進而達到對新夥伴行為的適應。我們使用行為互補的智能體所產生的互動軌跡進行訓練,使得 Coot 能在無需額外監督或微調訓練的情況下,快速學會有效的協作策略。在 Overcooked 環境中的實驗顯示,Coot 在與陌生夥伴協作的任務中,明顯優於其他基準方法。人類評估也進一步證實 Coot 是最有效的協作夥伴;而大量的消融實驗則凸顯了其在多智能體場景中的韌性、靈活性與對情境的敏感度。 | zh_TW
dc.description.abstract | Effective coordination among artificial agents in dynamic and uncertain environments remains a significant challenge in multi-agent systems. Existing approaches, such as self-play and population-based methods, either generalize poorly to unseen partners or require impractically extensive training. To overcome these limitations, we propose Coordination Transformers (Coot), a novel in-context coordination framework that uses recent interaction histories to rapidly adapt to unseen partners. Unlike previous approaches that primarily aim to increase the diversity of training partners, Coot explicitly focuses on adapting to new partner behaviors by predicting actions aligned with observed partner interactions. Trained on interaction trajectories collected from diverse pairs of agents with complementary behaviors, Coot quickly learns effective coordination strategies without explicit supervision or fine-tuning. Evaluations on the Overcooked benchmark demonstrate that Coot significantly outperforms baseline methods in coordination tasks involving previously unseen partners. Human evaluations further confirm Coot as the most effective collaborative partner, while extensive ablations highlight its robustness, flexibility, and sensitivity to context in multi-agent scenarios. | en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-05T16:09:29Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2025-08-05T16:09:29Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents:
Acknowledgements i
摘要 iii
Abstract v
Contents vii
List of Figures xi
List of Tables xiii
Chapter 1 Introduction 1
1.1 Introduction 1
1.2 Proposal and Contribution 2
Chapter 2 Related Work 5
2.1 In-Context Reinforcement Learning 5
2.2 Learning to Coordinate 6
2.3 Transfer Learning in Reinforcement Learning 7
Chapter 3 Preliminary 9
3.1 Hidden-Utility Markov Game 9
3.2 In-Context Learning with Decision-Pretrained Transformers 10
Chapter 4 Method 11
4.1 From Task Generalization to Coordination Generalization 11
4.2 Dataset Generation 13
4.3 Training 14
4.4 Online Deployment 15
Chapter 5 Experiments 17
5.1 Evaluation Setup 17
5.1.1 Environments 17
5.1.2 Baselines 19
5.1.3 Evaluation Pipeline 20
5.1.4 Evaluation Metrics 20
5.2 Coordination Performance Across Benchmarks 21
5.3 Adaptation Through Context Accumulation 23
5.4 Impact of Trajectory Augmentation 23
5.5 Human-Agent Collaboration Study 25
Chapter 6 Conclusion 27
References 29
Appendix A — Environment 37
Appendix B — Implementation and Dataset Details 41
B.1 Source and Licensing of Policy Pools and Datasets 41
B.2 COOT Training Details 42
B.3 Baseline and Training Implementation 44
Appendix C — Additional Contexts for Human experiment 47
C.1 Experiment Setup 47
C.2 Experiment Platform 48
C.3 Detailed Human Participant Rankings 48
Appendix D — Supplementary Results 51
D.1 Evaluation and Algorithm 51
dc.language.iso | en | -
dc.subject | 情境學習 | zh_TW
dc.subject | 決策用轉換器 | zh_TW
dc.subject | 人機合作 | zh_TW
dc.subject | 多智能體合作 | zh_TW
dc.subject | 情境強化學習 | zh_TW
dc.subject | Human-AI Collaboration | en
dc.subject | Transformers for Decision Making | en
dc.subject | Multi-Agent Coordination | en
dc.subject | In-Context Reinforcement Learning | en
dc.subject | In-Context Learning | en
dc.title | 利用合作轉換器來學習情境合作 | zh_TW
dc.title | Learning to Coordinate In-Context with Coordination Transformers | en
dc.type | Thesis | -
dc.date.schoolyear | 113-2 | -
dc.description.degree | 碩士 (Master) | -
dc.contributor.oralexamcommittee | 李宏毅;謝秉均 | zh_TW
dc.contributor.oralexamcommittee | Hung-yi Lee;Ping-Chun Hsieh | en
dc.subject.keyword | 情境學習,情境強化學習,多智能體合作,人機合作,決策用轉換器 | zh_TW
dc.subject.keyword | In-Context Learning,In-Context Reinforcement Learning,Multi-Agent Coordination,Human-AI Collaboration,Transformers for Decision Making | en
dc.relation.page | 55 | -
dc.identifier.doi | 10.6342/NTU202502134 | -
dc.rights.note | 同意授權(限校園內公開) (authorized for access within campus only) | -
dc.date.accepted | 2025-07-29 | -
dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | -
dc.contributor.author-dept | 電信工程學研究所 (Graduate Institute of Communication Engineering) | -
dc.date.embargo-lift | 2030-07-21 | -
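The abstract above describes Coot as a transformer policy that conditions on the recent interaction history with a partner and predicts coordinated actions for the ego agent. The following is a minimal, hypothetical PyTorch sketch of such an in-context coordination setup; the class name, tensor shapes, token layout, and hyperparameters are illustrative assumptions, not the thesis implementation (positional encodings and training code are omitted for brevity).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoordinationTransformerSketch(nn.Module):
    """Hypothetical in-context coordination policy: predicts the ego agent's
    next action from the recent interaction history with a partner."""

    def __init__(self, obs_dim: int, num_actions: int, d_model: int = 64,
                 nhead: int = 4, num_layers: int = 2):
        super().__init__()
        self.num_actions = num_actions
        # One token per timestep: ego observation + one-hot ego action + one-hot partner action.
        self.embed = nn.Linear(obs_dim + 2 * num_actions, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.action_head = nn.Linear(d_model, num_actions)

    def forward(self, obs, ego_actions, partner_actions):
        # obs: (B, T, obs_dim); ego_actions, partner_actions: (B, T) integer tensors.
        ego = F.one_hot(ego_actions, self.num_actions).float()
        partner = F.one_hot(partner_actions, self.num_actions).float()
        tokens = self.embed(torch.cat([obs, ego, partner], dim=-1))
        # Causal mask: the prediction at step t may only attend to interactions up to step t.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        hidden = self.encoder(tokens, mask=mask)
        # Logits for the ego agent's action at each step, given everything observed so far.
        return self.action_head(hidden)


if __name__ == "__main__":
    model = CoordinationTransformerSketch(obs_dim=16, num_actions=6)
    obs = torch.randn(2, 10, 16)                 # 2 episodes, 10-step interaction context
    ego_actions = torch.randint(0, 6, (2, 10))
    partner_actions = torch.randint(0, 6, (2, 10))
    logits = model(obs, ego_actions, partner_actions)
    print(logits.shape)                          # torch.Size([2, 10, 6])
```

In this sketch, each context token packs one timestep of the joint interaction, so as more partner behavior accumulates in the context, the causally masked transformer can adjust its action predictions to the observed partner without any gradient updates, which is the in-context adaptation idea the abstract describes.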
Appears in collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in this item:
File | Size | Format | Access
ntu-113-2.pdf | 8.39 MB | Adobe PDF | Restricted (not authorized for public access)


Unless their copyright terms state otherwise, all items in this repository are protected by copyright, with all rights reserved.
