Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98384

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 孫紹華 | zh_TW |
| dc.contributor.advisor | Shao-Hua Sun | en |
| dc.contributor.author | 王懷志 | zh_TW |
| dc.contributor.author | Huai-Chih Wang | en |
| dc.date.accessioned | 2025-08-05T16:09:29Z | - |
| dc.date.available | 2025-08-06 | - |
| dc.date.copyright | 2025-08-05 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-07-28 | - |
| dc.identifier.citation | [1] M. Bain and C. Sammut. A framework for behavioural cloning. In Machine Intelligence 15, 1995.
[2] D. S. Bernstein, R. Givan, N. Immerman, and S. Zilberstein. The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 2002.
[3] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. In Neural Information Processing Systems, 2020.
[4] Y. Cao, W. Yu, W. Ren, and G. Chen. An overview of recent progress in the study of distributed multi-agent coordination. IEEE Transactions on Industrial Informatics, 9, 2012.
[5] M. Carroll, R. Shah, M. K. Ho, T. Griffiths, S. Seshia, P. Abbeel, and A. Dragan. On the utility of learning about humans for human-AI coordination. In Neural Information Processing Systems, 2019.
[6] R. Charakorn, P. Manoonpong, and N. Dilokthanakul. Generating diverse cooperative agents by learning incompatible policies. In International Conference on Learning Representations, 2023.
[7] L. Chen, K. Lu, A. Rajeswaran, K. Lee, A. Grover, M. Laskin, P. Abbeel, A. Srinivas, and I. Mordatch. Decision transformer: Reinforcement learning via sequence modeling. In Neural Information Processing Systems, 2021.
[8] Z. Dai, F. Tomasi, and S. Ghiassian. In-context exploration-exploitation for reinforcement learning. In International Conference on Learning Representations, 2024.
[9] Y. Duan, J. Schulman, X. Chen, P. L. Bartlett, I. Sutskever, and P. Abbeel. RL^2: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779, 2016.
[10] C. Finn, P. Abbeel, and S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning, 2017.
[11] A. Gupta, C. Devin, Y. Liu, P. Abbeel, and S. Levine. Learning invariant feature spaces to transfer skills with reinforcement learning. In International Conference on Learning Representations, 2017.
[12] H. Hu, A. Lerer, B. Cui, L. Pineda, N. Brown, and J. Foerster. Off-belief learning. In International Conference on Machine Learning, 2021.
[13] H. Hu, A. Lerer, A. Peysakhovich, and J. Foerster. "Other-play" for zero-shot coordination. In International Conference on Machine Learning, 2020.
[14] M. Jaderberg, V. Dalibard, S. Osindero, W. M. Czarnecki, J. Donahue, A. Razavi, O. Vinyals, T. Green, I. Dunning, K. Simonyan, et al. Population-based training of neural networks. arXiv preprint arXiv:1711.09846, 2017.
[15] L. Kirsch, J. Harrison, C. Freeman, J. Sohl-Dickstein, and J. Schmidhuber. Towards general-purpose in-context learning agents. In NeurIPS 2023 Foundation Models for Decision Making Workshop, 2023.
[16] L. Kirsch, J. Harrison, C. D. Freeman, J. Sohl-Dickstein, and J. Schmidhuber. Towards general-purpose in-context learning agents. In NeurIPS 2023 Workshop on Distribution Shifts: New Frontiers with Foundation Models, 2024.
[17] A. Kulesza, B. Taskar, et al. Determinantal point processes for machine learning. Foundations and Trends® in Machine Learning, 2012.
[18] M. Laskin, L. Wang, J. Oh, E. Parisotto, S. Spencer, R. Steigerwald, D. Strouse, S. S. Hansen, A. Filos, E. Brooks, M. Gazeau, H. Sahni, S. Singh, and V. Mnih. In-context reinforcement learning with algorithm distillation. In International Conference on Learning Representations, 2023.
[19] N. Lauffer, A. Shah, M. Carroll, M. D. Dennis, and S. Russell. Who needs to know? Minimal knowledge for optimal coordination. In International Conference on Machine Learning, 2023.
[20] J. Lee, A. Xie, A. Pacchiano, Y. Chandak, C. Finn, O. Nachum, and E. Brunskill. Supervised pretraining can learn in-context reinforcement learning. In Neural Information Processing Systems, 2023.
[21] K.-H. Lee, O. Nachum, M. S. Yang, L. Lee, D. Freeman, S. Guadarrama, I. Fischer, W. Xu, E. Jang, H. Michalewski, et al. Multi-game decision transformers. In Neural Information Processing Systems, 2022.
[22] Y. Li, M. E. Ildiz, D. Papailiopoulos, and S. Oymak. Transformers as algorithms: Generalization and stability in in-context learning. In International Conference on Machine Learning, 2023.
[23] Y. Li, S. Zhang, J. Sun, Y. Du, Y. Wen, X. Wang, and W. Pan. Cooperative open-ended learning framework for zero-shot coordination. In International Conference on Machine Learning, 2023.
[24] Q. Long, Z. Zhou, A. Gupta, F. Fang, Y. Wu, and X. Wang. Evolutionary population curriculum for scaling multi-agent reinforcement learning. In International Conference on Learning Representations, 2020.
[25] X. Lou, J. Guo, J. Zhang, J. Wang, K. Huang, and Y. Du. PECAN: Leveraging policy ensemble for context-aware zero-shot human-AI coordination. In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, 2023.
[26] K. Lucas and R. E. Allen. Any-play: An intrinsic augmentation for zero-shot coordination. In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, 2022.
[27] A. Lupu, B. Cui, H. Hu, and J. Foerster. Trajectory diversity for zero-shot coordination. In International Conference on Machine Learning, 2021.
[28] L. Matignon, G. J. Laurent, and N. Le Fort-Piat. Independent reinforcement learners in cooperative Markov games: A survey regarding coordination problems. The Knowledge Engineering Review, 27, 2012.
[29] H. Nekoei, X. Zhao, J. Rajendran, M. Liu, and S. Chandar. Towards few-shot coordination: Revisiting ad-hoc teamplay challenge in the game of Hanabi. In Conference on Lifelong Learning Agents, 2023.
[30] K. Rakelly, A. Zhou, C. Finn, S. Levine, and D. Quillen. Efficient off-policy meta-reinforcement learning via probabilistic context variables. In International Conference on Machine Learning, 2019.
[31] J. Rothfuss, D. Lee, I. Clavera, T. Asfour, and P. Abbeel. ProMP: Proximal meta-policy search. In International Conference on Learning Representations, 2019.
[32] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
[33] P. Stone, G. Kaminka, S. Kraus, and J. Rosenschein. Ad hoc autonomous agent teams: Collaboration without pre-coordination. In Proceedings of the AAAI Conference on Artificial Intelligence, 2010.
[34] D. Strouse, K. McKee, M. Botvinick, E. Hughes, and R. Everett. Collaborating with humans without human data. In Neural Information Processing Systems, 2021.
[35] G. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6, 1994.
[36] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2017.
[37] J. Von Oswald, E. Niklasson, E. Randazzo, J. Sacramento, A. Mordvintsev, A. Zhmoginov, and M. Vladymyrov. Transformers learn in-context by gradient descent. In International Conference on Machine Learning, 2023.
[38] X. Wang, Z. Tian, Z. Wan, Y. Wen, J. Wang, and W. Zhang. Order matters: Agent-by-agent policy optimization. In International Conference on Learning Representations, 2023.
[39] X. Wang, S. Zhang, W. Zhang, W. Dong, J. Chen, Y. Wen, and W. Zhang. ZSC-Eval: An evaluation toolkit and benchmark for multi-agent zero-shot coordination. In Neural Information Processing Systems, 2024.
[40] J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. In Neural Information Processing Systems, 2022.
[41] Z. Xie, Z. Lin, D. Ye, Q. Fu, Y. Wei, and S. Li. Future-conditioned unsupervised pre-training for decision transformer. In International Conference on Machine Learning, 2023.
[42] J. Xing, T. Nagata, K. Chen, X. Zou, E. Neftci, and J. L. Krichmar. Domain adaptation in reinforcement learning via latent unified state representation. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021.
[43] K. Xue, Y. Wang, C. Guan, L. Yuan, H. Fu, Q. Fu, C. Qian, and Y. Yu. Heterogeneous multi-agent zero-shot coordination by coevolution. IEEE Transactions on Evolutionary Computation, 2024.
[44] Z. Yan, N. Jouandeau, and A. A. Cherif. A survey and analysis of multi-robot coordination. International Journal of Advanced Robotic Systems, 10, 2013.
[45] C. Yu, J. Gao, W. Liu, B. Xu, H. Tang, J. Yang, Y. Wang, and Y. Wu. Learning zero-shot cooperation with humans, assuming humans are biased. In International Conference on Learning Representations, 2023.
[46] C. Yu, A. Velu, E. Vinitsky, J. Gao, Y. Wang, A. Bayen, and Y. Wu. The surprising effectiveness of PPO in cooperative multi-agent games. In Neural Information Processing Systems, 2022.
[47] C. Yu, A. Velu, E. Vinitsky, J. Gao, Y. Wang, A. Bayen, and Y. Wu. Learning zero-shot cooperation with humans, assuming humans are biased. In International Conference on Learning Representations, 2023.
[48] J. Yu, Y. Xu, J. Y. Koh, T. Luong, G. Baid, Z. Wang, V. Vasudevan, A. Ku, Y. Yang, B. K. Ayan, et al. Scaling autoregressive models for content-rich text-to-image generation. arXiv preprint arXiv:2206.10789, 2022.
[49] A. Zhang, R. McAllister, R. Calandra, Y. Gal, and S. Levine. Learning invariant representations for reinforcement learning without reconstruction. arXiv preprint arXiv:2006.10742, 2020.
[50] R. Zhao, J. Song, Y. Yuan, H. Hu, Y. Gao, Y. Wu, Z. Sun, and W. Yang. Maximum entropy population-based training for zero-shot human-AI coordination. In Proceedings of the AAAI Conference on Artificial Intelligence, 2023.
[51] L. Zintgraf, S. Schulze, C. Lu, L. Feng, M. Igl, K. Shiarlis, Y. Gal, K. Hofmann, and S. Whiteson. VariBAD: Variational Bayes-adaptive deep RL via meta-learning. Journal of Machine Learning Research, 22(1), 2021. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98384 | - |
| dc.description.abstract | 在動態且具有不確定性的環境中實現人工智能體之間的有效協作,仍然是多智能體系統中的一大挑戰。現有方法,如自我對弈(self-play)與族群式訓練(population-based methods),要不是無法適應陌生合作對象的行為模式,就是需要極為大量的訓練次數。為了解決這些限制,我們提出了一種新的情境式協作框架——Coot(Coordination Transformers),該方法利用近期的互動歷史,能快速適應未見過的協作對象。與過往著重於增加訓練對象多樣性的做法不同,Coot 明確地聚焦於根據觀察到的夥伴行為進行動作預測,進而達到對新夥伴行為的適應。我們使用行為互補的智能體所產生的互動軌跡進行訓練,使得 Coot 能在無需額外監督或微調訓練的情況下,快速學會有效的協作策略。在 Overcooked 環境中的實驗顯示,Coot 在與陌生夥伴協作的任務中,明顯優於其他基準方法。人類評估也進一步證實 Coot 是最有效的協作夥伴;而大量的消融實驗則凸顯了其在多智能體場景中的韌性、靈活性與對情境的敏感度。 | zh_TW |
| dc.description.abstract | Effective coordination among artificial agents in dynamic and uncertain environments remains a significant challenge in multi-agent systems. Existing approaches, such as self-play and population-based methods, either generalize poorly to unseen partners or require impractically extensive training. To overcome these limitations, we propose Coordination Transformers (Coot), a novel in-context coordination framework that uses recent interaction histories to rapidly adapt to unseen partners. Unlike previous approaches that primarily aim to increase the diversity of training partners, Coot explicitly focuses on adapting to new partner behaviors by predicting actions aligned with observed partner interactions. Trained on interaction trajectories collected from diverse pairs of agents with complementary behaviors, Coot quickly learns effective coordination strategies without explicit supervision or fine-tuning. Evaluations on the Overcooked benchmark demonstrate that Coot significantly outperforms baseline methods in coordination tasks involving previously unseen partners. Human evaluations further confirm Coot as the most effective collaborative partner, while extensive ablations highlight its robustness, flexibility, and sensitivity to context in multi-agent scenarios. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-05T16:09:29Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-08-05T16:09:29Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements i
摘要 iii
Abstract v
Contents vii
List of Figures xi
List of Tables xiii
Chapter 1 Introduction 1
1.1 Introduction 1
1.2 Proposal and Contribution 2
Chapter 2 Related Work 5
2.1 In-Context Reinforcement Learning 5
2.2 Learning to Coordinate 6
2.3 Transfer Learning in Reinforcement Learning 7
Chapter 3 Preliminary 9
3.1 Hidden-Utility Markov Game 9
3.2 In-Context Learning with Decision-Pretrained Transformers 10
Chapter 4 Method 11
4.1 From Task Generalization to Coordination Generalization 11
4.2 Dataset Generation 13
4.3 Training 14
4.4 Online Deployment 15
Chapter 5 Experiments 17
5.1 Evaluation Setup 17
5.1.1 Environments 17
5.1.2 Baselines 19
5.1.3 Evaluation Pipeline 20
5.1.4 Evaluation Metrics 20
5.2 Coordination Performance Across Benchmarks 21
5.3 Adaptation Through Context Accumulation 23
5.4 Impact of Trajectory Augmentation 23
5.5 Human-Agent Collaboration Study 25
Chapter 6 Conclusion 27
References 29
Appendix A — Environment 37
Appendix B — Implementation and Dataset Details 41
B.1 Source and Licensing of Policy Pools and Datasets 41
B.2 COOT Training Details 42
B.3 Baseline and Training Implementation 44
Appendix C — Additional Contexts for Human experiment 47
C.1 Experiment Setup 47
C.2 Experiment Platform 48
C.3 Detailed Human Participant Rankings 48
Appendix D — Supplementary Results 51
D.1 Evaluation and Algorithm 51 | - |
| dc.language.iso | en | - |
| dc.subject | 情境學習 | zh_TW |
| dc.subject | 決策用轉換器 | zh_TW |
| dc.subject | 人機合作 | zh_TW |
| dc.subject | 多智能體合作 | zh_TW |
| dc.subject | 情境強化學習 | zh_TW |
| dc.subject | Human-AI Collaboration | en |
| dc.subject | Transformers for Decision Making | en |
| dc.subject | Multi-Agent Coordination | en |
| dc.subject | In-Context Reinforcement Learning | en |
| dc.subject | In-Context Learning | en |
| dc.title | 利用合作轉換器來學習情境合作 | zh_TW |
| dc.title | Learning to Coordinate In-Context with Coordination Transformers | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 李宏毅;謝秉均 | zh_TW |
| dc.contributor.oralexamcommittee | Hung-yi Lee;Ping-Chun Hsieh | en |
| dc.subject.keyword | 情境學習,情境強化學習,多智能體合作,人機合作,決策用轉換器 | zh_TW |
| dc.subject.keyword | In-Context Learning,In-Context Reinforcement Learning,Multi-Agent Coordination,Human-AI Collaboration,Transformers for Decision Making | en |
| dc.relation.page | 55 | - |
| dc.identifier.doi | 10.6342/NTU202502134 | - |
| dc.rights.note | 同意授權(限校園內公開) | - |
| dc.date.accepted | 2025-07-29 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 電信工程學研究所 | - |
| dc.date.embargo-lift | 2030-07-21 | - |
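
The record above carries no implementation details beyond the abstract, but as a rough illustration of the in-context coordination idea the English abstract describes (a policy that predicts its next action by conditioning on a buffer of recent interactions with the current partner, rather than by fine-tuning its weights), a minimal sketch in Python/PyTorch follows. Every name in it (ContextStep, CoordinationPolicy, act) and every dimension is hypothetical and is not taken from the thesis.

```python
# Illustrative sketch only; not code from the thesis.
from dataclasses import dataclass
from typing import List

import torch
import torch.nn as nn


@dataclass
class ContextStep:
    """One past interaction step kept in the in-context buffer."""
    obs: torch.Tensor        # flattened observation, shape (obs_dim,)
    partner_action: int      # partner's observed action id
    ego_action: int          # ego agent's own action id


class CoordinationPolicy(nn.Module):
    """Transformer mapping a history of interaction steps to next-action logits."""

    def __init__(self, obs_dim: int, n_actions: int, d_model: int = 64, n_layers: int = 2):
        super().__init__()
        self.obs_proj = nn.Linear(obs_dim, d_model)
        # +1 reserves an "unknown" action id for the current, not-yet-acted step.
        self.partner_emb = nn.Embedding(n_actions + 1, d_model)
        self.ego_emb = nn.Embedding(n_actions + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_actions)
        self.n_actions = n_actions

    def forward(self, context: List[ContextStep], current_obs: torch.Tensor) -> torch.Tensor:
        # Each past step becomes one token: observation + partner-action + ego-action embeddings.
        tokens = []
        for step in context:
            tokens.append(self.obs_proj(step.obs)
                          + self.partner_emb(torch.tensor(step.partner_action))
                          + self.ego_emb(torch.tensor(step.ego_action)))
        # The current observation is appended with "unknown" action slots.
        tokens.append(self.obs_proj(current_obs)
                      + self.partner_emb(torch.tensor(self.n_actions))
                      + self.ego_emb(torch.tensor(self.n_actions)))
        seq = torch.stack(tokens).unsqueeze(0)   # (1, T, d_model)
        hidden = self.encoder(seq)               # self-attention over the whole context
        return self.head(hidden[0, -1])          # logits for the ego agent's next action


def act(policy: CoordinationPolicy, context: List[ContextStep], obs: torch.Tensor) -> int:
    """Greedy action from the in-context policy; no gradient steps at deployment time."""
    with torch.no_grad():
        return int(policy(context, obs).argmax().item())


if __name__ == "__main__":
    torch.manual_seed(0)
    policy = CoordinationPolicy(obs_dim=8, n_actions=6)
    history = [ContextStep(obs=torch.randn(8), partner_action=2, ego_action=1)]
    print(act(policy, history, torch.randn(8)))  # prints an action id in [0, 6)
```

At deployment the context list would simply grow with each new interaction step, so adaptation to an unseen partner comes from a longer prompt rather than from any weight update, consistent with the abstract's claim of coordination "without explicit supervision or fine-tuning".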
| Appears in Collections: | 電信工程學研究所 |

Files in this item:

| File | Size | Format |
|---|---|---|
| ntu-113-2.pdf (restricted access) | 8.39 MB | Adobe PDF |

All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.