Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80378

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 廖世偉(Shih-wei Liao) | |
| dc.contributor.author | JIAN HU | en |
| dc.contributor.author | 胡健 | zh_TW |
| dc.date.accessioned | 2022-11-24T03:05:26Z | - |
| dc.date.available | 2021-09-11 | |
| dc.date.available | 2022-11-24T03:05:26Z | - |
| dc.date.copyright | 2021-09-11 | |
| dc.date.issued | 2021 | |
| dc.date.submitted | 2021-09-06 | |
| dc.identifier.citation | [1] M. Andrychowicz, A. Raichuk, P. Stańczyk, M. Orsini, S. Girgin, R. Marinier, L. Hussenot, M. Geist, O. Pietquin, M. Michalski, S. Gelly, and O. Bachem. What matters in on-policy reinforcement learning? A large-scale empirical study. arXiv preprint arXiv:2006.05990, 2020. [2] W. Boehmer, V. Kurin, and S. Whiteson. Deep coordination graphs. In ICML 2020, 13-18 July 2020, Virtual Event, pages 980–991, 2020. [3] Y. Cao, W. Yu, W. Ren, and G. Chen. An overview of recent progress in the study of distributed multi-agent coordination. IEEE Transactions on Industrial Informatics, 9(1):427–438, 2012. [4] K. Cobbe, J. Hilton, O. Klimov, and J. Schulman. Phasic policy gradient. arXiv preprint arXiv:2009.04416, 2020. [5] L. Engstrom, A. Ilyas, S. Santurkar, D. Tsipras, F. Janoos, L. Rudolph, and A. Madry. Implementation matters in deep policy gradients: A case study on PPO and TRPO. arXiv preprint arXiv:2005.12729, 2020. [6] J. N. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson. Counterfactual multi-agent policy gradients. In AAAI-18, New Orleans, Louisiana, USA, February 2-7, 2018, pages 2974–2982. AAAI Press, 2018. [7] M. Hüttenrauch, A. Šošić, and G. Neumann. Guided deep reinforcement learning for swarm systems. arXiv preprint arXiv:1709.06011, 2017. [8] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR 2015, San Diego, CA, USA, May 7-9, 2015, 2015. [9] T. Kozuno, Y. Tang, M. Rowland, R. Munos, S. Kapturowski, W. Dabney, M. Valko, and D. Abel. Revisiting Peng's Q(λ) for modern reinforcement learning. arXiv preprint arXiv:2103.00107, 2021. [10] L. Kraemer and B. Banerjee. Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing, 190:82–94, 2016. [11] R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. In NeurIPS 2017, December 4-9, 2017, Long Beach, CA, USA, pages 6379–6390, 2017. [12] A. Mahajan, T. Rashid, M. Samvelyan, and S. Whiteson. MAVEN: Multi-agent variational exploration. In NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 7611–7622, 2019. [13] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In ICML 2016, New York City, NY, USA, June 19-24, 2016, pages 1928–1937, 2016. [14] O. Nachum, M. Ahn, H. Ponte, S. Gu, and V. Kumar. Multi-agent manipulation via locomotion using hierarchical sim2real. arXiv preprint arXiv:1908.05224, 2019. [15] S. C. Ong, S. W. Png, D. Hsu, and W. S. Lee. POMDPs for robotic tasks with mixed observability. 5:4, 2009. [16] B. Peng, T. Rashid, C. A. Schroeder de Witt, P.-A. Kamienny, P. H. Torr, W. Böhmer, and S. Whiteson. FACMAC: Factored multi-agent centralised policy gradients. arXiv e-prints, pages arXiv–2003, 2020. [17] J. Peng and R. J. Williams. Incremental multi-step Q-learning. In Machine Learning Proceedings 1994, pages 226–232. Elsevier, 1994. [18] D. Precup, R. S. Sutton, and S. P. Singh. Eligibility traces for off-policy policy evaluation. In ICML 2000, Stanford University, Stanford, CA, USA, June 29 - July 2, 2000, pages 759–766. Morgan Kaufmann, 2000. [19] T. Rashid, G. Farquhar, B. Peng, and S. Whiteson. Weighted QMIX: Expanding monotonic value function factorisation. arXiv preprint arXiv:2006.10800, 2020. [20] T. Rashid, M. Samvelyan, C. S. de Witt, G. Farquhar, J. N. Foerster, and S. Whiteson. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pages 4292–4301, 2018. [21] M. Samvelyan, T. Rashid, C. S. de Witt, G. Farquhar, N. Nardelli, T. G. J. Rudner, C.-M. Hung, P. H. S. Torr, J. Foerster, and S. Whiteson. The StarCraft Multi-Agent Challenge. arXiv preprint arXiv:1902.04043, 2019. [22] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017. [23] K. Son, S. Ahn, R. D. Reyes, J. Shin, and Y. Yi. QTRAN++: Improved value transformation for cooperative multi-agent reinforcement learning. arXiv preprint arXiv:2006.12010, 2020. [24] K. Son, D. Kim, W. J. Kang, D. Hostallero, and Y. Yi. QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In ICML 2019, 9-15 June 2019, Long Beach, California, USA, pages 5887–5896, 2019. [25] A. Stooke and P. Abbeel. Accelerated methods for deep reinforcement learning. arXiv preprint arXiv:1803.02811, 2018. [26] J. Su, S. Adams, and P. A. Beling. Value-decomposition multi-agent actor-critics. arXiv preprint arXiv:2007.12306, 2020. [27] J. Su, S. Adams, and P. A. Beling. Value-decomposition multi-agent actor-critics. arXiv preprint arXiv:2007.12306, 2020. [28] P. Sunehag, G. Lever, A. Gruslys, W. M. Czarnecki, V. Zambaldi, M. Jaderberg, M. Lanctot, N. Sonnerat, J. Z. Leibo, K. Tuyls, and T. Graepel. Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296, 2017. [29] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018. [30] J. Wang, Z. Ren, T. Liu, Y. Yu, and C. Zhang. QPLEX: Duplex dueling multi-agent Q-learning. arXiv preprint arXiv:2008.01062, 2020. [31] Y. Wang, B. Han, T. Wang, H. Dong, and C. Zhang. Off-policy multi-agent decomposed policy gradients. arXiv preprint arXiv:2007.12322, 2020. [32] Z. Wang, T. Schaul, M. Hessel, H. van Hasselt, M. Lanctot, and N. de Freitas. Dueling network architectures for deep reinforcement learning. In ICML 2016, New York City, NY, USA, June 19-24, 2016, pages 1995–2003. [33] E. Wei, D. Wicke, D. Freelan, and S. Luke. Multiagent soft Q-learning. arXiv preprint arXiv:1804.09817, 2018. [34] Y. Xiao, J. Hoffman, and C. Amato. Macro-action-based deep multi-agent reinforcement learning. In Conference on Robot Learning, pages 1146–1161. PMLR, 2020. [35] Y. Yang, J. Hao, B. Liao, K. Shao, G. Chen, W. Liu, and H. Tang. Qatten: A general framework for cooperative multiagent reinforcement learning. arXiv preprint arXiv:2002.03939, 2020. [36] C. Zhang and V. R. Lesser. Coordinated multi-agent reinforcement learning in networked distributed POMDPs. In AAAI 2011, San Francisco, California, USA, August 7-11, 2011. AAAI Press, 2011. [37] M. Zhou, Z. Liu, P. Sui, Y. Li, and Y. Y. Chung. Learning implicit credit assignment for multi-agent actor-critic. arXiv preprint arXiv:2007.02529, 2020. [38] M. Zhou, J. Luo, J. V., et al. SMARTS: Scalable multi-agent reinforcement learning training school for autonomous driving, 2020. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80378 | - |
| dc.description.abstract | Many complex multi-agent systems, such as robot swarm control and autonomous vehicle coordination, can be modeled as multi-agent reinforcement learning (MARL) tasks. QMIX, a popular MARL algorithm built on a monotonicity constraint, has served as a baseline in benchmark environments such as the StarCraft Multi-Agent Challenge (SMAC) and Predator-Prey (PP). Recent QMIX variants aim to relax QMIX's monotonicity constraint in order to increase its expressive power and thereby improve its performance on SMAC. However, we find that the performance gains of these variants are significantly affected by various implementation tricks. In this thesis, we revisit QMIX's monotonicity constraint: (1) we design a novel model, RMC, to study the monotonicity constraint further; the results show that the constraint can improve sample efficiency in some purely cooperative tasks; (2) we then re-evaluate the performance of QMIX and these variants with grid hyperparameter search; the results show that QMIX achieves the best performance among them; (3) we analyze the monotonic mixing network from a theoretical perspective and show that it can represent any purely cooperative task. These analyses indicate that relaxing the monotonicity constraint of value-decomposition networks does not always improve QMIX's performance, which challenges our previous view of the monotonicity constraint. (A minimal sketch of the monotonicity constraint itself is given after the metadata table below.) | zh_TW |
| dc.description.provenance | Made available in DSpace on 2022-11-24T03:05:26Z (GMT). No. of bitstreams: 1 U0001-3105202117103200.pdf: 1033127 bytes, checksum: 0446ddd78537a990a9d971fb35214dff (MD5) Previous issue date: 2021 | en |
| dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i Acknowledgements ii 摘要 iii Abstract iv Contents v List of Figures viii List of Tables ix Chapter 1 Introduction 1 Chapter 2 Background 3 Chapter 3 Related Works 5 Chapter 4 RMC 7 Chapter 5 Experimental Setup 9 5.1 Benchmark Environment 9 5.2 Parallel Sampling 10 5.3 Evaluation Metric 11 Chapter 6 Experiments 12 6.1 Ablation Study of Monotonicity Constraint 12 6.2 Re-Evaluation 13 6.3 Fine-tuned-QMIX 14 6.4 Non-monotonic Matrix Games 15 Chapter 7 Discussion 17 7.1 Theory 17 7.2 Why monotonicity constraints work well in SMAC and DEPP? 18 Chapter 8 Conclusion 20 Chapter 9 Broader Impact 21 References 22 Appendix A — Code-level Optimizations 26 A.1 Optimizer 26 A.2 N-step Returns 27 A.3 Replay Buffer Size 28 A.4 Rollout Process Number 29 A.5 Exploration Steps 30 Appendix B — Hyperparameters 32 Appendix C — Omitted Experimental Results 35 C.1 Omitted Figures 35 C.2 The Performance of Original Algorithms 35 Appendix D — Pseudo-code 37 Appendix E — CTDE algorithms 39 E.1 Value-based Methods 39 E.1.1 VDNs 39 E.1.2 Qatten 39 E.1.3 QPLEX 40 E.1.4 WQMIX 40 E.2 Policy-based Methods 41 E.2.1 LICA 41 E.3 Summary 42 | |
| dc.language.iso | en | |
| dc.subject | 多智能體強化學習 | zh_TW |
| dc.subject | 單調性約束 | zh_TW |
| dc.subject | 超參數 | zh_TW |
| dc.subject | Hyperparameters | en |
| dc.subject | Multi-agent Reinforcement Learning | en |
| dc.subject | Monotonicity Constraint | en |
| dc.title | 再思考多智能體合作強化學習中的單調性約束 | zh_TW |
| dc.title | RMC: Rethinking the Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning | en |
| dc.date.schoolyear | 109-2 | |
| dc.description.degree | Master | |
| dc.contributor.oralexamcommittee | 周俊男(Hsin-Tsai Liu),孫瑞鴻(Chih-Yang Tseng),戴敏育,邱仁鈿,吳馬丁 | |
| dc.subject.keyword | 多智能體強化學習,單調性約束,超參數, | zh_TW |
| dc.subject.keyword | Multi-agent Reinforcement Learning,Monotonicity Constraint,Hyperparameters, | en |
| dc.relation.page | 42 | |
| dc.identifier.doi | 10.6342/NTU202100939 | |
| dc.rights.note | Authorization granted (access restricted to campus) | |
| dc.date.accepted | 2021-09-07 | |
| dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | zh_TW |
| dc.contributor.author-dept | 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia) | zh_TW |
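The abstract above centers on QMIX's monotonicity constraint: the joint value Q_tot is mixed from the per-agent values Q_i with state-conditioned weights that are forced to be non-negative, so that ∂Q_tot/∂Q_i ≥ 0 and greedy per-agent action selection also maximizes Q_tot. The PyTorch snippet below is only a minimal sketch of such a monotonic mixing network in the style of the QMIX paper [20]; it is not the thesis's implementation, and the class name `MonotonicMixer`, the layer sizes, and the activation choices are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MonotonicMixer(nn.Module):
    """QMIX-style mixing network: non-negative, state-conditioned mixing
    weights enforce monotonicity of Q_tot in every agent's Q_i."""

    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        # Hypernetworks generate the mixing weights and biases from the global state.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        bs = agent_qs.size(0)
        # abs() keeps the mixing weights non-negative, which is exactly the
        # monotonicity constraint: dQ_tot/dQ_i >= 0 for every agent i.
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, -1)
        b1 = self.hyper_b1(state).view(bs, 1, -1)
        hidden = F.elu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, -1, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        q_tot = torch.bmm(hidden, w2) + b2  # (batch, 1, 1)
        return q_tot.view(bs, 1)


if __name__ == "__main__":
    mixer = MonotonicMixer(n_agents=3, state_dim=16)
    q_tot = mixer(torch.randn(8, 3), torch.randn(8, 16))
    print(q_tot.shape)  # torch.Size([8, 1])
```

The variants re-evaluated in the thesis relax this non-negativity requirement in different ways to gain expressiveness; the abstract's finding is that, once implementation tricks and hyperparameters are controlled for, doing so does not reliably outperform QMIX.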
| Appears in Collections: | Graduate Institute of Networking and Multimedia |
Files in This Item:
| File | Size | Format |
|---|---|---|
| U0001-3105202117103200.pdf (restricted to NTU campus IP addresses; use the VPN service for off-campus access) | 1.01 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.