Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/81673

Complete metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 廖世偉(Shih-wei Liao) | |
| dc.contributor.author | Siyue Hu | en |
| dc.contributor.author | 胡思悅 | zh_TW |
| dc.date.accessioned | 2022-11-24T09:25:34Z | - |
| dc.date.available | 2022-11-24T09:25:34Z | - |
| dc.date.copyright | 2021-09-02 | |
| dc.date.issued | 2021 | |
| dc.date.submitted | 2021-08-20 | |
| dc.identifier.citation | [1] C. S. de Witt, T. Gupta, D. Makoviichuk, V. Makoviychuk, P. H. Torr, M. Sun, and S. Whiteson. Is independent learning all you need in the StarCraft multi-agent challenge? arXiv preprint arXiv:2011.09533, 2020. [2] J. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson. Counterfactual multi-agent policy gradients. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018. [3] M. Fortunato, M. G. Azar, B. Piot, J. Menick, I. Osband, A. Graves, V. Mnih, R. Munos, D. Hassabis, O. Pietquin, et al. Noisy networks for exploration. arXiv preprint arXiv:1706.10295, 2017. [4] J. Hu, S. Jiang, S. A. Harding, H. Wu, and S.-w. Liao. RIIT: Rethinking the importance of implementation tricks in multi-agent reinforcement learning. arXiv preprint arXiv:2102.03479, 2021. [5] S. Iqbal and F. Sha. Actor-attention-critic for multi-agent reinforcement learning. In International Conference on Machine Learning, pages 2961–2970. PMLR, 2019. [6] V. R. Konda and J. N. Tsitsiklis. Actor-critic algorithms. In Advances in Neural Information Processing Systems, pages 1008–1014. Citeseer, 2000. [7] L. Kraemer and B. Banerjee. Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing, 190:82–94, 2016. [8] D. Li, D. Zhao, Q. Zhang, and Y. Chen. Reinforcement learning and deep learning based lateral control for autonomous driving [application notes]. IEEE Computational Intelligence Magazine, 14(2):83–98, 2019. [9] R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. arXiv preprint arXiv:1706.02275, 2017. [10] A. Mahajan, T. Rashid, M. Samvelyan, and S. Whiteson. MAVEN: Multi-agent variational exploration. arXiv preprint arXiv:1910.07483, 2019. [11] F. A. Oliehoek and C. Amato. A concise introduction to decentralized POMDPs. Springer, 2016. [12] F. A. Oliehoek, M. T. Spaan, and N. Vlassis. Optimal and approximate Q-value functions for decentralized POMDPs. Journal of Artificial Intelligence Research, 32:289–353, 2008. [13] P. Peng, Y. Wen, Y. Yang, Q. Yuan, Z. Tang, H. Long, and J. Wang. Multiagent bidirectionally-coordinated nets: Emergence of human-level coordination in learning to play StarCraft combat games. arXiv preprint arXiv:1703.10069, 2017. [14] M. Plappert, R. Houthooft, P. Dhariwal, S. Sidor, R. Y. Chen, X. Chen, T. Asfour, P. Abbeel, and M. Andrychowicz. Parameter space noise for exploration. arXiv preprint arXiv:1706.01905, 2017. [15] W. Qiu, X. Wang, R. Yu, X. He, R. Wang, B. An, S. Obraztsova, and Z. Rabinovich. RMIX: Learning risk-sensitive policies for cooperative reinforcement learning agents. arXiv preprint arXiv:2102.08159, 2021. [16] T. Rashid, M. Samvelyan, C. Schroeder, G. Farquhar, J. Foerster, and S. Whiteson. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In International Conference on Machine Learning, pages 4295–4304. PMLR, 2018. [17] M. Samvelyan, T. Rashid, C. S. De Witt, G. Farquhar, N. Nardelli, T. G. Rudner, C.-M. Hung, P. H. Torr, J. Foerster, and S. Whiteson. The StarCraft multi-agent challenge. arXiv preprint arXiv:1902.04043, 2019. [18] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz. Trust region policy optimization. In International Conference on Machine Learning, pages 1889–1897. PMLR, 2015. [19] J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438, 2015. [20] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017. [21] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller. Deterministic policy gradient algorithms. In International Conference on Machine Learning, pages 387–395. PMLR, 2014. [22] K. Son, D. Kim, W. J. Kang, D. E. Hostallero, and Y. Yi. QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In International Conference on Machine Learning, pages 5887–5896. PMLR, 2019. [23] H. Song, M. Kim, D. Park, Y. Shin, and J.-G. Lee. Learning from noisy labels with deep neural networks: A survey. arXiv preprint arXiv:2007.08199, 2020. [24] P. Sunehag, G. Lever, A. Gruslys, W. M. Czarnecki, V. Zambaldi, M. Jaderberg, M. Lanctot, N. Sonnerat, J. Z. Leibo, K. Tuyls, et al. Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296, 2017. [25] R. S. Sutton and A. G. Barto. Reinforcement learning: An introduction. MIT Press, 2018. [26] R. S. Sutton, D. A. McAllester, S. P. Singh, Y. Mansour, et al. Policy gradient methods for reinforcement learning with function approximation. In NIPS, volume 99, pages 1057–1063. Citeseer, 1999. [27] M. Tan. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, pages 330–337, 1993. [28] X. Wang, L. Ke, Z. Qiao, and X. Chai. Large-scale traffic signal control using a novel multiagent reinforcement learning. IEEE Transactions on Cybernetics, 51(1):174–187, 2020. [29] R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229–256, 1992. [30] Y. Yang, J. Hao, B. Liao, K. Shao, G. Chen, W. Liu, and H. Tang. Qatten: A general framework for cooperative multiagent reinforcement learning. arXiv preprint arXiv:2002.03939, 2020. [31] C. Yu, A. Velu, E. Vinitsky, Y. Wang, A. Bayen, and Y. Wu. The surprising effectiveness of MAPPO in cooperative, multi-agent games. arXiv preprint arXiv:2103.01955, 2021. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/81673 | - |
| dc.description.abstract | In recent years, many popular multi-agent reinforcement learning (MARL) algorithms have adopted the centralized training with decentralized execution (CTDE) paradigm. Recently, some researchers have applied the CTDE framework directly to the single-agent PPO algorithm, extending it into a multi-agent algorithm with a centralized value function (MAPPO) and evaluating it in the StarCraft II environment; experiments show, however, that MAPPO performs poorly on many StarCraft II tasks. To address this problem, we design noise-disturbance MAPPO (ND-MAPPO), which introduces a noise mechanism so that the centralized value function assigns a different value to each agent, thereby encouraging exploration (an illustrative sketch of this idea follows the metadata table below). Experiments show that the proposed method far outperforms MAPPO in most StarCraft II scenarios and in some scenarios also surpasses the state-of-the-art CTDE algorithm QMIX. In addition, we give the first theoretical proof that extending PPO to MAPPO through a centralized value function preserves its convergence guarantee, and we further analyze the value function to obtain some interesting insights. | zh_TW |
| dc.description.provenance | Made available in DSpace on 2022-11-24T09:25:34Z (GMT). No. of bitstreams: 1 U0001-0507202101384200.pdf: 1375025 bytes, checksum: d66115c1dac1c2ae8b8817a509221d70 (MD5) Previous issue date: 2021 | en |
| dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i Acknowledgements iii 摘要 v Abstract vii Contents ix List of Figures xi List of Tables xiii Chapter 1 Introduction 1 Chapter 2 Preliminaries 5 Chapter 3 Related Works 9 Chapter 4 Methods 11 4.1 Motivation 11 4.2 Multi-agent PPO (MAPPO) 11 4.3 Noise-based Exploration 12 4.4 ND-MAPPO 12 4.5 Theoretical Perspective 14 Chapter 5 Experiments 15 5.1 Experiment Setup 15 5.1.1 Non-monotonic Matrix Game 15 5.1.2 StarCraft II 16 5.1.3 Evaluation Metric 17 5.2 Non-monotonic Matrix Game 17 5.3 SMAC 18 5.4 Ablation Studies 19 5.5 Extended V Value Analysis 20 5.6 Policy Entropy of ND-MAPPO 21 Chapter 6 Conclusion 23 Chapter 7 Broader Impact 25 References 27 Appendix A — Complete Detailed Multi-agent PPO Proof 31 A.0.1 Multi-agent PPO Convergence 31 A.0.2 Lower Bound 34 Appendix B — Additional Results 35 B.1 Additional SMAC Results 35 B.2 Ablation Studies 35 Appendix C — Hyperparameters 37 | |
| dc.language.iso | en | |
| dc.subject | 多智能體強化學習 | zh_TW |
| dc.subject | 噪音擾動 | zh_TW |
| dc.subject | 集中訓練分散執行 | zh_TW |
| dc.subject | Noise Disturbance | en |
| dc.subject | Centralized Training with Decentralized Execution | en |
| dc.subject | Multi-Agent Reinforcement Learning | en |
| dc.title | ND-MAPPO:具有噪音擾動的多智能體近似策略優化算法 | zh_TW |
| dc.title | ND-MAPPO: Noise Disturbance Multi-Agent Proximal Policy Optimization | en |
| dc.date.schoolyear | 109-2 | |
| dc.description.degree | 碩士 | |
| dc.contributor.oralexamcommittee | 戴敏育(Hsin-Tsai Liu),邱仁鈿(Chih-Yang Tseng),周俊男,孫瑞鴻 | |
| dc.subject.keyword | 多智能體強化學習,集中訓練分散執行,噪音擾動 | zh_TW |
| dc.subject.keyword | Multi-Agent Reinforcement Learning, Centralized Training with Decentralized Execution, Noise Disturbance | en |
| dc.relation.page | 38 | |
| dc.identifier.doi | 10.6342/NTU202101269 | |
| dc.rights.note | 未授權 | |
| dc.date.accepted | 2021-08-20 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
| Appears in Collections: | 資訊工程學系 |
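
The abstract above states only that ND-MAPPO perturbs the centralized value function with noise so that each agent receives its own value estimate; the record does not describe the concrete architecture. The snippet below is a minimal, hypothetical PyTorch sketch of that general idea, not the thesis implementation: the additive per-agent Gaussian noise with a learnable scale, the class name `NoisyCentralizedCritic`, and all network sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class NoisyCentralizedCritic(nn.Module):
    """Illustrative sketch (not the thesis code): a centralized value
    function V(s) whose output is perturbed by per-agent Gaussian noise,
    so each agent i receives its own estimate V_i(s)."""

    def __init__(self, state_dim: int, n_agents: int, hidden_dim: int = 64):
        super().__init__()
        self.n_agents = n_agents
        # Shared trunk that produces the centralized value V(s).
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )
        # One learnable noise scale per agent (an assumption for illustration).
        self.log_sigma = nn.Parameter(torch.zeros(n_agents))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # state: (batch, state_dim) -> shared value: (batch, 1)
        v = self.trunk(state)
        sigma = self.log_sigma.exp()  # per-agent noise scale, shape (n_agents,)
        eps = torch.randn(state.shape[0], self.n_agents, device=state.device)
        # Each agent sees the shared value plus its own noise sample.
        return v + sigma * eps        # shape (batch, n_agents)


if __name__ == "__main__":
    critic = NoisyCentralizedCritic(state_dim=10, n_agents=3)
    s = torch.randn(4, 10)
    print(critic(s).shape)  # torch.Size([4, 3]): one value per agent
```

In a PPO-style update, each agent would then compute its advantages from its own column of this output, which is one way the noise could diversify exploration across agents while the value trunk remains shared.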
Files in this item:
| File | Size | Format |
|---|---|---|
| U0001-0507202101384200.pdf (restricted access; not authorized for public access) | 1.34 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
