NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/81673

Full metadata record (DC field: value [language]):
dc.contributor.advisor: 廖世偉 (Shih-wei Liao)
dc.contributor.author: Siyue Hu [en]
dc.contributor.author: 胡思悅 [zh_TW]
dc.date.accessioned: 2022-11-24T09:25:34Z
dc.date.available: 2022-11-24T09:25:34Z
dc.date.copyright: 2021-09-02
dc.date.issued: 2021
dc.date.submitted: 2021-08-20
dc.identifier.citation:
[1] C. S. de Witt, T. Gupta, D. Makoviichuk, V. Makoviychuk, P. H. Torr, M. Sun, and S. Whiteson. Is independent learning all you need in the StarCraft multi-agent challenge? arXiv preprint arXiv:2011.09533, 2020.
[2] J. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson. Counterfactual multi-agent policy gradients. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
[3] M. Fortunato, M. G. Azar, B. Piot, J. Menick, I. Osband, A. Graves, V. Mnih, R. Munos, D. Hassabis, O. Pietquin, et al. Noisy networks for exploration. arXiv preprint arXiv:1706.10295, 2017.
[4] J. Hu, S. Jiang, S. A. Harding, H. Wu, and S.-w. Liao. RIIT: Rethinking the importance of implementation tricks in multi-agent reinforcement learning. arXiv preprint arXiv:2102.03479, 2021.
[5] S. Iqbal and F. Sha. Actor-attention-critic for multi-agent reinforcement learning. In International Conference on Machine Learning, pages 2961–2970. PMLR, 2019.
[6] V. R. Konda and J. N. Tsitsiklis. Actor-critic algorithms. In Advances in Neural Information Processing Systems, pages 1008–1014. Citeseer, 2000.
[7] L. Kraemer and B. Banerjee. Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing, 190:82–94, 2016.
[8] D. Li, D. Zhao, Q. Zhang, and Y. Chen. Reinforcement learning and deep learning based lateral control for autonomous driving [application notes]. IEEE Computational Intelligence Magazine, 14(2):83–98, 2019.
[9] R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. arXiv preprint arXiv:1706.02275, 2017.
[10] A. Mahajan, T. Rashid, M. Samvelyan, and S. Whiteson. MAVEN: Multi-agent variational exploration. arXiv preprint arXiv:1910.07483, 2019.
[11] F. A. Oliehoek and C. Amato. A Concise Introduction to Decentralized POMDPs. Springer, 2016.
[12] F. A. Oliehoek, M. T. Spaan, and N. Vlassis. Optimal and approximate Q-value functions for decentralized POMDPs. Journal of Artificial Intelligence Research, 32:289–353, 2008.
[13] P. Peng, Y. Wen, Y. Yang, Q. Yuan, Z. Tang, H. Long, and J. Wang. Multiagent bidirectionally-coordinated nets: Emergence of human-level coordination in learning to play StarCraft combat games. arXiv preprint arXiv:1703.10069, 2017.
[14] M. Plappert, R. Houthooft, P. Dhariwal, S. Sidor, R. Y. Chen, X. Chen, T. Asfour, P. Abbeel, and M. Andrychowicz. Parameter space noise for exploration. arXiv preprint arXiv:1706.01905, 2017.
[15] W. Qiu, X. Wang, R. Yu, X. He, R. Wang, B. An, S. Obraztsova, and Z. Rabinovich. RMIX: Learning risk-sensitive policies for cooperative reinforcement learning agents. arXiv preprint arXiv:2102.08159, 2021.
[16] T. Rashid, M. Samvelyan, C. Schroeder, G. Farquhar, J. Foerster, and S. Whiteson. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In International Conference on Machine Learning, pages 4295–4304. PMLR, 2018.
[17] M. Samvelyan, T. Rashid, C. S. De Witt, G. Farquhar, N. Nardelli, T. G. Rudner, C.-M. Hung, P. H. Torr, J. Foerster, and S. Whiteson. The StarCraft multi-agent challenge. arXiv preprint arXiv:1902.04043, 2019.
[18] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz. Trust region policy optimization. In International Conference on Machine Learning, pages 1889–1897. PMLR, 2015.
[19] J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438, 2015.
[20] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
[21] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller. Deterministic policy gradient algorithms. In International Conference on Machine Learning, pages 387–395. PMLR, 2014.
[22] K. Son, D. Kim, W. J. Kang, D. E. Hostallero, and Y. Yi. QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In International Conference on Machine Learning, pages 5887–5896. PMLR, 2019.
[23] H. Song, M. Kim, D. Park, Y. Shin, and J.-G. Lee. Learning from noisy labels with deep neural networks: A survey. arXiv preprint arXiv:2007.08199, 2020.
[24] P. Sunehag, G. Lever, A. Gruslys, W. M. Czarnecki, V. Zambaldi, M. Jaderberg, M. Lanctot, N. Sonnerat, J. Z. Leibo, K. Tuyls, et al. Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296, 2017.
[25] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.
[26] R. S. Sutton, D. A. McAllester, S. P. Singh, Y. Mansour, et al. Policy gradient methods for reinforcement learning with function approximation. In NIPS, volume 99, pages 1057–1063. Citeseer, 1999.
[27] M. Tan. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, pages 330–337, 1993.
[28] X. Wang, L. Ke, Z. Qiao, and X. Chai. Large-scale traffic signal control using a novel multiagent reinforcement learning. IEEE Transactions on Cybernetics, 51(1):174–187, 2020.
[29] R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229–256, 1992.
[30] Y. Yang, J. Hao, B. Liao, K. Shao, G. Chen, W. Liu, and H. Tang. Qatten: A general framework for cooperative multiagent reinforcement learning. arXiv preprint arXiv:2002.03939, 2020.
[31] C. Yu, A. Velu, E. Vinitsky, Y. Wang, A. Bayen, and Y. Wu. The surprising effectiveness of MAPPO in cooperative, multi-agent games. arXiv preprint arXiv:2103.01955, 2021.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/81673
dc.description.abstract: In recent years, many popular multi-agent reinforcement learning (MARL) algorithms have adopted the centralized training with decentralized execution (CTDE) paradigm. Recently, some researchers have tried to apply the CTDE framework directly to the single-agent PPO algorithm, extending it into a multi-agent algorithm with a centralized value function (MAPPO) and testing it in the StarCraft II environment; experiments show, however, that MAPPO performs poorly on many StarCraft II tasks. To address this problem, we design a noise-based MAPPO (abbreviated ND-MAPPO). By introducing a noise mechanism, the model assigns a different value to each agent under the centralized value function, thereby promoting exploration. Experiments show that the proposed method substantially outperforms MAPPO in most StarCraft II scenarios and, in some scenarios, also surpasses the state-of-the-art CTDE algorithm QMIX. In addition, we provide the first theoretical proof that extending PPO to MAPPO through a centralized value function preserves a convergence guarantee, and we further analyze the value function, obtaining some interesting insights. [zh_TW]
(An illustrative sketch of the noise-disturbance idea described in this abstract appears after the metadata record below.)
dc.description.provenance: Made available in DSpace on 2022-11-24T09:25:34Z (GMT). No. of bitstreams: 1. U0001-0507202101384200.pdf: 1375025 bytes, checksum: d66115c1dac1c2ae8b8817a509221d70 (MD5). Previous issue date: 2021 [en]
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee i
Acknowledgements iii
摘要 (Chinese Abstract) v
Abstract vii
Contents ix
List of Figures xi
List of Tables xiii
Chapter 1 Introduction 1
Chapter 2 Preliminaries 5
Chapter 3 Related Works 9
Chapter 4 Methods 11
  4.1 Motivation 11
  4.2 Multi-agent PPO (MAPPO) 11
  4.3 Noise-based Exploration 12
  4.4 ND-MAPPO 12
  4.5 Theoretical Perspective 14
Chapter 5 Experiments 15
  5.1 Experiment Setup 15
    5.1.1 Non-monotonic Matrix Game 15
    5.1.2 StarCraft II 16
    5.1.3 Evaluation Metric 17
  5.2 Non-monotonic Matrix Game 17
  5.3 SMAC 18
  5.4 Ablation Studies 19
  5.5 Extended V Value Analysis 20
  5.6 Policy Entropy of ND-MAPPO 21
Chapter 6 Conclusion 23
Chapter 7 Broader Impact 25
References 27
Appendix A — Complete Detailed Multi-agent PPO Proof 31
  A.0.1 Multi-agent PPO Convergence 31
  A.0.2 Lower Bound 34
Appendix B — Additional Results 35
  B.1 Additional SMAC Results 35
  B.2 Ablation Studies 35
Appendix C — Hyperparameters 37
dc.language.iso: en
dc.subject: 多智能體強化學習 [zh_TW]
dc.subject: 噪音擾動 [zh_TW]
dc.subject: 集中訓練分散執行 [zh_TW]
dc.subject: Noise Disturbance [en]
dc.subject: Centralized Training with Decentralized Execution [en]
dc.subject: Multi-Agent Reinforcement Learning [en]
dc.title: ND-MAPPO:具有噪音擾動的多智能體近似策略優化算法 [zh_TW]
dc.title: ND-MAPPO: Noise Disturbance Multi-Agent Proximal Policy Optimization [en]
dc.date.schoolyear: 109-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 戴敏育 (Hsin-Tsai Liu), 邱仁鈿 (Chih-Yang Tseng), 周俊男, 孫瑞鴻
dc.subject.keyword: 多智能體強化學習, 集中訓練分散執行, 噪音擾動 [zh_TW]
dc.subject.keyword: Multi-Agent Reinforcement Learning, Centralized Training with Decentralized Execution, Noise Disturbance [en]
dc.relation.page: 38
dc.identifier.doi: 10.6342/NTU202101269
dc.rights.note: 未授權 (not authorized for public access)
dc.date.accepted: 2021-08-20
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) [zh_TW]
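The abstract above describes the core idea of ND-MAPPO only at a high level: a noise mechanism assigns each agent a slightly different value under the shared centralized value function, which encourages exploration. The Python sketch below is a minimal illustration of that general idea, not the thesis's implementation; the class name NoisyCentralizedCritic, the network sizes, and the fixed Gaussian noise scale are hypothetical choices made for the example.

```python
# Minimal illustrative sketch (not the thesis implementation): a centralized
# critic whose shared value estimate is perturbed with a per-agent Gaussian
# disturbance, in the spirit of the noise mechanism described in the abstract.
import torch
import torch.nn as nn


class NoisyCentralizedCritic(nn.Module):
    """Centralized value function V(s) shared by all agents.

    Each agent receives the shared value plus an agent-specific noise term,
    so the agents are trained against slightly different value estimates.
    """

    def __init__(self, state_dim: int, n_agents: int,
                 hidden: int = 64, noise_std: float = 0.1):
        super().__init__()
        self.n_agents = n_agents
        self.noise_std = noise_std  # assumed fixed noise scale
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, global_state: torch.Tensor) -> torch.Tensor:
        # global_state: (batch, state_dim) -> shared value: (batch, 1)
        v_shared = self.net(global_state)
        # Per-agent Gaussian disturbance: (batch, n_agents)
        noise = torch.randn(global_state.shape[0], self.n_agents) * self.noise_std
        # Broadcasts to (batch, n_agents): one perturbed value per agent
        return v_shared + noise


if __name__ == "__main__":
    critic = NoisyCentralizedCritic(state_dim=8, n_agents=3)
    state = torch.randn(4, 8)   # a batch of 4 global states
    print(critic(state).shape)  # torch.Size([4, 3])
```

In a MAPPO-style training loop these per-agent values would feed each agent's advantage estimate; how the noise is generated, scheduled, or shared across time steps in ND-MAPPO itself is specified in the thesis, not here.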
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in This Item:
File: U0001-0507202101384200.pdf (restricted; not authorized for public access)
Size: 1.34 MB
Format: Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
