Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72045
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳光禎 (Kwang-Cheng Chen) | |
dc.contributor.author | Hsuan-Man Hung | en |
dc.contributor.author | 洪瑄蔓 | zh_TW |
dc.date.accessioned | 2021-06-17T06:20:36Z | - |
dc.date.available | 2019-08-21 | |
dc.date.copyright | 2018-08-21 | |
dc.date.issued | 2018 | |
dc.date.submitted | 2018-08-19 | |
dc.identifier.citation | [1] C. Boutilier and B. Price. Accelerating reinforcement learning through implicit imitation. CoRR, abs/1106.0681, 2011.
[2] W. Burgard, M. Moors, C. Stachniss, and F. E. Schneider. Coordinated multi-robot exploration. IEEE Transactions on Robotics, 21(3):376-386, June 2005.
[3] L. Busoniu, R. Babuška, and B. De Schutter. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38(2):156-172, March 2008.
[4] K.-C. Chen, E. Ko, and M. Wu. Networked artificial intelligence. To appear in IEEE Network.
[5] K.-C. Chen, T. Zhang, R. Gitlin, and G. Fettweis. Ultra-low latency mobile networking. To appear in IEEE Network.
[6] J. A. Clouse. Learning from an automated training agent. In Adaptation and Learning in Multiagent Systems. Springer Verlag, 1996.
[7] A. Coates, B. Carpenter, C. Case, S. Satheesh, B. Suresh, T. Wang, D. J. Wu, and A. Y. Ng. Text detection and character recognition in scene images with unsupervised feature learning. In Proceedings of the 2011 International Conference on Document Analysis and Recognition, ICDAR '11, pages 440-445, Washington, DC, USA, 2011. IEEE Computer Society.
[8] J. S. Jennings, G. Whelan, and W. F. Evans. Cooperative search and rescue with a team of mobile robots. In Advanced Robotics, 1997. ICAR '97. Proceedings., 8th International Conference on, pages 193-200, July 1997.
[9] F. L. Lewis and D. Vrabie. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits and Systems Magazine, 9(3):32-50, Third Quarter 2009.
[10] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall Press, Upper Saddle River, NJ, USA, 3rd edition, 2009.
[11] R. S. Sutton and A. G. Barto. Introduction to Reinforcement Learning. MIT Press, Cambridge, MA, USA, 1st edition, 1998.
[12] M. Tan. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, pages 330-337. Morgan Kaufmann, 1993.
[13] S. Thrun. Learning occupancy grid maps with forward sensor models. Autonomous Robots, 15(2):111-127, September 2003.
[14] A. M. Turing. Computing machinery and intelligence. Mind, LIX(236):433-460, 1950.
[15] C. J. C. H. Watkins and P. Dayan. Q-learning. In Machine Learning, pages 279-292, 1992.
[16] K. M. Wurm, C. Stachniss, and W. Burgard. Coordinated multi-robot exploration using a segmentation of the environment. In 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1160-1165, September 2008.
[17] Y. Xiang, A. Alahi, and S. Savarese. Learning to track: Online multi-object tracking by decision making. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 4705-4713, December 2015. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72045 | - |
dc.description.abstract | 人工智慧近年來一直是熱門的研究領域之一,其應用廣泛,無論是日常生活可見的或龐大艱深的問題都藉著人工智慧來尋找解決辦法。隨著智能個體複雜度及數量增加,所組成的多智能系統中個體的互動以及通訊網路為系統所帶來的影響也漸趨複雜。在這篇論文中,我們設計增強式學習之自動掃地機器人,並以單一機器人之學習表現作為基準來衡量多智能系統的學習行為與其所在的通訊環境之間的關係。 | zh_TW |
dc.description.abstract | Artificial intelligence has been one of the hottest research areas in recent years, applied to problems ranging from everyday tasks to highly complex ones. As the number of intelligent entities grows, the interactions among them become more complex, particularly when communication comes into play. In this thesis, we investigate the relationships between communication and multi-agent learning systems through practical learning tasks. Using a single agent implemented with reinforcement learning as a benchmark, we explore the interplay between various communication scenarios and the learning behaviour of multi-agent systems. | en |
dc.description.provenance | Made available in DSpace on 2021-06-17T06:20:36Z (GMT). No. of bitstreams: 1 ntu-107-R04942113-1.pdf: 4051561 bytes, checksum: 88e9517bb5f4742813997b782a429649 (MD5) Previous issue date: 2018 | en |
dc.description.tableofcontents | 誌謝 i
摘要 ii
Abstract iii
List of Figures vi
1 Introduction 1
1.1 Artificial Intelligence 1
1.2 Multiagent Systems 2
1.3 Literature Review 3
1.4 Problem Statement 4
1.5 Organization 5
2 Reinforcement Learning 6
2.1 Markov Decision Process 7
2.1.1 Bellman Optimality Principle 11
2.2 Q-learning 14
2.3 n-step Temporal Difference 16
3 Single Agent Learning Task 19
3.1 Problem Description 19
3.2 Single Agent Reinforcement Learning 22
3.2.1 Reinforcement Learning Formulation 22
3.2.2 Simple RL 26
3.2.3 Q-learning 29
3.2.4 Performance Evaluation: Simple RL versus Q-learning 30
3.2.5 n-step Temporal Difference Prediction 34
3.2.6 Performance Comparison: Q-learning and n-step TD 36
4 RL Enhancement with Planning 39
4.1 Fixed Length Planning 39
4.2 Conditional Exhaustive Planning 43
4.2.1 Conditions for Adopting Planning 43
4.2.2 Planning Algorithm: Grassfire Algorithm 45
4.2.3 Learning NB 49
4.2.4 Performance of Conditional Exhaustive Planning 51
5 Collaborative Multi-Agent System 54
5.1 Ideal Communication 55
5.1.1 Information Exchange and Integration 55
5.1.2 Performance Measure 58
5.2 Random Error 60
5.3 Multiple Access 63
5.4 Overall Comparison 67
6 Conclusions and Future Works 70
Bibliography 71 | |
dc.language.iso | en | |
dc.title | 協作多智能系統在網路環境之應用 | zh_TW |
dc.title | Communication in Collaborative Multiagent Systems | en |
dc.type | Thesis | |
dc.date.schoolyear | 106-2 | |
dc.description.degree | 碩士 | |
dc.contributor.oralexamcommittee | 林茂昭,連紹宇,曾志成,鄧德雋 | |
dc.subject.keyword | 多智能系統,多代理人系統,增強式學習,機器學習,強化式學習, | zh_TW |
dc.subject.keyword | multi-agent systems, reinforcement learning, machine learning, Q-learning, TD learning | en |
dc.relation.page | 72 | |
dc.identifier.doi | 10.6342/NTU201804013 | |
dc.rights.note | 有償授權 | |
dc.date.accepted | 2018-08-19 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 電信工程學研究所 | zh_TW |
Appears in Collections: | Graduate Institute of Communication Engineering (電信工程學研究所) |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-107-1.pdf (currently not authorized for public access) | 3.96 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.