Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/1215
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 徐宏民(Winston Hsu) | |
dc.contributor.author | Ying Li | en |
dc.contributor.author | 李盈 | zh_TW |
dc.date.accessioned | 2021-05-12T09:34:22Z | - |
dc.date.available | 2018-10-21 | |
dc.date.available | 2021-05-12T09:34:22Z | - |
dc.date.copyright | 2018-10-12 | |
dc.date.issued | 2018 | |
dc.date.submitted | 2018-10-11 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/handle/123456789/1215 | - |
dc.description.abstract | 雖然深度強化學習(RL)的方法已經在各種電玩遊戲上取得了令人印象深刻的成果,但RL的訓練仍然需要大量的時間和計算資源,因為透過隨機探索環境並從對應的稀疏獎勵中提取資訊非常困難。
已經有許多研究試圖利用過往經驗中的相關知識來加速強化學習的過程。有些將其視為遷移學習問題,試圖利用其他遊戲的相關知識;有些將其視為多任務學習問題,試圖找到能夠推廣到新任務的表示方法。在這篇論文中,我們將agent與環境互動並收集經驗的過程,視為產生訓練資料的方式,而訓練資料需要更多的變異性,才能使訓練過程更有效率。因此,我們嘗試將在其他遊戲環境中訓練好的模型載入到想要訓練的新遊戲中,以產生一些不同的訓練資料。結果顯示,使用來自目標不同的其他遊戲的策略,而非隨機採取行動,可以加快學習的過程。 | zh_TW |
dc.description.abstract | Although deep reinforcement learning (RL) approaches have achieved impressive results on a variety of video games, RL training still requires a great deal of time and computational resources, since it is difficult to extract information from sparse rewards through random exploration of the environment.
Many works have attempted to accelerate the RL process by leveraging relevant knowledge from past experience. Some formulate this as a transfer learning problem and exploit relevant knowledge from other games; others formulate it as a multi-task problem and try to find representations that generalize to new tasks. In this work, we treat the process in which the agent interacts with the environment and collects experience as a way of generating training data, which needs more variance to make training more efficient. We then load models trained on other game environments into the new game we want to train on, in order to generate more diverse training data. The results show that using policies from other games with different goals, instead of taking random actions, can speed up the learning process. | en |
dc.description.provenance | Made available in DSpace on 2021-05-12T09:34:22Z (GMT). No. of bitstreams: 1 ntu-107-R05922016-1.pdf: 4744133 bytes, checksum: 7a06ca76b4ad13641343e084ac769088 (MD5) Previous issue date: 2018 | en |
dc.description.tableofcontents | 誌謝 (Acknowledgements) iii
摘要 (Chinese Abstract) iv
Abstract v
1 Introduction 1
2 Related Work 3
2.1 Hierarchical Reinforcement Learning 3
2.2 Learning from Demonstration 3
2.3 Transfer Learning and Multi-task Learning 4
3 Method 6
3.1 System Framework 8
3.2 Selecting Existing Models 10
3.3 Action Space Mapping 10
4 Experiments 13
4.1 Experimental Settings 13
4.2 Successfully Speed Up Training 13
4.3 Simply Finetune From Existing Models 14
4.4 Most Valuable Player 17
5 Conclusion 19
Bibliography 20 | |
dc.language.iso | zh-TW | |
dc.title | 利用其他遊戲的經驗來加速新遊戲的訓練 | zh_TW |
dc.title | LEVERAGE EXPERIENCE FROM OTHER TASKS TO SPEED UP TRAINING | en |
dc.type | Thesis | |
dc.date.schoolyear | 107-1 | |
dc.description.degree | 碩士 (Master's) | |
dc.contributor.oralexamcommittee | 陳文進(WC Chen),葉梅珍(Mei-Chen Yeh),余能豪(Neng-Hao Yu) | |
dc.subject.keyword | 強化學習, | zh_TW |
dc.subject.keyword | Reinforcement Learning, | en |
dc.relation.page | 23 | |
dc.identifier.doi | 10.6342/NTU201804197 | |
dc.rights.note | Consent granted (open access worldwide) | |
dc.date.accepted | 2018-10-12 | |
dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | zh_TW |
dc.contributor.author-dept | 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) | zh_TW |
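The English abstract above describes a simple data-generation idea: rather than exploring the new game purely at random, the agent sometimes acts according to policies already trained on other games, with their actions mapped into the new game's action space. Below is a minimal, hypothetical Python sketch of that idea, not the thesis implementation; the `PretrainedPolicy` class, the `map_action` helper, the environment id, and the `use_source_prob` mixing ratio are illustrative assumptions, and the classic (pre-0.26) gym API is assumed.

```python
# Minimal sketch (not the thesis implementation): draw some exploration actions from
# policies pretrained on other games instead of acting purely at random, so that the
# collected experience is more varied. PretrainedPolicy, map_action, the environment
# id, and use_source_prob are illustrative assumptions; the pre-0.26 gym API is assumed.
import random
import gym


class PretrainedPolicy:
    """Stand-in for a model trained on another game (a real one would be loaded from disk)."""

    def __init__(self, n_source_actions):
        self.n_source_actions = n_source_actions

    def act(self, observation):
        # A real policy would run a forward pass on the observation;
        # here we only sketch the interface.
        return random.randrange(self.n_source_actions)


def map_action(source_action, n_target_actions):
    """Map an action index from the source game's action space into the target game's."""
    return source_action % n_target_actions


def collect_episode(env, source_policies, use_source_prob=0.5):
    """Roll out one episode, drawing part of the actions from borrowed policies."""
    obs, done, transitions = env.reset(), False, []
    while not done:
        if source_policies and random.random() < use_source_prob:
            donor = random.choice(source_policies)
            action = map_action(donor.act(obs), env.action_space.n)
        else:
            action = env.action_space.sample()  # fall back to random exploration
        next_obs, reward, done, _ = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = next_obs
    return transitions


if __name__ == "__main__":
    env = gym.make("CartPole-v1")                     # placeholder target game
    donors = [PretrainedPolicy(n_source_actions=18)]  # e.g. an Atari-sized action space
    batch = collect_episode(env, donors)
    print(f"collected {len(batch)} transitions")
```

In practice the borrowed policies would be real networks restored from checkpoints, and the transitions collected this way would feed the usual RL update (for example a replay buffer); the mixing probability controls how much of the exploration is delegated to the donor policies rather than to random actions.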
Appears in Collections: | 資訊工程學系 (Department of Computer Science and Information Engineering)
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-107-1.pdf | 4.63 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.