Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/1215

Full metadata record
DC Field: Value (Language)
dc.contributor.advisor: 徐宏民 (Winston Hsu)
dc.contributor.author: Ying Li (en)
dc.contributor.author: 李盈 (zh_TW)
dc.date.accessioned: 2021-05-12T09:34:22Z
dc.date.available: 2018-10-21
dc.date.available: 2021-05-12T09:34:22Z
dc.date.copyright: 2018-10-12
dc.date.issued: 2018
dc.date.submitted: 2018-10-11
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/handle/123456789/1215
dc.description.abstract: Although deep reinforcement learning (RL) methods have achieved impressive results on a variety of video games, RL training still requires a great deal of time and computational resources, because it is difficult to extract information from the sparse rewards obtained through random exploration of the environment.
Many works have tried to accelerate the reinforcement learning process by leveraging relevant knowledge from past experience. Some treat it as a transfer learning problem and try to exploit relevant knowledge from other games. Others treat it as a multi-task problem and try to find representations that generalize to new tasks. In this thesis, we view the process in which the agent interacts with the environment and collects experience as a way of generating training data, and that data needs more variety to make the training process efficient. We then load models trained on other game environments into the new game we want to train, so as to generate more diverse training data. The results show that using policies from other games with different goals, rather than acting randomly, can speed up the learning process. (zh_TW)
dc.description.abstract: Although the deep reinforcement learning (RL) approach has achieved impressive results in a variety of video games, training by RL still requires a lot of time and computational resources, since it is difficult to extract information from sparse rewards through random exploration of the environment.
Many works have attempted to accelerate the RL process by leveraging relevant knowledge from past experience. Some formulated this as a transfer learning problem, exploiting relevant knowledge from other games. Others formulated it as a multi-task problem, trying to find representations capable of generalizing to new tasks. In this work, we treat the process by which the agent interacts with the environment and collects experience as a way of generating training data, and that data needs more variance to make the training process efficient. We then load models trained on other game environments into the new game we want to train, in order to generate different training data. The results show that using policies from other games with different goals, instead of taking random actions, can speed up the learning process. (en)
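The abstract above describes the core idea: instead of exploring a new game with uniformly random actions, policies pretrained on other games are loaded and used to generate more varied training data, with a mapping between the source and target action spaces (see "Action Space Mapping" in the table of contents below). The following is only an illustrative sketch of that idea, not the thesis's actual implementation; the classic OpenAI Gym API, the environment name PongNoFrameskip-v4, the stand-in source_policy, the epsilon fallback, and the action mapping are all assumed placeholders.

```python
# Illustrative sketch only: the environment name, the stand-in source_policy,
# and the action mapping below are hypothetical placeholders.
import random

import gym


def make_exploration_policy(source_policy, action_map, epsilon=0.1):
    """Build an exploration policy for the target game that reuses a policy
    trained on a different (source) game instead of acting uniformly at random."""
    def policy(observation, target_action_space):
        # Keep a small amount of uniform random exploration as a fallback.
        if random.random() < epsilon:
            return target_action_space.sample()
        # Query the pretrained source-game policy, then map its action index
        # onto the target game's action set.
        source_action = source_policy(observation)
        return action_map.get(source_action, target_action_space.sample())
    return policy


if __name__ == "__main__":
    # Collect a short rollout in the target game using the borrowed policy
    # (classic Gym API: reset() returns obs, step() returns a 4-tuple).
    env = gym.make("PongNoFrameskip-v4")
    explore = make_exploration_policy(
        source_policy=lambda obs: 0,      # placeholder for a pretrained network
        action_map={0: 0, 1: 2, 2: 3},    # hypothetical source-to-target mapping
    )
    obs = env.reset()
    for _ in range(1000):
        action = explore(obs, env.action_space)
        obs, reward, done, info = env.step(action)
        if done:
            obs = env.reset()
    env.close()
```

Falling back to a random action when the mapping has no entry for the source action keeps the rollout going even when the two games' action sets only partially overlap; the thesis's own model-selection and action-mapping procedures are the subject of Sections 3.2 and 3.3.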
dc.description.provenance: Made available in DSpace on 2021-05-12T09:34:22Z (GMT). No. of bitstreams: 1; ntu-107-R05922016-1.pdf: 4744133 bytes, checksum: 7a06ca76b4ad13641343e084ac769088 (MD5); Previous issue date: 2018 (en)
dc.description.tableofcontents:
Acknowledgements (誌謝) iii
Abstract (Chinese, 摘要) iv
Abstract v
1 Introduction 1
2 Related Work 3
2.1 Hierarchical Reinforcement Learning 3
2.2 Learning from Demonstration 3
2.3 Transfer Learning and Multi-task Learning 4
3 Method 6
3.1 System Framework 8
3.2 Selecting Existing Models 10
3.3 Action Space Mapping 10
4 Experiments 13
4.1 Experimental Settings 13
4.2 Successfully Speed Up Training 13
4.3 Simply Finetune from Existing Models 14
4.4 Most Valuable Player 17
5 Conclusion 19
Bibliography 20
dc.language.iso: zh-TW
dc.subject: 強化學習 (zh_TW)
dc.subject: Reinforcement Learning (en)
dc.title: 利用其他遊戲的經驗來加速新遊戲的訓練 (zh_TW)
dc.title: LEVERAGE EXPERIMENTS FROM OTHER TASKS TO SPEED UP TRAINING (en)
dc.type: Thesis
dc.date.schoolyear: 107-1
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 陳文進 (WC Chen), 葉梅珍 (Mei-Chen Yeh), 余能豪 (Neng-Hao Yu)
dc.subject.keyword: 強化學習 (zh_TW)
dc.subject.keyword: Reinforcement Learning (en)
dc.relation.page: 23
dc.identifier.doi: 10.6342/NTU201804197
dc.rights.note: 同意授權(全球公開) [authorization granted; open access worldwide]
dc.date.accepted: 2018-10-12
dc.contributor.author-college: 電機資訊學院 (zh_TW)
dc.contributor.author-dept: 資訊工程學研究所 (zh_TW)
Appears in collections: 資訊工程學系

Files in this item:
File | Size | Format
ntu-107-1.pdf | 4.63 MB | Adobe PDF


All items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
