Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/1215

Full metadata record
DC Field: Value (Language)
dc.contributor.advisor: 徐宏民 (Winston Hsu)
dc.contributor.author: Ying Li (en)
dc.contributor.author: 李盈 (zh_TW)
dc.date.accessioned: 2021-05-12T09:34:22Z
dc.date.available: 2018-10-21
dc.date.available: 2021-05-12T09:34:22Z
dc.date.copyright: 2018-10-12
dc.date.issued: 2018
dc.date.submitted: 2018-10-11
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/handle/123456789/1215
dc.description.abstract: Although deep reinforcement learning (RL) methods have achieved impressive results on a variety of video games, RL training still requires a great deal of time and computational resources, because it is difficult to extract information from the sparse rewards obtained through random exploration of the environment.
Many works have tried to accelerate the reinforcement learning process by leveraging relevant knowledge from past experience. Some treat it as a transfer learning problem and try to exploit relevant knowledge from other games. Others treat it as a multi-task problem and try to find representations that generalize to new tasks. In this thesis, we view the process in which the agent interacts with the environment and collects experience as a way of generating training data, and that data needs more variety to make the training process efficient. We then load models trained on other game environments into the new game we want to train, so as to generate more diverse training data. The results show that using policies from other games with different goals, rather than acting randomly, can speed up the learning process. (zh_TW)
dc.description.abstract: Although the deep reinforcement learning (RL) approach has achieved impressive results in a variety of video games, training by RL still requires a lot of time and computational resources, since it is difficult to extract information from sparse rewards through random exploration of the environment.
Many works have attempted to accelerate the RL process by leveraging relevant knowledge from past experience. Some formulated this as a transfer learning problem, exploiting relevant knowledge from other games. Others formulated it as a multi-task problem, trying to find representations capable of generalizing to new tasks. In this work, we treat the process by which the agent interacts with the environment and collects experience as a way of generating training data, and that data needs more variance to make the training process efficient. We then load models trained on other game environments into the new game we want to train, in order to generate different training data. The results show that using policies from other games with different goals, instead of taking random actions, can speed up the learning process. (en)
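The abstract above describes the core idea: instead of exploring a new game with uniformly random actions, policies pretrained on other games are loaded and used to generate more varied training data, with a mapping between the source and target action spaces (see "Action Space Mapping" in the table of contents below). The following is only an illustrative sketch of that idea, not the thesis's actual implementation; the classic OpenAI Gym API, the environment name PongNoFrameskip-v4, the stand-in source_policy, the epsilon fallback, and the action mapping are all assumed placeholders.

```python
# Illustrative sketch only: the environment name, the stand-in source_policy,
# and the action mapping below are hypothetical placeholders.
import random

import gym


def make_exploration_policy(source_policy, action_map, epsilon=0.1):
    """Build an exploration policy for the target game that reuses a policy
    trained on a different (source) game instead of acting uniformly at random."""
    def policy(observation, target_action_space):
        # Keep a small amount of uniform random exploration as a fallback.
        if random.random() < epsilon:
            return target_action_space.sample()
        # Query the pretrained source-game policy, then map its action index
        # onto the target game's action set.
        source_action = source_policy(observation)
        return action_map.get(source_action, target_action_space.sample())
    return policy


if __name__ == "__main__":
    # Collect a short rollout in the target game using the borrowed policy
    # (classic Gym API: reset() returns obs, step() returns a 4-tuple).
    env = gym.make("PongNoFrameskip-v4")
    explore = make_exploration_policy(
        source_policy=lambda obs: 0,      # placeholder for a pretrained network
        action_map={0: 0, 1: 2, 2: 3},    # hypothetical source-to-target mapping
    )
    obs = env.reset()
    for _ in range(1000):
        action = explore(obs, env.action_space)
        obs, reward, done, info = env.step(action)
        if done:
            obs = env.reset()
    env.close()
```

Falling back to a random action when the mapping has no entry for the source action keeps the rollout going even when the two games' action sets only partially overlap; the thesis's own model-selection and action-mapping procedures are the subject of Sections 3.2 and 3.3.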
dc.description.provenance: Made available in DSpace on 2021-05-12T09:34:22Z (GMT). No. of bitstreams: 1; ntu-107-R05922016-1.pdf: 4744133 bytes, checksum: 7a06ca76b4ad13641343e084ac769088 (MD5); Previous issue date: 2018 (en)
dc.description.tableofcontents:
Acknowledgements (誌謝) iii
Abstract (Chinese, 摘要) iv
Abstract v
1 Introduction 1
2 Related Work 3
2.1 Hierarchical Reinforcement Learning 3
2.2 Learning from Demonstration 3
2.3 Transfer Learning and Multi-task Learning 4
3 Method 6
3.1 System Framework 8
3.2 Selecting Existing Models 10
3.3 Action Space Mapping 10
4 Experiments 13
4.1 Experimental Settings 13
4.2 Successfully Speed Up Training 13
4.3 Simply Finetune from Existing Models 14
4.4 Most Valuable Player 17
5 Conclusion 19
Bibliography 20
dc.language.iso: zh-TW
dc.subject: 強化學習 (zh_TW)
dc.subject: Reinforcement Learning (en)
dc.title: 利用其他遊戲的經驗來加速新遊戲的訓練 (zh_TW)
dc.title: LEVERAGE EXPERIMENTS FROM OTHER TASKS TO SPEED UP TRAINING (en)
dc.type: Thesis
dc.date.schoolyear: 107-1
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 陳文進 (WC Chen), 葉梅珍 (Mei-Chen Yeh), 余能豪 (Neng-Hao Yu)
dc.subject.keyword: 強化學習 (zh_TW)
dc.subject.keyword: Reinforcement Learning (en)
dc.relation.page: 23
dc.identifier.doi: 10.6342/NTU201804197
dc.rights.note: 同意授權(全球公開) [authorization granted; open access worldwide]
dc.date.accepted: 2018-10-12
dc.contributor.author-college: 電機資訊學院 (zh_TW)
dc.contributor.author-dept: 資訊工程學研究所 (zh_TW)
Appears in collections: 資訊工程學系

Files in this item:
File | Size | Format
ntu-107-1.pdf | 4.63 MB | Adobe PDF


All items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
