
DSpace

The DSpace institutional repository system preserves digital materials of all kinds (e.g., text, images, PDFs) and makes them easily accessible.

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7569
Full metadata record
DC field: value (language)
dc.contributor.advisor: 陳炳宇 (Bing-Yu Chen)
dc.contributor.author: Shu-Hsuan Hsu (en)
dc.contributor.author: 許書軒 (zh_TW)
dc.date.accessioned: 2021-05-19T17:46:42Z
dc.date.available: 2099-12-31
dc.date.available: 2021-05-19T17:46:42Z
dc.date.copyright: 2018-07-19
dc.date.issued: 2018
dc.date.submitted: 2018-07-09
dc.identifier.citation:
[1] M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279, June 2013.
[2] M. Fortunato, M. G. Azar, B. Piot, J. Menick, I. Osband, A. Graves, V. Mnih, R. Munos, D. Hassabis, O. Pietquin, C. Blundell, and S. Legg. Noisy networks for exploration. CoRR, abs/1706.10295, 2017.
[3] Y. Ganin and V. Lempitsky. Unsupervised domain adaptation by backpropagation. In Proceedings of the 32nd International Conference on Machine Learning (ICML'15), volume 37, pages 1180–1189. JMLR.org, 2015.
[4] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc., 2014.
[5] S. Levine, C. Finn, T. Darrell, and P. Abbeel. End-to-end training of deep visuomotor policies. Journal of Machine Learning Research, 17(39):1–40, 2016.
[6] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu. Asynchronous methods for deep reinforcement learning. CoRR, abs/1602.01783, 2016.
[7] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
[8] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
[9] E. Parisotto, J. L. Ba, and R. Salakhutdinov. Actor-mimic: Deep multitask and transfer reinforcement learning. arXiv preprint arXiv:1511.06342, 2015.
[10] A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell. Progressive neural networks. arXiv preprint arXiv:1606.04671, 2016.
[11] S. Sharma and B. Ravindran. Online multi-task learning using active sampling. CoRR, abs/1702.06053, 2017.
[12] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
[13] B. Sun, J. Feng, and K. Saenko. Return of frustratingly easy domain adaptation. CoRR, abs/1511.05547, 2015.
[14] B. Sun and K. Saenko. Deep CORAL: Correlation alignment for deep domain adaptation. In Computer Vision – ECCV 2016 Workshops, pages 443–450. Springer, 2016.
[15] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction, volume 1. MIT Press, 1998.
[16] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell. Adversarial discriminative domain adaptation. arXiv preprint arXiv:1702.05464, 2017.
[17] H. van Hasselt, A. Guez, and D. Silver. Deep reinforcement learning with double Q-learning. CoRR, abs/1509.06461, 2015.
[18] C. J. Watkins and P. Dayan. Q-learning. Machine Learning, 8(3-4):279–292, 1992.
[19] T. Xiao, H. Li, W. Ouyang, and X. Wang. Learning deep feature representations with domain guided dropout for person re-identification. In Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on, pages 1249–1258. IEEE, 2016.
[20] H. Yin and S. J. Pan. Knowledge transfer for deep reinforcement learning with hierarchical experience replay. In AAAI, pages 1640–1646, 2017.
[21] C. Zhang, O. Vinyals, R. Munos, and S. Bengio. A study on overfitting in deep reinforcement learning. arXiv preprint arXiv:1804.06893, 2018.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7569
dc.description.abstract: In recent years, deep reinforcement learning has been amply demonstrated to solve complex, high-dimensional problems. Its next milestone will be enabling neural networks to learn concepts shared across environments, or to use previously acquired knowledge to speed up learning in a new environment. Most models trained by today's reinforcement learning methods handle new environments poorly, even when the new environment closely resembles one already learned. Our proposed method lets a reinforcement learning model extract more transferable core features from a single environment, and accelerates the model's learning in a new environment through semi-supervised learning. Finally, we evaluate our method on a very popular reinforcement learning benchmark, the Arcade Learning Environment (ALE), and show that it beats common baselines such as using pre-trained models and fine-tuning network weights. (zh_TW)
dc.description.abstract: In the past few years, deep reinforcement learning has been shown to solve problems with complex state spaces, such as video games and board games. The next step for intelligent agents is to generalize between tasks, using prior experience to pick up new skills more quickly. However, most current reinforcement learning algorithms suffer from catastrophic forgetting, even when facing a target task very similar to the source task.
Our approach enables an agent to generalize knowledge from a single source task and to boost learning progress on a new task with a semi-supervised learning method.
We evaluate this approach on Atari games, a popular reinforcement learning benchmark, and show that it outperforms common baselines based on pre-training and fine-tuning. (en)
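The adversarial objective described in the abstract can be illustrated with a DANN-style gradient-reversal layer (Ganin and Lempitsky [3]): the forward pass is the identity, while the backward pass flips the sign of the domain-classifier gradient, so the shared encoder is pushed toward features that confuse the domain classifier while still serving the task loss. A minimal NumPy sketch follows; the toy linear encoder, all shapes, and the λ value are illustrative assumptions, not the thesis's actual network or training setup.

```python
import numpy as np

def grad_reversal_backward(upstream_grad, lam=1.0):
    """Backward pass of a gradient-reversal layer: identity on the
    forward pass, gradient scaled by -lam on the backward pass."""
    return -lam * upstream_grad

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(4, 8))   # shared feature encoder (toy linear layer)
x = rng.normal(size=(8,))         # one observation (e.g., a state vector)

feat = W_enc @ x                  # shared features fed to both heads

# Stand-in gradients of the two losses w.r.t. the shared features:
g_q = rng.normal(size=(4,))       # from the task (Q-learning) loss
g_dom = rng.normal(size=(4,))     # from the domain-classifier loss

# The domain gradient passes through the reversal layer before reaching the
# encoder, so the encoder update opposes the domain classifier while still
# following the task gradient.
lam = 0.5
g_enc_feat = g_q + grad_reversal_backward(g_dom, lam)

# Chain rule back to the encoder weights (outer product for a linear layer).
grad_W_enc = np.outer(g_enc_feat, x)

lr = 0.01
W_enc -= lr * grad_W_enc
print(grad_W_enc.shape)  # (4, 8)
```

In a full implementation the same idea sits between a convolutional feature extractor and a small domain-discriminator head, with λ controlling how strongly domain-invariance is enforced.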
dc.description.provenance: Made available in DSpace on 2021-05-19T17:46:42Z (GMT). No. of bitstreams: 1
ntu-107-R05725031-1.pdf: 1526822 bytes, checksum: a0923cf01cd28931d22b7485255e0931 (MD5)
Previous issue date: 2018
en
dc.description.tableofcontents: Table of Contents
致謝 (Acknowledgements) . . . i
中文摘要 (Chinese Abstract) . . . ii
Abstract . . . iii
List of Figures . . . vi
List of Tables . . . ix
Chapter 1 Introduction . . . 1
Chapter 2 Related Work . . . 4
2.1 Reinforcement Learning . . . 4
2.2 Domain Adaptation . . . 5
2.3 Multi-task Agent . . . 6
Chapter 3 Background . . . 7
3.1 Q-Learning . . . 7
3.2 Deep Q Network . . . 8
Chapter 4 Approach . . . 11
4.1 Transfer with Adversarial Objective . . . 11
4.2 Augmentation . . . 14
4.2.1 Detailed Evaluation . . . 16
Chapter 5 Experiments . . . 20
5.1 Pong Variants . . . 21
5.2 Cross-Game Transfer . . . 24
Chapter 6 Limitations and Future Work . . . 28
6.1 Target Task with Different Action Space . . . 28
6.2 Augmentation Method Selection . . . 28
Chapter 7 Conclusion . . . 30
Bibliography . . . 31
dc.language.iso: zh-TW
dc.title: 利用對抗式目標與資料擴增於深度強化學習間的遷移 (zh_TW)
dc.title: Transferring Deep Reinforcement Learning with Adversarial Objective and Augmentation (en)
dc.type: Thesis
dc.date.schoolyear: 106-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 王鈺強 (Yu-Chiang Wang), 李宏毅 (Hung-yi Lee)
dc.subject.keyword: 機器學習, 強化學習, 領域適應 (zh_TW)
dc.subject.keyword: Machine Learning, Reinforcement Learning, Domain Adaptation (en)
dc.relation.page: 32
dc.identifier.doi: 10.6342/NTU201801392
dc.rights.note: Authorization granted (open access worldwide)
dc.date.accepted: 2018-07-09
dc.contributor.author-college: 管理學院 (College of Management) (zh_TW)
dc.contributor.author-dept: 資訊管理學研究所 (Graduate Institute of Information Management) (zh_TW)
dc.date.embargo-lift: 2099-12-31
Appears in collections: 資訊管理學系 (Department of Information Management)

Files in this item:
File: ntu-107-1.pdf (embargoed; publicly available online after 2099-12-31)
Size: 1.49 MB
Format: Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.

Contact Information
No. 1, Sec. 4, Roosevelt Rd., Da'an Dist., Taipei 10617, Taiwan (R.O.C.)
Tel: (02) 33662353
Email: ntuetds@ntu.edu.tw
© NTU Library All Rights Reserved