
DSpace

The DSpace institutional repository system preserves digital materials of all kinds (e.g., text, images, PDFs) and makes them easily accessible.

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7569
Full metadata record
DC field: value (language)
dc.contributor.advisor: 陳炳宇 (Bing-Yu Chen)
dc.contributor.author: Shu-Hsuan Hsu (en)
dc.contributor.author: 許書軒 (zh_TW)
dc.date.accessioned: 2021-05-19T17:46:42Z
dc.date.available: 2099-12-31
dc.date.available: 2021-05-19T17:46:42Z
dc.date.copyright: 2018-07-19
dc.date.issued: 2018
dc.date.submitted: 2018-07-09
dc.identifier.citation:
[1] M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279, June 2013.
[2] M. Fortunato, M. G. Azar, B. Piot, J. Menick, I. Osband, A. Graves, V. Mnih, R. Munos, D. Hassabis, O. Pietquin, C. Blundell, and S. Legg. Noisy networks for exploration. CoRR, abs/1706.10295, 2017.
[3] Y. Ganin and V. Lempitsky. Unsupervised domain adaptation by backpropagation. In Proceedings of the 32nd International Conference on Machine Learning (ICML'15), volume 37, pages 1180–1189. JMLR.org, 2015.
[4] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc., 2014.
[5] S. Levine, C. Finn, T. Darrell, and P. Abbeel. End-to-end training of deep visuomotor policies. Journal of Machine Learning Research, 17(39):1–40, 2016.
[6] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu. Asynchronous methods for deep reinforcement learning. CoRR, abs/1602.01783, 2016.
[7] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
[8] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
[9] E. Parisotto, J. L. Ba, and R. Salakhutdinov. Actor-mimic: Deep multitask and transfer reinforcement learning. arXiv preprint arXiv:1511.06342, 2015.
[10] A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell. Progressive neural networks. arXiv preprint arXiv:1606.04671, 2016.
[11] S. Sharma and B. Ravindran. Online multi-task learning using active sampling. CoRR, abs/1702.06053, 2017.
[12] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
[13] B. Sun, J. Feng, and K. Saenko. Return of frustratingly easy domain adaptation. CoRR, abs/1511.05547, 2015.
[14] B. Sun and K. Saenko. Deep CORAL: Correlation alignment for deep domain adaptation. In Computer Vision – ECCV 2016 Workshops, pages 443–450. Springer, 2016.
[15] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction, volume 1. MIT Press, 1998.
[16] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell. Adversarial discriminative domain adaptation. arXiv preprint arXiv:1702.05464, 2017.
[17] H. van Hasselt, A. Guez, and D. Silver. Deep reinforcement learning with double Q-learning. CoRR, abs/1509.06461, 2015.
[18] C. J. Watkins and P. Dayan. Q-learning. Machine Learning, 8(3-4):279–292, 1992.
[19] T. Xiao, H. Li, W. Ouyang, and X. Wang. Learning deep feature representations with domain guided dropout for person re-identification. In Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on, pages 1249–1258. IEEE, 2016.
[20] H. Yin and S. J. Pan. Knowledge transfer for deep reinforcement learning with hierarchical experience replay. In AAAI, pages 1640–1646, 2017.
[21] C. Zhang, O. Vinyals, R. Munos, and S. Bengio. A study on overfitting in deep reinforcement learning. arXiv preprint arXiv:1804.06893, 2018.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7569
dc.description.abstract: In recent years, deep reinforcement learning has been amply demonstrated to solve complex, high-dimensional problems. Its next milestone will be enabling neural networks to learn concepts shared across environments, or to use previously acquired knowledge to speed up learning in a new environment. Most models trained by today's reinforcement learning methods handle new environments poorly, even when the new environment closely resembles one already learned. Our proposed method lets a reinforcement learning model extract more transferable core features from a single environment, and accelerates the model's learning in a new environment through semi-supervised learning. Finally, we evaluate our method on a very popular reinforcement learning benchmark, the Arcade Learning Environment (ALE), and show that it beats common baselines such as using pre-trained models and fine-tuning network weights. (zh_TW)
dc.description.abstract: In the past few years, deep reinforcement learning has been shown to solve problems with complex state spaces, such as video games and board games. The next step for intelligent agents is to generalize between tasks, using prior experience to pick up new skills more quickly. However, most current reinforcement learning algorithms suffer from catastrophic forgetting, even when facing a target task very similar to the source task.
Our approach enables an agent to generalize knowledge from a single source task and to boost learning progress on a new task with a semi-supervised learning method.
We evaluate this approach on Atari games, a popular reinforcement learning benchmark, and show that it outperforms common baselines based on pre-training and fine-tuning. (en)
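The adversarial objective described in the abstract can be illustrated with a DANN-style gradient-reversal layer (Ganin and Lempitsky [3]): the forward pass is the identity, while the backward pass flips the sign of the domain-classifier gradient, so the shared encoder is pushed toward features that confuse the domain classifier while still serving the task loss. A minimal NumPy sketch follows; the toy linear encoder, all shapes, and the λ value are illustrative assumptions, not the thesis's actual network or training setup.

```python
import numpy as np

def grad_reversal_backward(upstream_grad, lam=1.0):
    """Backward pass of a gradient-reversal layer: identity on the
    forward pass, gradient scaled by -lam on the backward pass."""
    return -lam * upstream_grad

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(4, 8))   # shared feature encoder (toy linear layer)
x = rng.normal(size=(8,))         # one observation (e.g., a state vector)

feat = W_enc @ x                  # shared features fed to both heads

# Stand-in gradients of the two losses w.r.t. the shared features:
g_q = rng.normal(size=(4,))       # from the task (Q-learning) loss
g_dom = rng.normal(size=(4,))     # from the domain-classifier loss

# The domain gradient passes through the reversal layer before reaching the
# encoder, so the encoder update opposes the domain classifier while still
# following the task gradient.
lam = 0.5
g_enc_feat = g_q + grad_reversal_backward(g_dom, lam)

# Chain rule back to the encoder weights (outer product for a linear layer).
grad_W_enc = np.outer(g_enc_feat, x)

lr = 0.01
W_enc -= lr * grad_W_enc
print(grad_W_enc.shape)  # (4, 8)
```

In a full implementation the same idea sits between a convolutional feature extractor and a small domain-discriminator head, with λ controlling how strongly domain-invariance is enforced.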
dc.description.provenance: Made available in DSpace on 2021-05-19T17:46:42Z (GMT). No. of bitstreams: 1
ntu-107-R05725031-1.pdf: 1526822 bytes, checksum: a0923cf01cd28931d22b7485255e0931 (MD5)
Previous issue date: 2018
en
dc.description.tableofcontents: Table of Contents
致謝 (Acknowledgements) . . . i
中文摘要 (Chinese Abstract) . . . ii
Abstract . . . iii
List of Figures . . . vi
List of Tables . . . ix
Chapter 1 Introduction . . . 1
Chapter 2 Related Work . . . 4
2.1 Reinforcement Learning . . . 4
2.2 Domain Adaptation . . . 5
2.3 Multi-task Agent . . . 6
Chapter 3 Background . . . 7
3.1 Q-Learning . . . 7
3.2 Deep Q Network . . . 8
Chapter 4 Approach . . . 11
4.1 Transfer with Adversarial Objective . . . 11
4.2 Augmentation . . . 14
4.2.1 Detailed Evaluation . . . 16
Chapter 5 Experiments . . . 20
5.1 Pong Variants . . . 21
5.2 Cross-Game Transfer . . . 24
Chapter 6 Limitations and Future Work . . . 28
6.1 Target Task with Different Action Space . . . 28
6.2 Augmentation Method Selection . . . 28
Chapter 7 Conclusion . . . 30
Bibliography . . . 31
dc.language.iso: zh-TW
dc.title: 利用對抗式目標與資料擴增於深度強化學習間的遷移 (zh_TW)
dc.title: Transferring Deep Reinforcement Learning with Adversarial Objective and Augmentation (en)
dc.type: Thesis
dc.date.schoolyear: 106-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 王鈺強 (Yu-Chiang Wang), 李宏毅 (Hung-yi Lee)
dc.subject.keyword: 機器學習, 強化學習, 領域適應 (zh_TW)
dc.subject.keyword: Machine Learning, Reinforcement Learning, Domain Adaptation (en)
dc.relation.page: 32
dc.identifier.doi: 10.6342/NTU201801392
dc.rights.note: Authorization granted (open access worldwide)
dc.date.accepted: 2018-07-09
dc.contributor.author-college: 管理學院 (College of Management) (zh_TW)
dc.contributor.author-dept: 資訊管理學研究所 (Graduate Institute of Information Management) (zh_TW)
dc.date.embargo-lift: 2099-12-31
Appears in collections: 資訊管理學系 (Department of Information Management)

Files in this item:
File: ntu-107-1.pdf (embargoed; publicly available online after 2099-12-31)
Size: 1.49 MB
Format: Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.

Contact Information
No. 1, Sec. 4, Roosevelt Rd., Da'an Dist., Taipei 10617, Taiwan (R.O.C.)
Tel: (02) 33662353
Email: ntuetds@ntu.edu.tw
© NTU Library All Rights Reserved