NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71129
Full metadata record
DC Field / Value / Language
dc.contributor.advisor: 王勝德 (Sheng-De Wang)
dc.contributor.author: Kuan-Ting Chen [en]
dc.contributor.author: 陳冠廷 [zh_TW]
dc.date.accessioned: 2021-06-17T04:54:29Z
dc.date.available: 2018-08-01
dc.date.copyright: 2018-08-01
dc.date.issued: 2018
dc.date.submitted: 2018-07-30
dc.identifier.citation:
[1] S. M. LaValle and J. J. Kuffner, “Rapidly-exploring random trees: progress and prospects,” in Algorithmic and Computational Robotics: New Directions, pp. 293-308, A K Peters, Wellesley, MA, 2001.
[2] D. Dewey, “Reinforcement learning and the reward engineering principle,” AAAI Spring Symposium Series, 2014.
[3] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, et al., “Playing Atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, December 2013.
[4] S. James and E. Johns, “3D simulation for robot arm control with Deep Q-Learning,” arXiv preprint arXiv:1609.03759, September 2016.
[5] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, et al., “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, September 2015.
[6] D. Silver, G. Lever, N. Heess, et al., “Deterministic Policy Gradient Algorithms,” Proceedings of the International Conference on Machine Learning, pp. 387-395, 2014.
[7] V. R. Konda and J. N. Tsitsiklis, “Actor-Critic Algorithms,” NIPS, 1999.
[8] T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized experience replay,” arXiv preprint arXiv:1511.05952, November 2015.
[9] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, et al., “Asynchronous methods for deep reinforcement learning,” arXiv preprint arXiv:1602.01783, February 2016.
[10] J. X. Wang, Z. Kurth-Nelson, D. Tirumala, H. Soyer, J. Z. Leibo, R. Munos, et al., “Learning to reinforcement learn,” arXiv preprint arXiv:1611.05763, November 2016.
[11] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling,” arXiv preprint arXiv:1412.3555, December 2014.
[12] E. Vorontsov, C. Trabelsi, S. Kadoury, and C. Pal, “On orthogonality and learning recurrent networks with long term dependencies,” arXiv preprint arXiv:1702.00071, January 2017.
[13] G. Hinton, “Neural Networks for Machine Learning: Overview of mini-batch gradient descent,” lecture slides, pp. 26-30. Available: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
[14] V-REP platform. Available: http://www.coppeliarobotics.com/index.html
[15] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, et al., “TensorFlow: Large-scale machine learning on heterogeneous systems,” arXiv preprint arXiv:1603.04467, March 2016.
[16] V. Nair and G. E. Hinton, “Rectified Linear Units Improve Restricted Boltzmann Machines,” ICML '10, pp. 807-814, June 2010.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71129
dc.description.abstract: 深度強化學習已經提出了許多方法去控制機械手臂,如Deep Q-Learning (DQN)及Policy Gradient (PG)。而Deep Deterministic Policy Gradient (DDPG)則是利用確定性策略(Deterministic Policy)取代隨機性策略(Stochastic Policy),簡化訓練流程並讓結果更突出。強化學習利用環境給的回饋來訓練底層的神經網路進而完成任務;一個適當的回饋可以讓神經網路得到更好的結果,但這需要專業的背景知識,並不斷從錯誤中修正才能得到適合的環境回饋函數。在本篇論文中,我們提出了一個基於DDPG的方法,並利用優先記憶回放(Prioritized Experience Replay)、非同步學習(Asynchronous Agent Learning)及元學習(Meta Learning)的概念來強化DDPG;其中元學習是利用多個分散式的學習者(Learners),我們稱為工作者(Workers),去學習連續過去時間的環境狀態以及環境回饋。本篇論文以模擬六軸(IRB140)及七軸(LBR iiwa 14 R820)機械手臂的尖端碰觸到3D空間中隨機出現的目標為實驗。實驗結果顯示,我們提出的演算法比使用特別定義回饋函數的DDPG訓練得更快、任務成功率更高。 [zh_TW]
dc.description.abstract: Deep reinforcement learning methods such as Deep Q-Learning (DQN) and Policy Gradient (PG) have been proposed for controlling robotic arms. Deep Deterministic Policy Gradient (DDPG) uses a deterministic rather than a stochastic policy, which simplifies training and improves performance. Reinforcement learning trains the underlying network from the reward provided by the environment; a well-designed reward yields better performance and faster training, but defining such a reward function requires domain knowledge and trial and error. In this thesis, we propose a method based on DDPG that incorporates Prioritized Experience Replay (PER), asynchronous agent learning, and meta learning. The meta-learning component uses multiple distributed learners, called workers, that learn from sequences of past states and rewards. Simulations are carried out on 6-DOF (IRB140) and 7-DOF (LBR iiwa 14 R820) robotic arms, in which the control agents learn to reach random targets in three-dimensional space. The experiments show that the proposed algorithm outperforms DDPG with a specially designed reward function in both task success rate and training speed. (A minimal sketch of the prioritized replay component appears after this record.) [en]
dc.description.provenance: Made available in DSpace on 2021-06-17T04:54:29Z (GMT). No. of bitstreams: 1
ntu-107-R05921037-1.pdf: 1721467 bytes, checksum: a7b4bf2812cbc161ad712a31c885ed54 (MD5)
Previous issue date: 2018 [en]
dc.description.tableofcontents:
口試委員會審定書 i
致謝 ii
摘要 iii
ABSTRACT iv
CONTENTS v
LIST OF FIGURES viii
LIST OF TABLES ix
LIST OF ALGORITHMS x
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Approach Overview 2
1.3 Contribution 3
1.4 Thesis Organization 3
Chapter 2 Related Works 4
2.1 Reinforcement Learning 4
2.2 Deep Deterministic Policy Gradient (DDPG) 5
2.3 Prioritized Experience Replay 6
2.4 Deep Meta Reinforcement Learning 6
Chapter 3 Approach 7
3.2 Algorithm 7
3.3 Detail of Implementation 12
3.3.1 Definition of States and Actions 12
3.3.2 Definition of Reward 12
3.3.3 Architecture of Neural Network 15
3.3.4 Architecture of Prioritized Experience Replay 16
Chapter 4 Experiment 17
4.2 Environment 18
4.2.1 Arm 1 – IRB140 19
4.2.2 Arm 2 – LBR iiwa 14 R820 20
4.3 Evaluation Metrics 21
4.3.1 Training speed 21
4.3.2 Task success rate 21
4.4 Experiment Result 22
4.4.1 Training speed 22
4.4.2 Task Success Rate 24
Chapter 5 Conclusion 26
REFERENCE 27
dc.language.iso: en
dc.subject: 深度學習 [zh_TW]
dc.subject: 3D機械手臂 [zh_TW]
dc.subject: 元學習 [zh_TW]
dc.subject: 非同步強化學習 [zh_TW]
dc.subject: Asynchronous Reinforcement Learning [en]
dc.subject: Deep Learning [en]
dc.subject: Meta Learning [en]
dc.subject: 3D Robotic Arm [en]
dc.title: 元學習於分散式連續控制機械手臂 [zh_TW]
dc.title: Distributed Continuous Control with Meta Learning On Robotic Arms [en]
dc.type: Thesis
dc.date.schoolyear: 106-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 雷欽隆 (Chin-Laung Lei), 王鈺強 (Yu-Chiang Wang), 曾俊元 (C. Henry Tseng)
dc.subject.keyword: 3D機械手臂, 深度學習, 非同步強化學習, 元學習 [zh_TW]
dc.subject.keyword: 3D Robotic Arm, Deep Learning, Asynchronous Reinforcement Learning, Meta Learning [en]
dc.relation.page: 28
dc.identifier.doi: 10.6342/NTU201801006
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2018-07-30
dc.contributor.author-college: 電機資訊學院 [zh_TW]
dc.contributor.author-dept: 電機工程學研究所 [zh_TW]
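The abstract above describes strengthening DDPG with Prioritized Experience Replay [8], asynchronous workers, and meta learning. Since the full text of the thesis is not publicly available, the snippet below is only a minimal sketch of proportional prioritized replay in the style of Schaul et al. [8], not the author's implementation; the class name PrioritizedReplay, its methods push/sample/update_priorities, and the alpha/beta defaults are illustrative assumptions.

import numpy as np


class PrioritizedReplay:
    """Proportional prioritized experience replay (sketch, after Schaul et al. [8])."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha                    # how strongly priorities bias sampling
        self.buffer = []                      # stored (state, action, reward, next_state, done) tuples
        self.priorities = np.zeros(capacity)  # one priority slot per transition
        self.pos = 0                          # next write position (circular buffer)

    def push(self, transition):
        # New transitions get the current maximum priority so each is replayed at least once.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sample transitions with probability proportional to priority^alpha.
        prios = self.priorities[:len(self.buffer)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced by non-uniform sampling.
        weights = (len(self.buffer) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.buffer[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # Priority is proportional to the magnitude of the critic's TD error.
        self.priorities[idx] = np.abs(td_errors) + eps

In a DDPG-style setup, each worker would push its transitions into such a buffer, sample weighted minibatches for the actor-critic update, and feed the critic's TD errors back through update_priorities; how the thesis actually wires this buffer into its distributed workers is not recoverable from this record.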
Appears in collections: 電機工程學系 (Department of Electrical Engineering)

Files in this item:
File: ntu-107-1.pdf (not authorized for public access)
Size: 1.68 MB
Format: Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.
