NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71129
Full metadata record
DC Field / Value / Language
dc.contributor.advisor: 王勝德 (Sheng-De Wang)
dc.contributor.author: Kuan-Ting Chen [en]
dc.contributor.author: 陳冠廷 [zh_TW]
dc.date.accessioned: 2021-06-17T04:54:29Z
dc.date.available: 2018-08-01
dc.date.copyright: 2018-08-01
dc.date.issued: 2018
dc.date.submitted: 2018-07-30
dc.identifier.citation:
[1] S. M. LaValle and J. J. Kuffner, “Rapidly-exploring random trees: progress and prospects,” in Algorithmic and Computational Robotics: New Directions, pp. 293-308, A K Peters, Wellesley, MA, 2001.
[2] D. Dewey, “Reinforcement learning and the reward engineering principle,” AAAI Spring Symposium Series, 2014.
[3] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, et al., “Playing Atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, December 2013.
[4] S. James and E. Johns, “3D simulation for robot arm control with Deep Q-Learning,” arXiv preprint arXiv:1609.03759, September 2016.
[5] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, et al., “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, September 2015.
[6] D. Silver, G. Lever, N. Heess, et al., “Deterministic Policy Gradient Algorithms,” Proceedings of the International Conference on Machine Learning, pp. 387-395, 2014.
[7] V. R. Konda and J. N. Tsitsiklis, “Actor-Critic Algorithms,” NIPS, 1999.
[8] T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized experience replay,” arXiv preprint arXiv:1511.05952, November 2015.
[9] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, et al., “Asynchronous methods for deep reinforcement learning,” arXiv preprint arXiv:1602.01783, February 2016.
[10] J. X. Wang, Z. Kurth-Nelson, D. Tirumala, H. Soyer, J. Z. Leibo, R. Munos, et al., “Learning to reinforcement learn,” arXiv preprint arXiv:1611.05763, November 2016.
[11] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling,” arXiv preprint arXiv:1412.3555, December 2014.
[12] E. Vorontsov, C. Trabelsi, S. Kadoury, and C. Pal, “On orthogonality and learning recurrent networks with long term dependencies,” arXiv preprint arXiv:1702.00071, January 2017.
[13] G. Hinton, “Neural Networks for Machine Learning: Overview of mini-batch gradient descent,” lecture slides, pp. 26-30. Available: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
[14] V-REP platform. Available: http://www.coppeliarobotics.com/index.html
[15] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, et al., “TensorFlow: Large-scale machine learning on heterogeneous systems,” arXiv preprint arXiv:1603.04467, March 2016.
[16] V. Nair and G. E. Hinton, “Rectified Linear Units Improve Restricted Boltzmann Machines,” ICML '10, pp. 807-814, June 2010.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71129
dc.description.abstract: 深度強化學習已經提出了許多方法去控制機械手臂,如Deep Q-Learning (DQN)及Policy Gradient (PG)。而Deep Deterministic Policy Gradient (DDPG)則是利用確定性策略(Deterministic Policy)取代隨機性策略(Stochastic Policy),簡化訓練流程並讓結果更突出。強化學習利用環境給的回饋來訓練底層的神經網路進而完成任務;一個適當的回饋可以讓神經網路得到更好的結果,但這需要專業的背景知識,並不斷從錯誤中修正才能得到適合的環境回饋函數。在本篇論文中,我們提出了一個基於DDPG的方法,並利用優先記憶回放(Prioritized Experience Replay)、非同步學習(Asynchronous Agent Learning)及元學習(Meta Learning)的概念來強化DDPG;其中元學習是利用多個分散式的學習者(Learners),我們稱為工作者(Workers),去學習連續過去時間的環境狀態以及環境回饋。本篇論文以模擬六軸(IRB140)及七軸(LBR iiwa 14 R820)機械手臂的尖端碰觸到3D空間中隨機出現的目標為實驗。實驗結果顯示,我們提出的演算法比使用特別定義回饋函數的DDPG訓練得更快、任務成功率更高。 [zh_TW]
dc.description.abstract: Deep reinforcement learning methods such as Deep Q-Learning (DQN) and Policy Gradient (PG) have been proposed for controlling robotic arms. Deep Deterministic Policy Gradient (DDPG) uses a deterministic rather than a stochastic policy, which simplifies training and improves performance. Reinforcement learning trains the underlying network from the reward provided by the environment; a well-designed reward yields better performance and faster training, but defining such a reward function requires domain knowledge and trial and error. In this thesis, we propose a method based on DDPG that incorporates Prioritized Experience Replay (PER), asynchronous agent learning, and meta learning. The meta-learning component uses multiple distributed learners, called workers, that learn from sequences of past states and rewards. Simulations are carried out on 6-DOF (IRB140) and 7-DOF (LBR iiwa 14 R820) robotic arms, in which the control agents learn to reach random targets in three-dimensional space. The experiments show that the proposed algorithm outperforms DDPG with a specially designed reward function in both task success rate and training speed. (A minimal sketch of the prioritized replay component appears after this record.) [en]
dc.description.provenance: Made available in DSpace on 2021-06-17T04:54:29Z (GMT). No. of bitstreams: 1
ntu-107-R05921037-1.pdf: 1721467 bytes, checksum: a7b4bf2812cbc161ad712a31c885ed54 (MD5)
Previous issue date: 2018 [en]
dc.description.tableofcontents:
口試委員會審定書 i
致謝 ii
摘要 iii
ABSTRACT iv
CONTENTS v
LIST OF FIGURES viii
LIST OF TABLES ix
LIST OF ALGORITHMS x
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Approach Overview 2
1.3 Contribution 3
1.4 Thesis Organization 3
Chapter 2 Related Works 4
2.1 Reinforcement Learning 4
2.2 Deep Deterministic Policy Gradient (DDPG) 5
2.3 Prioritized Experience Replay 6
2.4 Deep Meta Reinforcement Learning 6
Chapter 3 Approach 7
3.2 Algorithm 7
3.3 Detail of Implementation 12
3.3.1 Definition of States and Actions 12
3.3.2 Definition of Reward 12
3.3.3 Architecture of Neural Network 15
3.3.4 Architecture of Prioritized Experience Replay 16
Chapter 4 Experiment 17
4.2 Environment 18
4.2.1 Arm 1 – IRB140 19
4.2.2 Arm 2 – LBR iiwa 14 R820 20
4.3 Evaluation Metrics 21
4.3.1 Training speed 21
4.3.2 Task success rate 21
4.4 Experiment Result 22
4.4.1 Training speed 22
4.4.2 Task Success Rate 24
Chapter 5 Conclusion 26
REFERENCE 27
dc.language.iso: en
dc.subject: 深度學習 [zh_TW]
dc.subject: 3D機械手臂 [zh_TW]
dc.subject: 元學習 [zh_TW]
dc.subject: 非同步強化學習 [zh_TW]
dc.subject: Asynchronous Reinforcement Learning [en]
dc.subject: Deep Learning [en]
dc.subject: Meta Learning [en]
dc.subject: 3D Robotic Arm [en]
dc.title: 元學習於分散式連續控制機械手臂 [zh_TW]
dc.title: Distributed Continuous Control with Meta Learning On Robotic Arms [en]
dc.type: Thesis
dc.date.schoolyear: 106-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 雷欽隆 (Chin-Laung Lei), 王鈺強 (Yu-Chiang Wang), 曾俊元 (C. Henry Tseng)
dc.subject.keyword: 3D機械手臂, 深度學習, 非同步強化學習, 元學習 [zh_TW]
dc.subject.keyword: 3D Robotic Arm, Deep Learning, Asynchronous Reinforcement Learning, Meta Learning [en]
dc.relation.page: 28
dc.identifier.doi: 10.6342/NTU201801006
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2018-07-30
dc.contributor.author-college: 電機資訊學院 [zh_TW]
dc.contributor.author-dept: 電機工程學研究所 [zh_TW]
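The abstract above describes strengthening DDPG with Prioritized Experience Replay [8], asynchronous workers, and meta learning. Since the full text of the thesis is not publicly available, the snippet below is only a minimal sketch of proportional prioritized replay in the style of Schaul et al. [8], not the author's implementation; the class name PrioritizedReplay, its methods push/sample/update_priorities, and the alpha/beta defaults are illustrative assumptions.

import numpy as np


class PrioritizedReplay:
    """Proportional prioritized experience replay (sketch, after Schaul et al. [8])."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha                    # how strongly priorities bias sampling
        self.buffer = []                      # stored (state, action, reward, next_state, done) tuples
        self.priorities = np.zeros(capacity)  # one priority slot per transition
        self.pos = 0                          # next write position (circular buffer)

    def push(self, transition):
        # New transitions get the current maximum priority so each is replayed at least once.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sample transitions with probability proportional to priority^alpha.
        prios = self.priorities[:len(self.buffer)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced by non-uniform sampling.
        weights = (len(self.buffer) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.buffer[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # Priority is proportional to the magnitude of the critic's TD error.
        self.priorities[idx] = np.abs(td_errors) + eps

In a DDPG-style setup, each worker would push its transitions into such a buffer, sample weighted minibatches for the actor-critic update, and feed the critic's TD errors back through update_priorities; how the thesis actually wires this buffer into its distributed workers is not recoverable from this record.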
Appears in collections: 電機工程學系 (Department of Electrical Engineering)

Files in this item:
File: ntu-107-1.pdf (not authorized for public access)
Size: 1.68 MB
Format: Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.
