Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71129

Full metadata record

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 王勝德(Sheng-De Wang) | |
| dc.contributor.author | Kuan-Ting Chen | en |
| dc.contributor.author | 陳冠廷 | zh_TW |
| dc.date.accessioned | 2021-06-17T04:54:29Z | - |
| dc.date.available | 2018-08-01 | |
| dc.date.copyright | 2018-08-01 | |
| dc.date.issued | 2018 | |
| dc.date.submitted | 2018-07-30 | |
| dc.identifier.citation | [1] S. M. LaValle and J. J. Kuffner, “Rapidly-exploring random trees: progress and prospects,” Algorithmic and Computational Robotics: New Directions, pp. 293-308, A K Peters, Wellesley, MA, 2001.
[2] D. Dewey, “Reinforcement learning and the reward engineering principle,” AAAI Spring Symposium Series, 2014.
[3] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, et al., “Playing Atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, December 2013.
[4] S. James and E. Johns, “3D simulation for robot arm control with Deep Q-Learning,” arXiv preprint arXiv:1609.03759, September 2016.
[5] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, et al., “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, September 2015.
[6] D. Silver, G. Lever, N. Heess, et al., “Deterministic policy gradient algorithms,” Proceedings of the International Conference on Machine Learning, pp. 387-395, 2014.
[7] V. R. Konda and J. N. Tsitsiklis, “Actor-critic algorithms,” NIPS, 1999.
[8] T. Schaul, J. Quan, I. Antonoglou and D. Silver, “Prioritized experience replay,” arXiv preprint arXiv:1511.05952, November 2015.
[9] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, et al., “Asynchronous methods for deep reinforcement learning,” arXiv preprint arXiv:1602.01783, February 2016.
[10] J. Wang, Z. Kurth-Nelson, D. Tirumala, H. Soyer, J. Z. Leibo, R. Munos, et al., “Learning to reinforcement learn,” arXiv preprint arXiv:1611.05763, November 2016.
[11] J. Chung, C. Gulcehre, K. Cho and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555, December 2014.
[12] E. Vorontsov, C. Trabelsi, S. Kadoury and C. Pal, “On orthogonality and learning recurrent networks with long term dependencies,” arXiv preprint arXiv:1702.00071, February 2017.
[13] G. Hinton, “Neural Networks for Machine Learning - Overview of mini-batch gradient descent,” pp. 26-30. Available: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
[14] V-REP platform. Available: http://www.coppeliarobotics.com/index.html
[15] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, et al., “TensorFlow: Large-scale machine learning on heterogeneous systems,” arXiv preprint arXiv:1603.04467, March 2016.
[16] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” ICML ’10, pp. 807-814, June 2010. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71129 | - |
| dc.description.abstract | 深度強化學習已經提出了許多方法去控制機械手臂,如Deep Q-Learning (DQN)、及Policy Gradient (PG)。而Deep Deterministic Policy Gradient (DDPG)則是利用了確定性策略(Deterministic Policy)取代隨機性策略(Stochastic Policy)簡化訓練流程並讓結果更突出。強化學習利用環境給的回饋來訓練底層的神經網路進而完成任務,而一個適當的回饋可以讓神經網路得到更好的結果但這是需要專業的背景知識及不斷從錯誤中來得到適合的環境回饋函數。在本篇論文中,我們提出了一個基於DDPG的方法,並利用優先記憶回放(Prioritized Experience Replay)、非同步學習(Asynchronous Agent Learning)及元學習(Meta Learning)的概念來強化DDPG,其中元學習是利用多個分散式的學習者(Learners),我們稱為工作者(Workers)去學習連續過去時間的環境狀態以及環境回饋。本篇論文是以模擬六軸(IRB140)及七軸(LBR iiwa 14 R820)的機械手臂的尖端碰觸到3D空間隨機出現的目標為實驗。實驗結果顯示我們提出的演算法比DDPG使用特別定義的回饋函數還要訓練得更快、任務成功率更高。 | zh_TW |
| dc.description.abstract | Deep reinforcement learning methods such as Deep Q-Learning (DQN) and Policy Gradient (PG) have been proposed for controlling robotic arms. Deep Deterministic Policy Gradient (DDPG) takes advantage of a deterministic policy instead of a stochastic one to simplify the training process and improve performance. Reinforcement learning trains the underlying network from the reward given by the environment; an appropriate reward yields better performance and faster training, but defining it requires domain knowledge and trial and error. In this thesis, we propose a method based on DDPG that makes use of Prioritized Experience Replay (PER), asynchronous agent learning, and meta learning. The proposed meta-learning approach uses multiple distributed learners, called workers, that learn from consecutive previous states and rewards. Simulations are performed on 6-DOF (IRB140) and 7-DOF (LBR iiwa 14 R820) robotic arms, where the control agents learn to reach random targets in three-dimensional space. The experiments show that the proposed algorithm outperforms DDPG with a specially designed reward function in both task success rate and training speed. (An illustrative sketch of this kind of update is given after the metadata table below.) | en |
| dc.description.provenance | Made available in DSpace on 2021-06-17T04:54:29Z (GMT). No. of bitstreams: 1 ntu-107-R05921037-1.pdf: 1721467 bytes, checksum: a7b4bf2812cbc161ad712a31c885ed54 (MD5) Previous issue date: 2018 | en |
| dc.description.tableofcontents | Oral Examination Committee Certification i
Acknowledgements ii
Chinese Abstract iii
ABSTRACT iv
CONTENTS v
LIST OF FIGURES viii
LIST OF TABLES ix
LIST OF ALGORITHMS x
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Approach Overview 2
1.3 Contribution 3
1.4 Thesis Organization 3
Chapter 2 Related Works 4
2.1 Reinforcement Learning 4
2.2 Deep Deterministic Policy Gradient (DDPG) 5
2.3 Prioritized Experience Replay 6
2.4 Deep Meta Reinforcement Learning 6
Chapter 3 Approach 7
3.2 Algorithm 7
3.3 Detail of Implementation 12
3.3.1 Definition of States and Actions 12
3.3.2 Definition of Reward 12
3.3.3 Architecture of Neural Network 15
3.3.4 Architecture of Prioritized Experience Replay 16
Chapter 4 Experiment 17
4.2 Environment 18
4.2.1 Arm 1 – IRB140 19
4.2.2 Arm 2 – LBR iiwa 14 R820 20
4.3 Evaluation Metrics 21
4.3.1 Training speed 21
4.3.2 Task success rate 21
4.4 Experiment Result 22
4.4.1 Training speed 22
4.4.2 Task Success Rate 24
Chapter 5 Conclusion 26
REFERENCE 27 | |
| dc.language.iso | en | |
| dc.subject | 深度學習 | zh_TW |
| dc.subject | 3D機械手臂 | zh_TW |
| dc.subject | 元學習 | zh_TW |
| dc.subject | 非同步強化學習 | zh_TW |
| dc.subject | Asynchronous Reinforcement Learning | en |
| dc.subject | Deep Learning | en |
| dc.subject | Meta Learning | en |
| dc.subject | 3D Robotic Arm | en |
| dc.title | 元學習於分散式連續控制機械手臂 | zh_TW |
| dc.title | Distributed Continuous Control with Meta Learning On Robotic Arms | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 106-2 | |
| dc.description.degree | 碩士 | |
| dc.contributor.oralexamcommittee | 雷欽隆(Chin-Laung Lei),王鈺強(Yu-Chiang Wang),曾俊元(C. Henry Tseng) | |
| dc.subject.keyword | 3D機械手臂,深度學習,非同步強化學習,元學習, | zh_TW |
| dc.subject.keyword | 3D Robotic Arm,Deep Learning,Asynchronous Reinforcement Learning,Meta Learning, | en |
| dc.relation.page | 28 | |
| dc.identifier.doi | 10.6342/NTU201801006 | |
| dc.rights.note | 有償授權 | |
| dc.date.accepted | 2018-07-30 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 電機工程學研究所 | zh_TW |
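
The abstract describes a deterministic actor-critic (DDPG-style) update in which distributed workers also condition on consecutive previous states and rewards. As a rough illustration only, the sketch below shows what one such update could look like in PyTorch; it is not the thesis code. The state and action dimensions, layer sizes, learning rates, the feed-forward networks (the thesis's meta-learning workers are presumably recurrent), and the name `ddpg_update` are all assumptions, and prioritized experience replay and asynchronous workers are omitted.

```python
# Minimal DDPG-style update sketch (illustrative assumptions throughout).
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 17, 6      # assumed: joint readings + target position, 6-DOF arm
GAMMA, TAU = 0.99, 0.005           # assumed discount factor and soft-update rate

class Actor(nn.Module):
    def __init__(self):
        super().__init__()
        # Input: current state + previous state + previous reward,
        # loosely following the "learn from consecutive previous states and rewards" idea.
        self.net = nn.Sequential(
            nn.Linear(2 * STATE_DIM + 1, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM), nn.Tanh())     # deterministic, bounded action
    def forward(self, s, prev_s, prev_r):
        return self.net(torch.cat([s, prev_s, prev_r], dim=-1))

class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(),
            nn.Linear(256, 1))                          # Q(s, a)
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

actor, critic = Actor(), Critic()
actor_tgt, critic_tgt = Actor(), Critic()               # target networks
actor_tgt.load_state_dict(actor.state_dict())
critic_tgt.load_state_dict(critic.state_dict())
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s2, prev_s, prev_r):
    """One update from a replay batch; tensors are [batch, dim], rewards [batch, 1].
    Terminal-state handling is omitted for brevity."""
    with torch.no_grad():
        a2 = actor_tgt(s2, s, r)                  # for s2, the previous state/reward are s and r
        y = r + GAMMA * critic_tgt(s2, a2)        # bootstrapped TD target
    critic_loss = nn.functional.mse_loss(critic(s, a), y)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    # Deterministic policy gradient: push actions toward higher Q-values.
    actor_loss = -critic(s, actor(s, prev_s, prev_r)).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

    # Polyak (soft) update of the target networks.
    with torch.no_grad():
        for net, tgt in ((actor, actor_tgt), (critic, critic_tgt)):
            for p, pt in zip(net.parameters(), tgt.parameters()):
                pt.mul_(1 - TAU).add_(TAU * p)
```

In the setting the abstract describes, several such workers would run asynchronously against a shared, prioritized replay buffer; the soft target-network update shown here is the standard DDPG mechanism for stabilizing the bootstrapped TD target.
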
| Appears in Collections: | 電機工程學系 | |
Files in This Item:

| File | Size | Format |
|---|---|---|
| ntu-107-1.pdf (restricted, not publicly accessible) | 1.68 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.