Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72277

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 林沛群 | |
| dc.contributor.author | Yung-Hsiu Chen | en |
| dc.contributor.author | 陳永修 | zh_TW |
| dc.date.accessioned | 2021-06-17T06:32:50Z | - |
| dc.date.available | 2023-08-20 | |
| dc.date.copyright | 2018-08-20 | |
| dc.date.issued | 2018 | |
| dc.date.submitted | 2018-08-16 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72277 | - |
| dc.description.abstract | Industrial automation has become a growing trend in recent years, and robotic applications continue to increase. Industrial robots can replace human labor and achieve higher efficiency on factory production lines. With advances in algorithms such as artificial intelligence, robots can improve both their performance and their control strategies and can better cope with unknown or more complex working environments. This thesis focuses on optimizing trajectories with reinforcement learning and on compensating the dynamic model with machine learning.
For trajectory optimization, the subject of this research is a commercial industrial manipulator whose low-level controller is not open for direct control of each joint motor, so trajectory commands must follow the modes provided by the existing high-level controller. Motion trajectories are specified by the position and velocity of each via point, and the points are connected by cubic polynomials to form the trajectory. Under this constraint, the goal is to improve the motion efficiency of the manipulator, mainly in terms of energy and time. In this research, Bi-RRT first generates a trajectory to determine the initial via points; reinforcement learning then adjusts the positions or timing of those via points, and the output with the highest score during the process is the optimized result. The reward design also accounts for obstacle avoidance and torque constraints. Experiments show that the simulated dynamic model of the manipulator still differs slightly from the real dynamics, so an optimized trajectory does not perform identically in the real world and in simulation. To ensure the accuracy of the simulation, the difference between simulation and reality must be further compensated. This research therefore adopts a machine learning approach: the motion states of the manipulator and the errors between simulated and real torques serve as labeled data, and a neural network approximates the corresponding nonlinear model. | zh_TW |
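As the abstract describes, via points are connected by cubic polynomials given each point's position and velocity. A minimal sketch of one such segment in plain Python (the function names are hypothetical; the actual interpolation inside the commercial controller is not public):

```python
def cubic_segment(q0, v0, q1, v1, T):
    """Coefficients of q(t) = a0 + a1*t + a2*t^2 + a3*t^3 on [0, T]
    matching position/velocity (q0, v0) at t=0 and (q1, v1) at t=T."""
    a0 = q0
    a1 = v0
    a2 = 3.0 * (q1 - q0) / T**2 - (2.0 * v0 + v1) / T
    a3 = -2.0 * (q1 - q0) / T**3 + (v0 + v1) / T**2
    return a0, a1, a2, a3

def evaluate(coeffs, t):
    """Position and velocity of the cubic at time t."""
    a0, a1, a2, a3 = coeffs
    q = a0 + a1 * t + a2 * t**2 + a3 * t**3
    v = a1 + 2.0 * a2 * t + 3.0 * a3 * t**2
    return q, v

# One joint moves from 0.0 rad at rest to 1.0 rad with exit speed
# 0.5 rad/s over 2 s; evaluating at t = T recovers the end conditions.
coeffs = cubic_segment(0.0, 0.0, 1.0, 0.5, 2.0)
q_end, v_end = evaluate(coeffs, 2.0)
```

A full trajectory chains such segments, one per pair of adjacent via points, which is exactly the representation the reinforcement learning agent perturbs.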
| dc.description.abstract | Industrial automation has become an important topic in recent years, and more and more manipulators are being deployed in factories. Instead of relying on manpower, industries prefer automated machines, which offer clear advantages: industrial automation provides high productivity, allowing a manufacturing plant to run 24 hours a day, and it improves safety when the plant operates in an extreme environment.
As artificial intelligence becomes a growing trend, some of its technologies are being applied to robotics to improve control policies and performance in unknown environments. This research focuses on trajectory optimization and dynamic model compensation. Since each joint motor cannot be controlled directly, trajectory commands are sent in the form required by the controller, which specifies the via points of the trajectory. This research addresses energy and time optimization using reinforcement learning: an actor-critic agent is designed whose reward combines energy/time consumption with obstacle and torque constraints. The agent's actions change the positions and speeds of the via points, and the aim is to find the action with the highest reward. In the experimental stage, the dynamic model exhibits some differences between simulation and reality. To ensure the validity of the simulation, this research uses neural networks to compensate the model. | en |
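The reward described above (energy/time consumption combined with obstacle and torque constraints) could be sketched roughly as follows. All weights, limits, and penalty values here are illustrative assumptions, not the thesis's actual tuning:

```python
def trajectory_reward(energy, duration, torques, clearances,
                      tau_max=50.0, d_safe=0.05,
                      w_energy=1.0, w_time=0.1, penalty=1000.0):
    """Score a candidate trajectory: lower energy/time cost is better;
    violating a torque limit or an obstacle clearance voids the score."""
    # Hard constraints: any joint torque over the limit, or any point of
    # the arm closer to an obstacle than d_safe, incurs a large penalty.
    if any(abs(tau) > tau_max for tau in torques):
        return -penalty
    if any(d < d_safe for d in clearances):
        return -penalty
    # Otherwise the reward is the negated weighted cost, so the agent
    # that maximizes reward minimizes energy and duration.
    return -(w_energy * energy + w_time * duration)

# A feasible trajectory gets the weighted cost; a trajectory that grazes
# an obstacle (clearance below d_safe) gets the flat penalty instead.
ok = trajectory_reward(energy=120.0, duration=3.0,
                       torques=[10.0, -20.0, 5.0], clearances=[0.2, 0.1])
bad = trajectory_reward(energy=90.0, duration=2.5,
                        torques=[10.0, -20.0, 5.0], clearances=[0.2, 0.01])
```

The actor-critic agent would evaluate this score after simulating each adjusted set of via points, keeping the highest-scoring trajectory as the optimization result.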
| dc.description.provenance | Made available in DSpace on 2021-06-17T06:32:50Z (GMT). No. of bitstreams: 1 ntu-107-R05522812-1.pdf: 8093422 bytes, checksum: c15a6682f401cbe5d5cab23c120d9b17 (MD5) Previous issue date: 2018 | en |
| dc.description.tableofcontents | Acknowledgements I
Chinese Abstract II
Abstract III
Contents IV
List of Figures VII
List of Tables XI
Chapter 1 Introduction 1
1.1 Preface 1
1.2 Motivation 2
1.3 Literature Review 2
1.4 Contributions 8
1.5 Thesis Structure 9
Chapter 2 Modeling and Collision Detection of the Six-Axis Manipulator 10
2.1 Kinematic Model 10
2.1.1 D-H Model 10
2.1.2 Forward Kinematics 12
2.1.3 Inverse Kinematics 13
2.2 Dynamic Model 17
2.2.1 Simplified Dynamic Model 17
2.2.2 Effects of the Motor Transmission System 21
2.3 Collision Detection Algorithm 23
2.3.1 Self-Collision 28
2.3.2 Collision with External Obstacles 30
2.4 Summary 32
Chapter 3 Trajectory Energy Optimization with Reinforcement Learning 33
3.1 Algorithm Architecture 34
3.1.1 Cubic Trajectories 34
3.1.2 Reinforcement Learning Framework 34
3.2 Energy Optimization of 3D Trajectories 37
3.2.1 Trajectory Optimization without Obstacles 37
3.2.1.1 RL Framework Based on Global Search 39
3.2.1.2 RL Framework Based on Adjusting a Reference Path 46
3.2.2 Trajectory Optimization with Obstacles 56
3.2.2.1 Example of Trajectory Optimization with Obstacles 58
3.2.2.2 Optimizing Trajectories Pre-Generated by Bi-RRT 63
Chapter 4 Trajectory Time Optimization with Reinforcement Learning 69
4.1 Algorithm Architecture 69
4.2 Time Optimization of 3D Trajectories 70
4.2.1 Changing Via-Point Timing 70
4.2.2 Time Optimization of Energy-Optimized Trajectories 76
Chapter 5 Experimental Results and Analysis 79
5.1 Preface 79
5.2 Results and Analysis 79
5.2.1 Dynamic Model 79
5.2.1.1 Linear Regression of Motor Transmission Parameters 79
5.2.1.2 Neural-Network Compensation 83
5.2.2 Trajectory Energy Optimization 94
5.2.2.1 Trajectories Planned by Testers in Teaching Mode 95
5.2.2.2 Energy Optimization of Trajectories Pre-Generated by Bi-RRT 102
5.2.2.3 Energy Optimization of Teaching-Mode Trajectories 104
5.2.2.4 Energy Optimization Using Real Manipulator Feedback 106
5.2.3 Trajectory Time Optimization 107
5.2.3.1 Time Optimization of Energy-Optimized Trajectories 107
5.2.3.2 Time Optimization of Teaching-Mode Trajectories 110
Chapter 6 Conclusions and Future Work 112
6.1 Conclusions 112
6.2 Future Work 113
References 114 | |
| dc.language.iso | zh-TW | |
| dc.subject | neural network | zh_TW |
| dc.subject | machine learning | zh_TW |
| dc.subject | reinforcement learning | zh_TW |
| dc.subject | dynamic model compensation | zh_TW |
| dc.subject | trajectory optimization | zh_TW |
| dc.subject | obstacle avoidance | zh_TW |
| dc.subject | trajectory optimization | en |
| dc.subject | machine learning | en |
| dc.subject | neural network | en |
| dc.subject | reinforcement learning | en |
| dc.subject | obstacle avoidance | en |
| dc.subject | dynamic model compensation | en |
| dc.title | Manipulator Trajectory Planning for Obstacle Avoidance and Energy/Speed Optimization Based on Reinforcement Learning | zh_TW |
| dc.title | Manipulator Trajectory Planning for Obstacle Avoidance and Energy/Speed Optimization Based on Reinforcement Learning | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 106-2 | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 連豊力,陳中明,顏炳郎 | |
| dc.subject.keyword | neural network, machine learning, reinforcement learning, dynamic model compensation, trajectory optimization, obstacle avoidance | zh_TW |
| dc.subject.keyword | neural network, machine learning, reinforcement learning, dynamic model compensation, trajectory optimization, obstacle avoidance | en |
| dc.relation.page | 118 | |
| dc.identifier.doi | 10.6342/NTU201803627 | |
| dc.rights.note | Paid authorization | |
| dc.date.accepted | 2018-08-16 | |
| dc.contributor.author-college | College of Engineering | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Mechanical Engineering | zh_TW |
| Appears in Collections: | Department of Mechanical Engineering | |

Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-107-1.pdf (restricted, not publicly available) | 7.9 MB | Adobe PDF |

All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
