Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51601
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 周承復(Cheng-Fu Chou),王傑智(Chieh-Chih Wang) | |
dc.contributor.author | Chung-Che Yu | en |
dc.contributor.author | 余宗哲 | zh_TW |
dc.date.accessioned | 2021-06-15T13:40:41Z | - |
dc.date.available | 2016-03-08 | |
dc.date.copyright | 2016-03-08 | |
dc.date.issued | 2016 | |
dc.date.submitted | 2016-01-10 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51601 | - |
dc.description.abstract | 近年來,研究學者們開始運用機器學習的技巧來降低調整參數以及設計各式規則的麻煩性,對機器人學而言,使用這類學習技巧的目的便是讓機器人能從資料中學得運動法則所需之各式參數;藉由自資料中學習的概念以及研究學者發現人類族群會利用教導與學習的過程來獲得新技能之事實,機器人自示範中學習 (robot learning from demonstration) (亦稱模仿學習)的概念開始受到重視。
而在模仿學習 (imitation learning) 演算法發展的過程中,其研究重點從機器人模仿人類以完成任務的理論與方法上之設計,轉變成如何用適當的方法來表示欲模仿、學得之項目,在此基礎上,一個具代表性的特徵組以及學習模型對於學習的過程而言是不可或缺的。在本論文中,主要貢獻之一便是展示出使用行動特徵以及高階資訊後對於學習的影響,在結合這些資訊以及基於此些行動後的未來狀態,發展出多步特徵 (multi-step feature) 並生成相對應的特徵向量,利用此特徵向量,將可學得示範者連續性的行為,由於所提出的特徵表示法與行動有極大的相關性,故稱之為行動相關聯資料 (action-correlated data)。
除了行動相關聯資料,此論文的主要貢獻為以下概念:藉由資料調整法並搭配互動式模仿學習的技巧生成具結構性的法則。在起始的示範以及學習過程之後,失敗的情境將啟動行動相關聯之資料調整法,在此概念下,將利用互動式學習法獲取額外的示範資料,而這些新取得之資料將用以訓練出新的法則,而非使用所有取得之資料重新訓練單一的法則;在只有起始示範資料且啟動互動機制相當耗費資源而不可得的情形下,所提出的學習機制將利用學習者 (learner) 以及示範資料的特性自動地將行動相關聯之資料進行調整,即使沒有使用者所提供的新示範資料,仍舊可以生成出具結構性的法則。
基於室內靜態環境以及模擬的動態環境下之實驗成果,足以驗證所提出之方法之成效,使用者的示範資料能夠被有效的學習以生成反應式行動法則 (reactive action policies),基於這些情境下的成功經驗,所提出之學習機制當可應用於自主式機器人的導航,讓機器人族群能更快地融入我們的日常生活之中。 | zh_TW |
dc.description.abstract | In recent years, machine learning techniques have been applied in many domains to reduce the burden of parameter tuning and rule design. In robotics, the main purpose of applying these techniques is to let robots learn the parameters of their policies from data. Building on this concept of learning from data, and inspired by the fact that humans acquire new skills through teaching and learning processes, researchers have paid increasing attention to the field of robot learning from demonstration.
As imitation learning algorithms have been developed, with theories and methods for robots to imitate human subjects being studied, the research focus has shifted to how the task to be imitated should be represented. A representative feature set and a suitable learning model are essential and must be chosen for the learning process. As part of the contributions of this thesis, the effects of applying action features and high-level information are presented. By combining this information with the future states induced by candidate actions, the proposed multi-step feature can be constructed into a feature vector that lets the learner train a policy which reproduces the successive motion behaviors of the demonstrators. Since the proposed feature representation is strongly correlated with actions, it is referred to as action-correlated data in this thesis.
Besides action-correlated data, the main contribution of this thesis is an arrangement procedure that generates a structured policy with interactive imitation learning techniques. The arrangement of action-correlated data is applied to the cases that fail after the initial demonstration and training processes. Under this concept, additional demonstrations are acquired through the interactive learning process, and these newly obtained examples are used to train new policies instead of retraining a single policy on all of the collected data. For situations in which only the initial demonstrations are available and interactive processes are too costly, the proposed mechanism exploits the characteristics of the learner and of the demonstrations to arrange the action-correlated data automatically, so that a structured policy can still be generated even without additional demonstrations from human subjects.
Experimental results in indoor static environments and simulated dynamic environments show that the proposed method is capable of generating reactive action policies from human demonstrations. Based on these successful experiences, the proposed mechanism is expected to be applicable to autonomous robot navigation in daily life. | en |
dc.description.provenance | Made available in DSpace on 2021-06-15T13:40:41Z (GMT). No. of bitstreams: 1 ntu-105-R96922044-1.pdf: 4890752 bytes, checksum: d58843fe6cea480841ca2d51b94f68fe (MD5) Previous issue date: 2016 | en |
dc.description.tableofcontents | ABSTRACT
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1. Introduction
1.1. Motivation
1.2. Challenges
1.3. Contributions
CHAPTER 2. Related Works
2.1. Robot Navigation
2.2. Robot Learning from Demonstration
CHAPTER 3. Learning Collision-Free Navigation from Demonstration with the Learning to Search Algorithm
3.1. Related Works
3.2. The Imitation Learning Algorithm
3.2.1. Preliminary and Problem Definition
3.2.2. The Learning to Search Algorithm
3.2.3. The Instance-based Example Approach
3.3. Experimental Results and Discussion
3.3.1. Experimental Settings
3.3.2. Learning Collision-Free Navigation
3.3.3. Discussion
3.4. Conclusion
CHAPTER 4. One-Step and Multi-Step Learning to Search for Dynamic Environment Navigation
4.1. One-Step Greedy Approach
4.1.1. Implementation in a Simple Simulated World
4.1.2. Experimental Settings of the One-Step Greedy Approach
4.1.3. Learning Results of the One-Step Greedy Approach
4.2. Multi-Step Approach
4.2.1. Implementation in a Complex Simulated World
4.2.2. Experimental Settings
4.2.3. Learning Results and Discussion
4.3. Conclusion
CHAPTER 5. Interactive Learning from Demonstration with a Multilevel Mechanism for Collision-Free Navigation in Dynamic Environments
5.1. The Interactive Learning Algorithm
5.1.1. Implementation Details
5.2. Experimental Results
5.2.1. Experimental Settings
5.2.2. Learning Results and Discussion
5.3. Conclusion
CHAPTER 6. Imitation Learning with Action-correlated Data Arrangement for Generating Reactive Action Policies in Dynamic Scenes
6.1. Introduction
6.2. Related Works and Background Knowledge
6.3. Preliminaries
6.3.1. Problem Definition
6.3.2. Procedure of the LEARCH Intuition
6.4. Action-correlated Data Arrangement
6.5. Experiments
6.5.1. Features
6.5.2. Comparison
6.5.3. Visualization
6.5.4. Discussion
6.6. Conclusion
CHAPTER 7. Conclusion
BIBLIOGRAPHY | |
dc.language.iso | en | |
dc.title | 自示範中學習結合與行動相關聯之資料調整法生成適用於動態場景之反應式行動法則 | zh_TW |
dc.title | Imitation Learning with Action-correlated Data Arrangement for Generating Reactive Action Policies in Dynamic Scenes | en |
dc.type | Thesis | |
dc.date.schoolyear | 104-1 | |
dc.description.degree | 碩士 | |
dc.contributor.oralexamcommittee | 顏炳郎(Ping-Lang Yen) | |
dc.subject.keyword | 自示範中學習, 與行動相關聯之資料調整法, 反應式行動法則 | zh_TW |
dc.subject.keyword | imitation learning, action-correlated data arrangement, reactive action policies | en |
dc.relation.page | 99 | |
dc.rights.note | 有償授權 | |
dc.date.accepted | 2016-01-11 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 資訊網路與多媒體研究所 | zh_TW |
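The abstract above describes two mechanisms in algorithmic terms: a multi-step, action-correlated feature vector built from the future states induced by a candidate action, and a structured (multilevel) policy assembled from policies trained on demonstrations gathered for previously failed situations. The minimal Python sketch below only illustrates how such a pipeline could be wired together under simplifying assumptions; the transition model, the feature set, the discrete action set, and the `LevelPolicy` structure are hypothetical and are not taken from the thesis.

```python
# Illustrative sketch only -- NOT the thesis implementation. It assumes a
# hypothetical 2-D world with a small discrete action set, and shows:
#   (1) building a multi-step, action-correlated feature vector from the
#       future states reached by repeating a candidate action, and
#   (2) a structured (multilevel) policy that dispatches to the policy
#       trained for a previously failed kind of situation.
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]    # (x, y) position in the simplified world
Action = Tuple[float, float]   # (dx, dy) displacement per time step

ACTIONS: List[Action] = [(1, 0), (-1, 0), (0, 1), (0, -1), (0, 0)]


def simulate(state: State, action: Action) -> State:
    """Assumed one-step transition model: pure displacement."""
    return (state[0] + action[0], state[1] + action[1])


def state_features(state: State, goal: State, obstacles: Sequence[State]) -> List[float]:
    """Per-state features: distance to the goal and to the nearest obstacle."""
    dist_goal = ((state[0] - goal[0]) ** 2 + (state[1] - goal[1]) ** 2) ** 0.5
    if obstacles:
        dist_obst = min(((state[0] - o[0]) ** 2 + (state[1] - o[1]) ** 2) ** 0.5
                        for o in obstacles)
    else:
        dist_obst = float("inf")
    return [dist_goal, dist_obst]


def multi_step_feature(state: State, action: Action, goal: State,
                       obstacles: Sequence[State], horizon: int = 3) -> List[float]:
    """Action-correlated feature vector: the action itself concatenated with the
    features of the future states reached by applying it `horizon` times."""
    feats: List[float] = list(action)
    s = state
    for _ in range(horizon):
        s = simulate(s, action)
        feats.extend(state_features(s, goal, obstacles))
    return feats


@dataclass
class LevelPolicy:
    """One level of a structured policy: a trigger describing the situations it
    was trained for, plus a cost function learned from that level's demonstrations."""
    applies: Callable[[State], bool]
    cost: Callable[[List[float]], float]


def structured_policy(levels: List[LevelPolicy], state: State, goal: State,
                      obstacles: Sequence[State]) -> Action:
    """Use the first level whose trigger fires (most recently added levels first),
    then pick the action whose multi-step feature vector has the lowest cost."""
    for level in levels:
        if level.applies(state):
            return min(ACTIONS,
                       key=lambda a: level.cost(multi_step_feature(state, a, goal, obstacles)))
    return (0, 0)  # no level applies: stop in place (placeholder behavior)
```

In this sketch, each `LevelPolicy.cost` stands in for whatever cost function a learner produces from that level's demonstrations; the thesis itself learns such functions with a learning-to-search (LEARCH-style) procedure, and the arrangement step decides which newly collected demonstrations seed a new level rather than retraining a single policy on all of the data.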
Appears in Collections: | 資訊網路與多媒體研究所
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-105-1.pdf (currently not authorized for public access) | 4.78 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.