NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68476

Full metadata record (DC field: value, language)
dc.contributor.advisor: 傅立成
dc.contributor.author: Pei-Huai Ciou (en)
dc.contributor.author: 邱沛淮 (zh_TW)
dc.date.accessioned: 2021-06-17T02:22:18Z
dc.date.available: 2020-08-28
dc.date.copyright: 2017-08-28
dc.date.issued: 2017
dc.date.submitted: 2017-08-19
dc.identifier.citation:
[1] G. Ferrer and A. Sanfeliu. Multi-objective cost-to-go functions on robot navigation in dynamic environments. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3824–3829.
[2] J. V. Gómez, N. Mavridis, and S. Garrido. Fast marching solution for the social path planning problem. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pages 1871–1876.
[3] P. Ratsamee, Y. Mae, K. Ohara, M. Kojima, and T. Arai. Social navigation model based on human intention analysis using face orientation. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1682–1687.
[4] M. Kollmitz, K. Hsiao, J. Gaa, and W. Burgard. Time dependent planning on a layered social cost map for human-aware robot navigation. In Mobile Robots (ECMR), 2015 European Conference on, pages 1–6.
[5] Dirk Helbing and Péter Molnár. Social force model for pedestrian dynamics. Phys. Rev. E, 51, 1998.
[6] Noé Pérez-Higueras, Rafael Ramón-Vigo, Fernando Caballero, and Luis Merino. Robot local navigation with learned social cost functions. In Informatics in Control, Automation and Robotics (ICINCO), 2014 11th International Conference on, volume 02, pages 618–625.
[7] C. Weinrich, M. Volkhardt, E. Einhorn, and H. M. Gross. Prediction of human collision avoidance behavior by lifelong learning for socially compliant robot navigation. In 2013 IEEE International Conference on Robotics and Automation, pages 376–381.
[8] Francisco Cruz, Sven Magg, Cornelius Weber, and Stefan Wermter. Training agents with interactive reinforcement learning and contextual affordances. IEEE Transactions on Cognitive and Developmental Systems, 2016.
[9] B. S. B. Dewantara. Building a socially acceptable navigation and behavior of a mobile robot using q-learning. In 2016 International Conference on Knowledge Creation and Intelligent Computing (KCIC), pages 88–93.
[10] B. S. B. Dewantara and J. Miura. Generation of a socially aware behavior of a guide robot using reinforcement learning. In 2016 International Electronics Symposium (IES), pages 105–110.
[11] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
[12] L. Tai, S. Li, and M. Liu. A deep-network solution towards model-less obstacle avoidance. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2759–2764.
[13] Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Li Fei-Fei, Joseph J. Lim, Ali Farhadi, and Abhinav Gupta. Target-driven visual navigation in indoor scenes using deep reinforcement learning. arXiv:1609.05143v1, 2016.
[14] T. M. Ciolek and A. Kendon. Environment and the spatial arrangement of conversational encounters. Sociological Inquiry, 1980.
[15] K. Charalampous, I. Kostavelis, and A. Gasteratos. Context-dependent social mapping. In 2016 IEEE International Conference on Imaging Systems and Techniques (IST), pages 30–35.
[16] V. Nguyen and C. Jayawardena. A decision making model for optimizing social relationship for side-by-side robotic wheelchairs in active mode. In 2016 6th IEEE International Conference on Biomedical Robotics and Biomechatronics (BioRob), pages 735–740.
[17] Y. Morales, T. Miyashita, and N. Hagita. Social robotic wheelchair centered on passenger and pedestrian comfort. Robotics and Autonomous Systems, 2016.
[18] T. Fraichard, R. Paulin, and P. Reignier. Human-robot motion: An attention-based navigation approach. In Robot and Human Interactive Communication, 2014 ROMAN: The 23rd IEEE International Symposium on, pages 684–691.
[19] D. Mehta, G. Ferrer, and E. Olson. Autonomous navigation in dynamic social environments using multi-policy decision making. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016.
[20] J. P. Fentanes, B. Lacerda, T. Krajník, N. Hawes, and M. Hanheide. Now or later? Predicting and maximising success of navigation actions from long-term experience. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 1112–1117.
[21] H. Bai, S. Cai, N. Ye, D. Hsu, and W. S. Lee. Intention-aware online pomdp planning for autonomous driving in a crowd. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 454–460.
[22] S. Candido, J. Davidson, and S. Hutchinson. Exploiting domain knowledge in planning for uncertain robot systems modeled as pomdps. In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages 3596–3603.
[23] Yuchen Fu, Zhipeng Xu, Fei Zhu, Quan Liu, and Xiaoke Zhou. Learn to human-level control in dynamic environment using incremental batch interrupting temporal abstraction. Computer Science and Information Systems, 2013.
[24] P. Tadepalli, B. Givan, and K. Driessens. Relational reinforcement learning: An overview. ICML Workshop on Relational Reinforcement Learning, 2004.
[25] Richard Liaw, Sanjay Krishnan, Animesh Garg, Daniel Crankshaw, Joseph E. Gonzalez, and Ken Goldberg. Composing meta-policies for autonomous driving using hierarchical deep reinforcement learning. 2016.
[26] Sahand Sharifzadeh, Ioannis Chiotellis, Rudolph Triebel, and Daniel Cremers. Learning to drive using inverse reinforcement learning and deep q-networks. arXiv:1612.03653v1, 2016.
[27] Mark Pfeiffer, Michael Schaeuble, Juan Nieto, Roland Siegwart, and Cesar Cadena. From perception to decision: A data-driven approach to end-to-end motion planning for autonomous ground robots. IEEE International Conference on Robotics and Automation (ICRA), 2017.
[28] Yu Fan Chen, Miao Liu, Michael Everett, and Jonathan P. How. Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning. arXiv preprint arXiv:1609.07845, 2016.
[29] Mingming Li, Rui Jiang, Shuzhi Sam Ge, and Tong Heng Lee. Role playing learning for socially concomitant mobile robot navigation. arXiv:1705.10092v1, 2017.
[30] Yu Fan Chen, Michael Everett, Miao Liu, and Jonathan P. How. Socially aware motion planning with deep reinforcement learning. arXiv:1703.08862 [cs.RO], 2017.
[31] Zanlungo, Francesco, Tetsushi Ikeda, and Takayuki Kanda. Social force model with explicit collision prediction. EPL (Europhysics Letters) 93.6, 2011.
[32] Rachel Kirby. Social robot navigation. Carnegie Mellon University, 2010.
[33] Martin L Puterman. Markov decision processes: Discrete stochastic dynamic programming. Journal of the Operational Research Society, 1995.
[34] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, New Jersey, 1957.
[35] Christopher J.C.H. Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3):279–292, 1992.
[36] T. Kanungo, D.M. Mount, N.S. Netanyahu, C.D. Piatko, R. Silverman, and A.Y. Wu. An efficient k-means clustering algorithm: Analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002.
[37] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 2014.
[38] J. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 1997.
[39] L.-J. Lin. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 1992.
[40] Rainer Lienhart, Alexander Kuranov, and Vadim Pisarevsky. Empirical analysis of detection cascades of boosted classifiers for rapid object detection. MRL Technical Report, 2002.
[41] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. International Conference on Learning Representations (ICLR), 2016.
[42] J. Rios-Martinez, A. Spalanzani, and C. Laugier. From proxemics theory to socially-aware navigation: A survey. International Journal of Social Robotics, 7(2):137–153, 2015.
[43] P. Henry, C. Vollmer, B. Ferris, and D. Fox. Learning to navigate through crowded environments. In 2010 IEEE International Conference on Robotics and Automation, pages 981–986.
[44] D. Vasquez, B. Okal, and K. O. Arras. Inverse reinforcement learning algorithms and features for robot navigation in crowds: An experimental comparison. In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1341– 1346.
[45] P. Abbeel, D. Dolgov, A. Y. Ng, and S. Thrun. Apprenticeship learning for motion planning with application to parking lot navigation. Int. Conf. on Intelligent Robots and Systems (IROS), 2008.
[46] C. C. Yu and C. C. Wang. Collision- and freezing-free navigation in dynamic environments using learning to search. In 2012 Conference on Technologies and Applications of Artificial Intelligence, pages 151–156.
[47] J. Minguez and L. Montano. Nearness diagram (nd) navigation: collision avoidance in troublesome scenarios. Robotics and Automation, IEEE Transactions on, 20(1):45–59, 2004.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68476
dc.description.abstract: 對於服務型機器人,只考慮某些條件(例如:最短路徑)的導航是不夠的。在人機共存的環境中,機器人除了考慮這些條件外,也要讓人認為它的導航是足夠自然的。為了使機器人遵守「社交規則」,使用機器學習的方式使機器人學會社交導航,比由研究人員繁瑣地設計特徵來得適合。最近,深度強化式學習開始導入機器人研究領域,然而還很少研究考慮用此學習架構來解決社交導航這個高維度的問題。為了解決這些問題,本研究提出合成強化式學習,提供一個架構使機器人能由感測器輸入學習如何產生適當的速度。本系統使用深度強化式學習來學習機器人在特定場景之社交導航速度,並藉由獎勵更新模組,讓人們可以提供回饋給機器人。為了使我們的系統更一般化,我們不使用模擬或提前蒐集的資料,因為它們缺少機器人與人在真實環境中的互動。我們直接將系統導入真實環境,並提出以人類先備知識來縮短深度強化式學習過長學習時間的方法。我們的系統可以在特定條件下逐漸學習如何控制機器人的速度,並藉由人們的回饋調整條件,以符合當時的社交規則。實驗證明,我們提出的合成強化式學習可以在合理時間內學會社交導航;獎勵的更新更使系統能學到更合適的導航行為。 (zh_TW)
dc.description.abstract: For a service robot, navigation that only considers metrics such as the shortest path is not enough. In an environment where robots and humans coexist, the robot must not only consider such metrics but also make its navigation appear natural to people. In order to follow the 'social norms' of the environment, using a learning method to teach the robot how to navigate is more suitable than tediously designing handcrafted rules. Recently, deep reinforcement learning (DRL) has been applied to robotics; however, few researchers have considered applying DRL to the social navigation problem, which lies in a high-dimensional space. To address this, this research proposes a composite reinforcement learning (CRL) system that provides a framework for learning how to generate the robot's velocity from sensor input. The system uses DRL to learn the velocity in a given set of scenarios, together with a reward update module that adjusts the reward function based on human feedback. To keep the system general, we do not use a simulator or pre-collected data, which lack real interaction between humans and the robot. Instead, we apply our system directly in the real environment and provide methods that incorporate prior knowledge to cope with the long training time of DRL in the real world. The CRL system incrementally learns to determine its velocity under given rules (i.e., reward functions), and it keeps collecting human feedback to synchronize the reward functions inside the system with the current social norms. The experiments show that the proposed CRL system can learn how to navigate in a reasonable time, and that updating the reward enables the system to learn a more suitable navigation style. (en)
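To make the learning loop described in the abstract concrete, the sketch below shows, in Python, one way a deep Q-learning update could be combined with a human-adjustable reward function. This is a minimal illustration under stated assumptions, not the thesis's implementation: the network shape, feature size, action count, and reward terms (goal_progress, social_distance, collision) are hypothetical names chosen for the example.

```python
# Minimal, hypothetical sketch: a deep Q-network picks a discrete velocity command
# from a sensor/social feature vector, and a reward-update step rescales reward
# terms from human feedback. All sizes and names below are illustrative assumptions.
import torch
import torch.nn as nn

N_FEATURES = 32   # assumed size of the sensor/social feature vector
N_ACTIONS = 9     # assumed discrete set of (linear, angular) velocity commands
GAMMA = 0.9       # discount factor

q_net = nn.Sequential(nn.Linear(N_FEATURES, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# Reward weights that a human-feedback module could adjust over time.
reward_weights = {"goal_progress": 1.0, "social_distance": 1.0, "collision": -10.0}

def reward(terms):
    """Weighted sum of reward terms; the weights stand in for the reward update module."""
    return sum(reward_weights[k] * v for k, v in terms.items())

def q_update(state, action, terms, next_state, done):
    """One Q-learning step with target r + gamma * max_a' Q(s', a')."""
    r = reward(terms)
    q = q_net(torch.tensor(state, dtype=torch.float32))[action]
    with torch.no_grad():
        nxt = q_net(torch.tensor(next_state, dtype=torch.float32)).max().item()
    target = r + (0.0 if done else GAMMA * nxt)
    loss = (q - target) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def update_reward_from_feedback(term, delta):
    """Human feedback nudges a reward weight, mimicking the reward update idea."""
    reward_weights[term] += delta
```

In this sketch, calling update_reward_from_feedback("social_distance", 0.5) after a negative human reaction would make socially comfortable distances weigh more in subsequent Q-updates; the thesis's actual reward structure and feedback mechanism are described in Chapters 3.3 and 3.6 of its table of contents.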
dc.description.provenance: Made available in DSpace on 2021-06-17T02:22:18Z (GMT). No. of bitstreams: 1; ntu-106-R04921008-1.pdf: 48799863 bytes, checksum: 5988cdb53d260d75458445cb346e0c2c (MD5). Previous issue date: 2017. (en)
dc.description.tableofcontents:
口試委員會審定書 (Committee Certification) i
誌謝 (Acknowledgements) ii
摘要 (Chinese Abstract) iii
Abstract iv
Contents vi
List of Figures ix
List of Tables xii
1 Introduction 1
1.1 Motivation and Background . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2 Preliminaries 11
2.1 Social Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Markov Decision Process . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3 Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4.1 Q-learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4.2 Deep Q-Learning . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5 Human Detection and Tracking . . . . . . . . . . . . . . . . . . . . . . . 21
3 Composite Reinforcement Learning 22
3.1 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2 Compose Reinforcement Learning in Different Scenarios . . . . . . . . . 26
3.2.1 Scenario 1: Navigation in Medium Human Density Scenario . . . 26
3.2.2 Scenario 2: Navigation in High Human Density Scenario . . . . 27
3.2.3 Scenario 3: Interaction Scenario . . . . . . . . . . . . . . . . . 27
3.3 Compose Human Feedback into the System . . . . . . . . . . . . . . . . 28
3.4 Encode Prior Knowledge . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.4.1 Adapt Velocity Space . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4.2 Human Guiding . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.4.3 Stabilize Action . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.4.4 Generate Bad Experiments . . . . . . . . . . . . . . . . . . . . . 35
3.5 Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.5.1 Static Entities Related Features . . . . . . . . . . . . . . . . . . 36
3.5.2 Single Human Related Features . . . . . . . . . . . . . . . . . . 38
3.5.3 Multiple Human Related Features . . . . . . . . . . . . . . . . . 39
3.6 Reward Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.7 Network Design of Deep Reinforcement Learning . . . . . . . . . . . . . 44
4 Experiments 49
4.1 Experiment Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.2 Learning Behavior from Human . . . . . . . . . . . . . . . . . . . . . 50
4.3 Real World Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.4 Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.5 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.5.1 Result and Comparison . . . . . . . . . . . . . . . . . . . . . . . 58
5 Conclusions 71
Reference 72
dc.language.iso: en
dc.title: 以合成強化式學習之適應行為學習之社交機器人導航 (zh_TW)
dc.title: Adaptive Behavior Learning Social Robot Navigation with Composite Reinforcement Learning (en)
dc.type: Thesis
dc.date.schoolyear: 105-2
dc.description.degree: 碩士
dc.contributor.oralexamcommittee: 周瑞仁, 范欽雄, 簡忠漢, 陳永耀
dc.subject.keyword: 社交導航, 強化式學習, 深度強化式學習 (zh_TW)
dc.subject.keyword: social navigation, reinforcement learning, deep reinforcement learning (en)
dc.relation.page: 78
dc.identifier.doi: 10.6342/NTU201703974
dc.rights.note: 有償授權
dc.date.accepted: 2017-08-20
dc.contributor.author-college: 電機資訊學院 (zh_TW)
dc.contributor.author-dept: 電機工程學研究所 (zh_TW)
Appears in Collections: 電機工程學系

Files in This Item:
File: ntu-106-1.pdf (currently not authorized for public access)
Size: 47.66 MB, Format: Adobe PDF