Adaptive Behavior Learning Social Robot Navigation with Composite Reinforcement Learning

Pei-Huai Ciou; 邱沛淮

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68476

Title:	Adaptive Behavior Learning Social Robot Navigation with Composite Reinforcement Learning (以合成強化式學習之適應行為學習之社交機器人導航)
Author:	Pei-Huai Ciou 邱沛淮
Advisor:	傅立成
Keywords:	social navigation, reinforcement learning, deep reinforcement learning
Publication Year:	2017
Degree:	Master's
Abstract:	For a service robot, navigation that considers only metrics such as the shortest path is not enough. In an environment where robots and humans coexist, the robot must not only satisfy such metrics but also move in a way that people perceive as sufficiently natural. To make a robot follow such "social norms", having it learn social navigation through machine learning is more suitable than having researchers tediously handcraft features and rules. Deep reinforcement learning (DRL) has recently been introduced to robotics; however, very few studies have applied it to the social navigation problem, which is high-dimensional. To address this, this research proposes a composite reinforcement learning (CRL) system, a framework that lets the robot learn from sensor input how to generate appropriate velocities. The system uses DRL to learn the robot's social navigation velocity in a given set of scenarios, together with a reward update module that updates the reward function based on human feedback. To make the system more general, we do not use a simulator or pre-collected data, because these lack real interaction between humans and robots. Instead, we deploy the system directly in the real environment and propose methods that incorporate human prior knowledge to cope with the long training time of DRL in the real world.
The CRL system incrementally learns to determine the robot's velocity under given rules (e.g., reward functions). It also keeps collecting human feedback so that its reward functions stay synchronized with the current social norms. Experiments show that the proposed CRL system can learn to navigate socially within a reasonable time, and that updating the reward enables the system to learn more suitable navigation behavior.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68476
DOI:	10.6342/NTU201703974
Full-Text License:	Paid license
Appears in Collections:	Department of Electrical Engineering

Files in This Item:

File	Size	Format
ntu-106-1.pdf (not currently authorized for public access)	47.66 MB	Adobe PDF

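Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.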
