Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74092
Title: | 自我模仿學習中探索與利用之平衡 Balancing Exploration and Exploitation in Self-Imitation Learning |
Author: | Chun-Yao Kang 康鈞堯 |
Advisor: | 陳銘憲 (Ming-Syan Chen) |
Keywords: | reinforcement learning, self-imitation learning, exploration, sparse reward |
Publication year: | 2019 |
Degree: | Master's |
Abstract: | Sparse-reward tasks remain a major challenge in reinforcement learning. Solving them requires both efficient exploration and efficient exploitation to reduce sample complexity. Two recently proposed classes of methods each improve one of these aspects. Self-imitation learning strengthens exploitation by having the agent imitate its own past good trajectories. Exploration bonuses strengthen exploration by granting a larger intrinsic reward when the agent visits rarely visited (novel) states. This thesis introduces a novel framework, Explore-then-Exploit (EE), which interleaves self-imitation learning with an exploration bonus to strengthen the effect of both algorithms. In the exploring stage, the intrinsic reward helps the agent search the environment efficiently, so that it tends to reach unseen states and occasionally collects high-return trajectories. In the self-imitating stage, the agent learns to reproduce the collected trajectories consistently, which lets it converge quickly to a good policy; that policy in turn provides a better starting point for the next exploring stage. Our results show that EE achieves performance superior or comparable to existing methods on variants of MuJoCo environments with sparse, episodic reward settings. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74092 |
DOI: | 10.6342/NTU201903114 |
Full-text license: | Paid license |
Appears in collections: | Department of Electrical Engineering |
Files in this item:
File | Size | Format | |
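The abstract describes EE only at a high level, so the following is a minimal sketch of its three ingredients, not the thesis's implementation. Every name here (`count_bonus`, `stage_for_iteration`, `keep_best`) is hypothetical, and the count-based bonus, the fixed alternating schedule, and the top-k trajectory buffer are illustrative assumptions that the abstract does not specify.

```python
import math
from collections import defaultdict


def count_bonus(counts, state, scale=1.0):
    """Assumed count-based intrinsic reward: larger for rarely visited states."""
    counts[state] += 1
    return scale / math.sqrt(counts[state])


def stage_for_iteration(it, explore_iters, imitate_iters):
    """Assumed fixed schedule: explore for a while, then self-imitate, and repeat."""
    period = explore_iters + imitate_iters
    return "explore" if it % period < explore_iters else "imitate"


def keep_best(buffer, trajectory, episodic_return, capacity):
    """Store a trajectory and keep only the `capacity` highest-return ones."""
    buffer.append((episodic_return, trajectory))
    buffer.sort(key=lambda item: item[0], reverse=True)
    del buffer[capacity:]


# Usage: the bonus shrinks as a state is revisited.
counts = defaultdict(int)
first_visit = count_bonus(counts, "s0")
second_visit = count_bonus(counts, "s0")
```

In a full training loop, the exploring stage would add `count_bonus` to the environment reward and feed finished episodes to `keep_best`, while the self-imitating stage would train the policy to reproduce the buffered trajectories.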
---|---|---|---|
ntu-108-1.pdf (not currently authorized for public access) | 1.94 MB | Adobe PDF |
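Unless their copyright terms are specifically stated otherwise, all items in this system are protected by copyright, with all rights reserved.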