Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93837

| Title: | Learning from Demonstration via Density Estimation Using Diffusion Model |
| Author: | Hsiang-Chun Wang (王湘淳) |
| Advisor: | Shao-Hua Sun (孫紹華) |
| Keywords: | Imitation Learning, Diffusion Models, Behavioral Cloning, Generative Adversarial Imitation Learning |
| Publication Year: | 2024 |
| Degree: | Master |
| Abstract: | Imitation learning offers a way to learn tasks solely from expert demonstrations, circumventing the need for direct access to environment rewards. Traditional methods such as behavioral cloning (BC) model either the conditional probability $p(a|s)$ or the joint probability $p(s,a)$ of expert states and actions. While BC's simplicity is appealing, it often struggles with generalization; modeling the joint probability can enhance generalization but may suffer from time-consuming inference and manifold overfitting. To address these challenges, this thesis introduces diffusion model-guided behavioral cloning (DBC), a framework that incorporates a diffusion model trained to capture expert behaviors and optimizes both the BC loss and the proposed diffusion model loss. Experimental validation across various continuous control tasks, including navigation, robot arm manipulation, dexterous manipulation, and locomotion, demonstrates the superior performance of DBC over existing methods. Additional experiments elucidate the limitations of modeling conditional and joint probabilities individually, and comparisons with different generative models affirm the efficacy of the proposed approach. Building on this foundation, diffusion rewards guided adversarial imitation learning (DRAIL) integrates a diffusion model into generative adversarial imitation learning (GAIL): by enhancing the discriminator with a diffusion discriminative classifier and designing diffusion rewards for policy learning, DRAIL provides more precise and smoother rewards, addressing the brittleness and instability often observed in GAIL training. Extensive experiments across navigation, manipulation, and locomotion tasks corroborate the effectiveness of DRAIL compared to prior imitation learning methods, highlighting its enhanced generalizability and data efficiency. Visual analysis of the learned reward functions further supports DRAIL's ability to generate more refined and consistent rewards than GAIL. In summary, this thesis underscores the significance of diffusion models in advancing imitation learning, offering more robust solutions for learning from expert demonstrations across diverse task domains. (A notational sketch of the DBC objective and the DRAIL reward follows this record.) |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93837 |
| DOI: | 10.6342/NTU202401755 |
| Full-Text Permission: | Authorized (restricted to campus access) |
| Appears in Collections: | Graduate Institute of Communication Engineering |
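The abstract describes two diffusion-based components: DBC's combined training objective and DRAIL's diffusion-based reward. Below is a minimal notational sketch of how these might be written, assuming a squared-error BC loss, a weighting coefficient $\lambda$, a policy $\pi_\theta$, expert demonstrations $\mathcal{D}$, and a diffusion discriminative classifier $D_\phi$; these symbols and exact forms are illustrative assumptions, not taken from the thesis itself.

```latex
% Hypothetical form of the DBC objective described in the abstract:
% the standard behavioral cloning loss plus a diffusion-model loss on
% policy outputs, weighted by an assumed coefficient \lambda.
\[
\mathcal{L}_{\mathrm{DBC}}(\theta) =
  \underbrace{\mathbb{E}_{(s,a)\sim\mathcal{D}}
    \bigl[\lVert \pi_\theta(s) - a \rVert^2\bigr]}_{\text{BC loss}}
  \;+\; \lambda\,
  \underbrace{\mathbb{E}_{s\sim\mathcal{D}}
    \bigl[\mathcal{L}_{\mathrm{diff}}\bigl(s, \pi_\theta(s)\bigr)\bigr]}_{\text{diffusion model loss}}
\]

% Hypothetical form of the DRAIL reward: a GAIL-style reward in which
% the usual discriminator is replaced by a diffusion discriminative
% classifier D_\phi scoring how expert-like a state-action pair is.
\[
r(s, a) = -\log\bigl(1 - D_\phi(s, a)\bigr)
\]
```

Here $\mathcal{L}_{\mathrm{diff}}$ would be the denoising loss of a diffusion model pretrained on expert state-action pairs, so a low value indicates that $(s, \pi_\theta(s))$ lies near the expert distribution; the precise reward shaping and classifier construction used in the thesis may differ.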
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-112-2.pdf (restricted to NTU campus IPs; off-campus users should use the VPN service) | 8.18 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
