Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93837

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 孫紹華 | zh_TW |
| dc.contributor.advisor | Shao-Hua Sun | en |
| dc.contributor.author | 王湘淳 | zh_TW |
| dc.contributor.author | Hsiang-Chun Wang | en |
| dc.date.accessioned | 2024-08-08T16:28:55Z | - |
| dc.date.available | 2024-08-09 | - |
| dc.date.copyright | 2024-08-08 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-07-16 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93837 | - |
| dc.description.abstract | 本文提出了兩種利用擴散模型進行模仿學習的方法:擴散模型引導的行為克隆(DBC)與擴散獎勵引導的對抗式模仿學習(DRAIL),使智能體僅憑專家示範即可學習任務,而無需直接取得環境獎勵。傳統的行為克隆(BC)方法通常建模專家狀態與行動的條件概率$p(a|s)$或聯合概率$p(s,a)$;儘管BC方法簡單易用,卻常常在泛化能力方面表現不佳。相反,建模聯合概率可以增強泛化能力,但推理過程往往耗時,而且可能受到流形過擬合的影響。
為了解決這些挑戰,本文提出 DBC:它訓練一個捕捉專家行為的擴散模型,並同時優化 BC 損失與所提出的擴散模型損失。在各種連續控制任務中的實驗驗證了 DBC 相對於現有方法的優越性能。此外,本文設計了額外實驗,分析單獨建模條件概率或聯合概率的侷限,並與不同生成模型進行比較,進一步確認了所提方法的有效性。在此基礎上,DRAIL 將擴散模型整合到生成對抗模仿學習(GAIL)中:透過擴散判別分類器強化鑑別器,並設計擴散獎勵以提升策略學習的穩定性。在導航、操縱和運動任務的廣泛實驗中,DRAIL 相對於以往的模仿學習方法表現更佳,並展現出更強的泛化能力與數據效率。對學習到的獎勵函數進行可視化分析,進一步顯示 DRAIL 能比 GAIL 產生更精確且更一致的獎勵。綜上所述,本文強調了擴散模型在推進模仿學習技術方面的重要性,為在各種任務領域中從專家示範中學習提供了更穩健的解決方案。 | zh_TW |
| dc.description.abstract | Imitation learning presents a solution to learning tasks solely through expert demonstrations, circumventing the need for direct access to environment rewards. Traditional methods such as behavioral cloning (BC) model either the conditional probability $p(a|s)$ or the joint probability $p(s,a)$ of expert actions and states. While BC's simplicity is appealing, it often struggles with generalization, whereas modeling the joint probability can enhance generalization but may suffer from time-consuming inference and manifold overfitting.
To address these challenges, a novel approach, termed diffusion model-guided behavioral cloning (DBC), is introduced. This framework incorporates a diffusion model trained to capture expert behaviors, optimizing both the BC loss and the proposed diffusion model loss. Experimental validation across various continuous control tasks, including navigation, robot arm manipulation, dexterous manipulation, and locomotion, demonstrates the superior performance of DBC over existing methods. Furthermore, additional experiments are designed to elucidate the limitations of modeling conditional and joint probabilities individually, alongside comparisons with different generative models, affirming the efficacy of the proposed approach. Building upon this foundation, diffusion rewards guided adversarial imitation learning (DRAIL) integrates a diffusion model into generative adversarial imitation learning (GAIL). By enhancing the discriminator through a diffusion discriminative classifier and designing diffusion rewards for policy learning, DRAIL aims to provide more precise and smoother rewards, addressing the brittleness and instability often observed in GAIL training. Extensive experimentation across navigation, manipulation, and locomotion tasks corroborates the effectiveness of DRAIL compared to prior imitation learning methods, highlighting its enhanced generalizability and data efficiency. Visual analysis of learned reward functions further supports DRAIL's ability to generate more refined and consistent rewards compared to GAIL. In summary, the synthesis of these approaches underscores the significance of diffusion models in advancing imitation learning techniques, offering more robust solutions for learning from expert demonstrations in diverse task domains. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-08T16:28:55Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-08-08T16:28:55Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i
摘要 iii
Abstract v
Contents vii
Chapter 1 Introduction 1
1.1 Diffusion Model for Offline Imitation Learning 2
1.2 Diffusion Model for Online Imitation Learning 3
1.3 Published Works 3
Chapter 2 Related Work 5
2.1 Behavioral Cloning (BC) 5
2.2 Inverse Reinforcement Learning (IRL) 6
2.3 Adversarial Imitation Learning (AIL) 6
Chapter 3 Preliminaries 7
3.1 Imitation Learning 7
3.1.1 Modeling Conditional and Joint Probability 7
3.1.1.1 Modeling Conditional Probability p(a|s) 7
3.1.1.2 Modeling Joint Probability p(s, a) 8
3.1.2 Generative Adversarial Imitation Learning (GAIL) 8
3.2 Diffusion Models 10
3.2.1 Diffusion Models in DBC 10
3.2.2 Diffusion Models in DRAIL 11
Chapter 4 Diffusion Model-Augmented Behavioral Cloning 13
4.1 Approach 13
4.1.1 Behavioral Cloning Loss 14
4.1.2 Learning a Diffusion Model and Guiding Policy Learning 14
4.1.2.1 Learning a Diffusion Model 15
4.1.2.2 Learning a Policy with Diffusion Model Loss 16
4.1.3 Combining the Two Objectives 17
4.2 Experiments 17
4.2.1 Experimental Setup 18
4.2.2 Baselines 20
4.2.3 Multimodality of Environments 20
4.2.4 Experimental Results 22
4.2.5 Generalization Experiments in FETCHPICK 23
4.2.6 Manifold Overfitting Experiments 24
Chapter 5 Diffusion-Reward Adversarial Imitation Learning 27
5.1 Approach 27
5.1.1 Reward Prediction with a Conditional Diffusion Model 28
5.1.2 Diffusion Discriminative Classifier 29
5.1.3 Diffusion-Reward Adversarial Imitation Learning 32
5.2 Experiments 33
5.2.1 Experimental Setup 33
5.2.2 Baselines 35
5.2.3 Experimental Results 36
5.2.4 Generalizability 37
5.2.5 Data Efficiency 39
5.2.6 Reward Function Visualization 40
5.3 Conclusion 40
Chapter 6 Conclusion 43
6.1 Summary 43
6.2 Further Direction 43
Bibliography 45
Appendix A — Algorithm of DBC 58
Appendix B — Additional Experiments of DBC on Image-Based Environment 61
Appendix C — Ablation Study of DBC 63
C.1 Comparing Different Generative Models 63
C.2 Effect of the Diffusion Model Loss Coefficient λ 64
C.3 Effect of the Normalization Term 64
Appendix D — Relationships between $\mathcal{L}_{BC}$ and $\mathcal{L}_{DM}$ in DBC 67
D.1 Training Progress 67
D.2 F-Divergence 68
Appendix E — Alleviating Manifold Overfitting by Noise Injection in DBC 71
E.1 Modeling Expert Distribution 71
E.2 Guide Policy Learning 72
Appendix F — Effect of Dataset Size and Data Augmentation in DBC 75
Appendix G — Qualitative Results in DBC 77
Appendix H — On the Theoretical Motivation for Guiding Policy Learning with Diffusion Model in DBC 79
H.1 Relation to DiffAIL in DRAIL 84
H.2 Extended Results of Generalization Experiments in DRAIL 87
H.2.1 Experiment Settings 87
H.2.2 Experiment Results 88
H.3 Converged Performance in DRAIL 90
Appendix I — Environment & Task Details 93
I.1 MAZE 93
I.2 FETCHPUSH & FETCHPICK 94
I.3 HANDROTATE 95
I.4 CHEETAH 96
I.5 WALKER 96
I.6 ANTREACH 97
Appendix J — Model Architecture 99
J.1 Model Architecture of DBC 99
J.1.1 Model Architecture of BC, Implicit BC, Diffusion Policy, and DBC 99
J.1.2 Model Architecture of EBM, VAE, and GAN 101
J.2 Model Architecture of DRAIL 102
J.2.1 Model Architecture of DRAIL, DiffAIL, and the Baselines 102
J.2.2 Image-Based Model Architecture of DRAIL, DiffAIL, and the Baselines 104
Appendix K — Training and Inference Details 107
K.1 Training and Inference Details of DBC 107
K.1.1 Computation Resource 107
K.1.2 Hyperparameters 108
K.1.3 Inference Details 108
K.1.4 Training Details of Generative Models 110
K.2 Training Details of DRAIL 114
K.2.1 Training Hyperparameters 114
K.2.2 Reward Function Details 115 | - |
| dc.language.iso | en | - |
| dc.subject | 模仿學習 | zh_TW |
| dc.subject | 擴散模型 | zh_TW |
| dc.subject | 行為克隆 | zh_TW |
| dc.subject | 生成對抗模仿學習 | zh_TW |
| dc.subject | Diffusion Models | en |
| dc.subject | Behavioral Cloning | en |
| dc.subject | Imitation Learning | en |
| dc.subject | Generative Adversarial Imitation Learning | en |
| dc.title | 透過擴散模型的密度估計實現示範學習 | zh_TW |
| dc.title | Learning from Demonstration via Density Estimation Using Diffusion Model | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 林軒田;李宏毅 | zh_TW |
| dc.contributor.oralexamcommittee | Hsuan-Tien Lin;Hung-yi Lee | en |
| dc.subject.keyword | 模仿學習,擴散模型,行為克隆,生成對抗模仿學習 | zh_TW |
| dc.subject.keyword | Imitation Learning, Diffusion Models, Behavioral Cloning, Generative Adversarial Imitation Learning | en |
| dc.relation.page | 115 | - |
| dc.identifier.doi | 10.6342/NTU202401755 | - |
| dc.rights.note | 同意授權(限校園內公開) | - |
| dc.date.accepted | 2024-07-16 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 電信工程學研究所 | - |
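
The abstracts above describe DBC's central idea: train a diffusion model on expert state-action pairs, then optimize the policy against the BC loss combined with a diffusion-model loss weighted by a coefficient λ (the table of contents lists an ablation on this coefficient). The following is a minimal, hypothetical PyTorch sketch of such a combined objective; the module names, network shapes, noise schedule handling, and the choice of a mean-squared-error BC loss are illustrative assumptions and do not reproduce the thesis implementation.

```python
# Hypothetical sketch only: a BC loss plus a frozen diffusion model's denoising loss,
# combined as total = L_BC + lam * L_DM, roughly following the DBC idea in the abstract.
import torch
import torch.nn as nn


class Denoiser(nn.Module):
    """Predicts the noise added to a concatenated (state, action) vector at step t."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256, n_steps: int = 100):
        super().__init__()
        self.n_steps = n_steps
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim + action_dim),
        )

    def forward(self, x_noisy: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        t_feat = t.float().unsqueeze(-1) / self.n_steps  # simple scalar step embedding
        return self.net(torch.cat([x_noisy, t_feat], dim=-1))


def diffusion_loss(denoiser: Denoiser, s: torch.Tensor, a: torch.Tensor,
                   alphas_bar: torch.Tensor) -> torch.Tensor:
    """Standard denoising objective on the joint (s, a) vector."""
    x0 = torch.cat([s, a], dim=-1)
    t = torch.randint(0, denoiser.n_steps, (x0.shape[0],), device=x0.device)
    eps = torch.randn_like(x0)
    ab = alphas_bar[t].unsqueeze(-1)
    x_noisy = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps   # forward (noising) process
    return ((denoiser(x_noisy, t) - eps) ** 2).mean()    # predict the injected noise


def dbc_style_loss(policy: nn.Module, denoiser: Denoiser, s: torch.Tensor,
                   a_expert: torch.Tensor, alphas_bar: torch.Tensor,
                   lam: float = 1.0) -> torch.Tensor:
    """BC loss on expert actions plus the diffusion loss evaluated on the policy's action."""
    a_pred = policy(s)
    l_bc = ((a_pred - a_expert) ** 2).mean()
    l_dm = diffusion_loss(denoiser, s, a_pred, alphas_bar)  # denoiser weights stay frozen
    return l_bc + lam * l_dm
```

In this sketch, `alphas_bar` is assumed to be a length-`n_steps` tensor on the same device as the data, computed from a standard DDPM beta schedule (cumulative products of 1 − β_t), and the denoiser would be pretrained on expert (s, a) pairs with `diffusion_loss` before being frozen for policy training.
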
Appears in Collections: 電信工程學研究所
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-112-2.pdf (access restricted to NTU campus IPs; off-campus users should connect via the VPN service) | 8.18 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
