Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92220

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 陳銘憲 | zh_TW |
| dc.contributor.advisor | Ming-Syan Chen | en |
| dc.contributor.author | 沈郁鈞 | zh_TW |
| dc.contributor.author | Yu-Chun Shen | en |
| dc.date.accessioned | 2024-03-17T16:12:57Z | - |
| dc.date.available | 2024-03-18 | - |
| dc.date.copyright | 2024-03-16 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-02-19 | - |
| dc.identifier.citation | [1] Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016. [2] Chang Chen, Yi-Fu Wu, Jaesik Yoon, and Sungjin Ahn. TransDreamer: Reinforcement learning with transformer world models. arXiv preprint arXiv:2202.09481, 2022. [3] Fei Deng, Ingook Jang, and Sungjin Ahn. DreamerPro: Reconstruction-free model-based reinforcement learning with prototypical representations. In International Conference on Machine Learning, pages 4956–4975. PMLR, 2022. [4] Fei Deng, Junyeong Park, and Sungjin Ahn. Facing off world model backbones: RNNs, transformers, and S4. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. [5] Rahul Dey and Fathi M. Salem. Gate-variants of gated recurrent unit (GRU) neural networks. In 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), pages 1597–1600. IEEE, 2017. [6] Albert Gu, Karan Goel, and Christopher Re. Efficiently modeling long sequences with structured state spaces. In International Conference on Learning Representations, 2021. [7] David Ha and Jürgen Schmidhuber. Recurrent world models facilitate policy evolution. Advances in Neural Information Processing Systems, 31, 2018. [8] Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. Dream to Control: Learning Behaviors by Latent Imagination. In International Conference on Learning Representations, 2020. [9] Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning Latent Dynamics for Planning from Pixels. In International Conference on Machine Learning, pages 2555–2565. PMLR, 2019. [10] Danijar Hafner, Timothy P. Lillicrap, Mohammad Norouzi, and Jimmy Ba. Mastering Atari with Discrete World Models. In International Conference on Learning Representations, 2021. [11] Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering Diverse Domains through World Models. arXiv preprint arXiv:2301.04104, 2023. [12] Nicklas Hansen, Hao Su, and Xiaolong Wang. Temporal difference learning for model predictive control. In International Conference on Machine Learning, pages 8387–8406. PMLR, 2022. [13] Nicklas Hansen, Hao Su, and Xiaolong Wang. TD-MPC2: Scalable, robust world models for continuous control. In International Conference on Learning Representations, 2024. [14] K. S. Holkar and Laxman M. Waghmare. An overview of model predictive control. International Journal of Control and Automation, 3(4):47–63, 2010. [15] Diederik P. Kingma and Max Welling. Auto-Encoding Variational Bayes, 2013. [16] Frank Klinker. Exponential moving average versus moving exponential average. Mathematische Semesterberichte, 58:97–107, 2011. [17] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. In International Conference on Learning Representations, 2016. [18] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013. [19] Tung D. Nguyen, Rui Shu, Tuan Pham, Hung Bui, and Stefano Ermon. Temporal predictive coding for model-based planning in latent space. In International Conference on Machine Learning, pages 8130–8139. PMLR, 2021. [20] Masashi Okada and Tadahiro Taniguchi. Dreaming: Model-based reinforcement learning by latent imagination without reconstruction. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 4209–4215. IEEE, 2021. [21] Masashi Okada and Tadahiro Taniguchi. DreamingV2: Reinforcement learning with discrete world models without reconstruction. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 985–991. IEEE, 2022. [22] John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International Conference on Machine Learning, pages 1889–1897. PMLR, 2015. [23] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017. [24] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of Go without human knowledge. Nature, 550(7676):354–359, 2017. [25] Matthijs T. J. Spaan. Partially observable Markov decision processes. In Reinforcement Learning: State-of-the-Art, pages 387–414. Springer, 2012. [26] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018. [27] Yunhao Tang and Shipra Agrawal. Discretizing Continuous Action Space for On-Policy Optimization. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 5981–5988, 2020. [28] Yuval Tassa, Saran Tunyasuvunakool, Alistair Muldal, Yotam Doron, Siqi Liu, Steven Bohez, Josh Merel, Tom Erez, Timothy Lillicrap, and Nicolas Heess. dm_control: Software and tasks for continuous control. Software Impacts, 6:100022, 2020. [29] Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026–5033. IEEE, 2012. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92220 | - |
| dc.description.abstract | 連續動作控制是強化學習的主要研究議題之一。在連續控制任務中,智能體必須從連續動作空間中決定精確的最佳動作值以採取接下來的行動,這相對於具有離散動作空間的決策任務更為複雜且具挑戰性。因此,將連續動作空間離散化是降低連續控制任務複雜度的一種直觀可行的方式。然而,固定的離散化連續動作空間在不同的離散程度下可能會遭遇不同的問題。本研究提出一種適應性連續動作空間離散化方法:在初始階段,離散化後的動作集合較小且間距較稀疏;在智能體訓練中期,此集合會進行擴展,透過增加集合內的元素來獲得更緊密的離散化動作集合。我們更進一步將一致性與適應性離散化的連續動作取樣方法應用於最先進的基於模型的強化學習(model-based reinforcement learning)演算法,並在多個連續控制任務上進行評估,在大部分任務中取得優於或相近於原始方法的結果。除此之外,我們提出的方法在計算時間效率上也優於原始的連續動作取樣方法。 | zh_TW |
| dc.description.abstract | Continuous control has emerged as a prominent area of focus within reinforcement learning. In continuous control tasks, the agent acts by selecting precise action values from a continuous action space, which is more complex and challenging than decision-making tasks with a discrete action space. Hence, discretizing the continuous action space is an intuitive way to reduce the complexity of continuous control tasks. However, a fixed (consistent) discretization may encounter different problems depending on its granularity. This study introduces an adaptive continuous action space discretization approach: the discretized action set starts coarse and sparse, and is later expanded during training into a denser set with finer granularity. We apply both the consistent and the adaptive discretization methods to a state-of-the-art model-based reinforcement learning algorithm and evaluate them on several continuous control benchmarks. Our method achieves better or comparable results relative to the original action sampling method in most tasks, with superior computation time efficiency. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-03-17T16:12:56Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-03-17T16:12:57Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Master’s Thesis Acceptance Certificate i Acknowledgements ii 摘要 iii Abstract iv Contents vi List of Figures viii List of Tables ix Chapter 1 Introduction 1 Chapter 2 Preliminaries 4 2.1 Partially Observable Markov Decision Process 4 2.2 Dreamer 5 2.3 Continuous Control 8 Chapter 3 Consistent Continuous Action Space Discretization 9 3.1 Discretization Process 9 3.2 Architecture 10 Chapter 4 Continuous Action Space Discretization with Adaptive Granularity 12 4.1 Motivation 12 4.2 Limitation 13 Chapter 5 Experiments 15 5.1 Settings 15 5.2 Comparison between Gaussian Method and Consistent Discretization 17 5.3 Comparison between Gaussian Method, Consistent Discretization, and Adaptive Granularity Discretization 20 5.4 Computational Time Cost Analysis 26 Chapter 6 Related Works 27 6.1 Variations of World Models 27 6.2 Variations of Dreamer 28 Chapter 7 Conclusion 29 References 30 Appendix A — Additional Experiments among all Action Sampling Methods 34 | - |
| dc.language.iso | en | - |
| dc.subject | 強化學習 | zh_TW |
| dc.subject | 連續動作控制 | zh_TW |
| dc.subject | 離散化 | zh_TW |
| dc.subject | Continuous Control | en |
| dc.subject | Discretization | en |
| dc.subject | Reinforcement Learning | en |
| dc.title | 使用自適應性離散動作空間的基於模型強化學習 | zh_TW |
| dc.title | Adaptive Discretized Action Space Approach for Model-Based Reinforcement Learning | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-1 | - |
| dc.description.degree | 碩士 (Master's) | - |
| dc.contributor.oralexamcommittee | 林澤;孫紹華;高宏宇 | zh_TW |
| dc.contributor.oralexamcommittee | Che Lin;Shao-Hua Sun;Hung-Yu Kao | en |
| dc.subject.keyword | 強化學習,連續動作控制,離散化 | zh_TW |
| dc.subject.keyword | Reinforcement Learning, Continuous Control, Discretization | en |
| dc.relation.page | 36 | - |
| dc.identifier.doi | 10.6342/NTU202400390 | - |
| dc.rights.note | 未授權 (not authorized) | - |
| dc.date.accepted | 2024-02-20 | - |
| dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | - |
| dc.contributor.author-dept | 電機工程學系 (Department of Electrical Engineering) | - |
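
The adaptive, coarse-to-fine action space discretization summarized in the abstract above can be made concrete with a minimal sketch. The code below is not taken from the thesis; the class name `AdaptiveActionGrid`, the refinement factor, and the mid-training refinement trigger are all illustrative assumptions, shown only to clarify the idea of starting from a sparse discrete action set and later expanding it into a denser one.

```python
# Illustrative sketch only; NOT the thesis implementation. All names and the
# refinement schedule are hypothetical assumptions based on the abstract.
import numpy as np


class AdaptiveActionGrid:
    """Discretizes a continuous action interval [low, high] per dimension.

    Starts with a coarse, sparsely spaced set of action values and can later
    be refined into a denser set, as the abstract describes.
    """

    def __init__(self, low, high, n_initial_bins=5):
        self.low = np.asarray(low, dtype=np.float64)
        self.high = np.asarray(high, dtype=np.float64)
        self.n_bins = n_initial_bins
        self._build_grid()

    def _build_grid(self):
        # One evenly spaced grid per action dimension, shape (action_dim, n_bins).
        self.grid = np.stack(
            [np.linspace(l, h, self.n_bins) for l, h in zip(self.low, self.high)]
        )

    def refine(self, factor=2):
        # Expand the discrete action set with denser granularity, e.g. triggered
        # midway through agent training. The formula keeps the old grid points.
        self.n_bins = self.n_bins * factor - (factor - 1)
        self._build_grid()

    def sample(self, rng):
        # Pick one discrete value per action dimension uniformly at random;
        # a real agent would instead choose via a learned categorical policy.
        idx = rng.integers(0, self.n_bins, size=self.grid.shape[0])
        return self.grid[np.arange(self.grid.shape[0]), idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = AdaptiveActionGrid(low=[-1.0, -1.0], high=[1.0, 1.0], n_initial_bins=5)
    print("coarse action:", grid.sample(rng))  # drawn from 5 values per dimension
    grid.refine(factor=2)                      # densify to 9 values per dimension
    print("fine action:  ", grid.sample(rng))
```

In an actual agent, the uniform sampling above would be replaced by a categorical policy over the current discrete action set; the sketch only shows how that set itself could grow from coarse to fine during training.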
Appears in Collections: 電機工程學系 (Department of Electrical Engineering)
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-112-1.pdf (not authorized for public access) | 7.91 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated in their copyright terms.
