Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89128
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 于天立 | zh_TW
dc.contributor.advisor | Tian-Li Yu | en
dc.contributor.author | 劉容均 | zh_TW
dc.contributor.author | Jung-Chun Liu | en
dc.date.accessioned | 2023-08-16T17:14:54Z | -
dc.date.available | 2023-11-09 | -
dc.date.copyright | 2023-08-16 | -
dc.date.issued | 2023 | -
dc.date.submitted | 2023-08-10 | -
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89128 | -
dc.description.abstract | 為了處理層次性和組合性的決策問題,智能代理人需要任務結構和子任務規則的領域知識表示,以進行規劃和推理。先前的方法通常假設預定義的子任務存在,因為在缺乏領域知識的狀況下確定子任務具有困難性。因此,我們提出了一個框架,從專家示範中自動歸納推論子任務以解決複雜任務。該框架涵蓋了經典規劃、深度強化學習和演化計算,過程包括歸納符號規則、從目標構建任務結構,以及基於任務結構提供內在獎勵。我們利用遺傳程式設計進行符號規則推論,其中規則模型的選擇反映了關於效果規則的先驗領域知識。我們在兩個環境(包括 Minecraft 環境)中評估了該框架,並證明它提升了深度強化學習代理的學習效率。此外,我們透過組合任務結構和推論新規則,展示該框架在任務和技能層面的通用性。本研究對於以整合框架作為解決層次性現實世界問題的認知架構提供了深入的觀點。 | zh_TW
dc.description.abstract | To deal with hierarchical and compositional decision-making problems, intelligent agents require domain knowledge representations of task structures and subtask rules for planning and reasoning. Previous approaches often rely on strong assumptions about pre-defined subtasks because determining subtasks without domain knowledge is difficult. We therefore propose a framework that automatically induces subtasks from expert demonstrations to solve complex tasks. The framework combines classical planning, deep reinforcement learning (DRL), and evolutionary computation; the procedure involves inducing symbolic rules, constructing task structures from goals, and providing intrinsic rewards based on the task structures. We use genetic programming for symbolic rule induction, where the choice of rule model reflects prior domain knowledge about effect rules. We evaluate the framework in two environments, including the Minecraft environment, and show that it improves the performance of DRL agents. We also demonstrate generalizability at the task and skill levels by composing task structures and inducing new rules. This research contributes insights into integrated frameworks that serve as a cognitive architecture for hierarchical real-world problems. | en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-16T17:14:54Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2023-08-16T17:14:54Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | Oral Examination Committee Certification i
Acknowledgements ii
Abstract (in Chinese) iii
Abstract iv
Contents v
List of Figures ix
List of Tables xii
List of Symbols xiii
1 Introduction 1
2 Background 5
2.1 Deep Reinforcement Learning 6
2.2 Classical Planning 6
2.3 Evolutionary Computation 7
2.4 Related Works 8
2.4.1 Learning Abstraction from Demonstrations 8
2.4.2 Subtask Inference 9
3 Problem Formulation and Preliminaries 11
3.1 Markov Decision Process 11
3.2 Planning Domain Definition Language 12
3.3 MDP Problems in Planning Language 13
3.4 Critical Action 15
3.4.1 Effect and Precondition Variable Space 16
3.4.2 Action Schema 17
3.4.3 Effect Rule 19
3.4.4 Precondition Rule 20
3.5 Critical Action Graph 20
4 Inducing Hierarchical Structure and Symbolic Knowledge 23
4.1 Overview 23
4.2 Induction Module 24
4.2.1 Collecting Demonstrations 25
4.2.2 Extracting Action-Effect Linkages from Demonstrations 26
4.2.3 Determining Effect Symbolic Rules 28
4.2.4 Determining Precondition 29
4.2.5 Summary 31
4.3 Training Module 32
4.3.1 Inferring Critical Action Graph 33
4.3.2 Deep Reinforcement Learning Agent 33
4.3.3 Intrinsic Rewards 34
4.3.4 Summary 35
5 Test Environments and Experiment Settings 37
5.1 Switch 38
5.1.1 Number of Switches 40
5.1.2 Order 40
5.1.3 Distractors 41
5.1.4 Four Rooms 41
5.2 Minecraft 42
5.3 Implementation Detail 45
5.3.1 Advantage Actor-Critic 46
5.3.2 Genetic Programming 47
5.3.3 Reward Function 50
5.4 Compared Algorithms 50
5.4.1 Proximal Policy Optimization and Deep Q-Network 51
5.4.2 Rewarding Impact-Driven Exploration 51
5.4.3 Behavior Cloning 52
5.4.4 Generative Adversarial Imitation Learning 53
6 Experiments and Discussions 54
6.1 Preliminary Experiment Results 54
6.1.1 Action-Effect Linkage 55
6.1.2 Symbolic Regression 55
6.1.3 Intrinsic Reward and Pre-Training 57
6.2 Experiment Results on Performance 60
6.2.1 Switch 60
6.2.2 Minecraft 62
6.3 Generalizability 63
6.3.1 Preliminaries 64
6.3.2 Task Generalization 65
6.3.3 Skill Generalization 66
6.4 Discussions 67
6.4.1 Extracting Task Structures from Demonstrations 67
6.4.2 Comparison with Other Methods 67
6.4.3 Limitation 69
7 Conclusion and Future Work 72
Bibliography 73
A Task Structure of Test Problems 80
B Mutual Information of Action-Effect Pairs in Minecraft 82
dc.language.iso | en | -
dc.subject | 遺傳程式設計 | zh_TW
dc.subject | 經典規劃 | zh_TW
dc.subject | 決策問題 | zh_TW
dc.subject | 歸納學習 | zh_TW
dc.subject | 示範學習 | zh_TW
dc.subject | 深度強化學習 | zh_TW
dc.subject | Learning from Demonstration | en
dc.subject | Deep Reinforcement Learning | en
dc.subject | Classical Planning | en
dc.subject | Decision Making | en
dc.subject | Inductive Learning | en
dc.subject | Genetic Programming | en
dc.title | 利用遺傳程式設計從專家示範自動推論任務子結構 | zh_TW
dc.title | Automatic Induction of Task Substructures from Expert Demonstrations via Genetic Programming | en
dc.type | Thesis | -
dc.date.schoolyear | 111-2 | -
dc.description.degree | 碩士 (Master) | -
dc.contributor.oralexamcommittee | 王勝德;陳穎平 | zh_TW
dc.contributor.oralexamcommittee | Sheng-De Wang;Ying-Ping Chen | en
dc.subject.keyword | 歸納學習,示範學習,決策問題,深度強化學習,經典規劃,遺傳程式設計 | zh_TW
dc.subject.keyword | Inductive Learning,Learning from Demonstration,Decision Making,Deep Reinforcement Learning,Classical Planning,Genetic Programming | en
dc.relation.page | 83 | -
dc.identifier.doi | 10.6342/NTU202303054 | -
dc.rights.note | 同意授權(全球公開) (open access worldwide) | -
dc.date.accepted | 2023-08-11 | -
dc.contributor.author-college | 電機資訊學院 | -
dc.contributor.author-dept | 電機工程學系 | -
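The abstract in the record above describes a pipeline that induces symbolic rules from expert demonstrations, builds a task structure from them, and then shapes a DRL agent with intrinsic rewards derived from that structure. Below is a minimal Python sketch of the last step only, assuming a toy hand-written subtask graph; the names (SUBTASK_GRAPH, intrinsic_reward) and the reward scheme are illustrative placeholders, not the implementation described in the thesis.

# Hypothetical sketch: turning an induced subtask graph into an intrinsic
# reward signal. The graph maps each subtask to the subtasks it depends on
# (loosely inspired by a Minecraft-style crafting chain).
SUBTASK_GRAPH = {
    "get_wood": [],
    "make_plank": ["get_wood"],
    "make_stick": ["make_plank"],
    "make_pickaxe": ["make_plank", "make_stick"],
}

def newly_completed(prev_state, state):
    """Return subtasks whose completion flag flipped from False to True."""
    return {s for s in SUBTASK_GRAPH
            if state.get(s, False) and not prev_state.get(s, False)}

def intrinsic_reward(prev_state, state, bonus=0.5):
    """Give a bonus only for subtasks completed after all prerequisites."""
    reward = 0.0
    for subtask in newly_completed(prev_state, state):
        if all(state.get(p, False) for p in SUBTASK_GRAPH[subtask]):
            reward += bonus
    return reward

if __name__ == "__main__":
    prev = {"get_wood": True}
    curr = {"get_wood": True, "make_plank": True}
    print(intrinsic_reward(prev, curr))  # 0.5: plank made after wood collected

In the thesis itself the subtask graph is not hand-written as above but induced from demonstrations via genetic programming, per the abstract.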
Appears in Collections: 電機工程學系

Files in This Item:
File | Size | Format
ntu-111-2.pdf | 1.99 MB | Adobe PDF