Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89128
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 于天立 | zh_TW
dc.contributor.advisor | Tian-Li Yu | en
dc.contributor.author | 劉容均 | zh_TW
dc.contributor.author | Jung-Chun Liu | en
dc.date.accessioned | 2023-08-16T17:14:54Z | -
dc.date.available | 2023-11-09 | -
dc.date.copyright | 2023-08-16 | -
dc.date.issued | 2023 | -
dc.date.submitted | 2023-08-10 | -
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89128 | -
dc.description.abstract | 為了處理層次性和組合性的決策問題,智能代理人需要任務結構和子任務規則的領域知識表示,以進行規劃和推理。先前的方法通常假設預定義的子任務存在,因為在缺乏領域知識的狀況下確定子任務具有困難性。因此,我們提出了一個框架,從專家示範中自動歸納推論子任務以解決複雜任務。該框架涵蓋了經典規劃、深度強化學習和演化計算,過程包括歸納符號規則、從目標構建任務結構,以及基於任務結構提供內在獎勵。我們利用遺傳程式設計進行符號規則推論,其中規則模型的選擇反映了關於效果規則的先驗領域知識。我們在兩個環境(包括 Minecraft 環境)中評估了該框架,並證明它提升了深度強化學習代理的學習效率。此外,我們透過組合任務結構和推論新規則,展示該框架在任務和技能層面的通用性。本研究對於以整合框架作為解決層次性現實世界問題的認知架構提供了深入的觀點。 | zh_TW
dc.description.abstract | To deal with hierarchical and compositional decision-making problems, intelligent agents require domain knowledge representations of task structures and subtask rules for planning and reasoning. Previous approaches often rely on strong assumptions about pre-defined subtasks because determining subtasks without domain knowledge is difficult. We therefore propose a framework that automatically induces subtasks from expert demonstrations to solve complex tasks. The framework combines classical planning, deep reinforcement learning (DRL), and evolutionary computation; the procedure involves inducing symbolic rules, constructing task structures from goals, and providing intrinsic rewards based on the task structures. We use genetic programming for symbolic rule induction, where the choice of rule model reflects prior domain knowledge about effect rules. We evaluate the framework in two environments, including the Minecraft environment, and show that it improves the performance of DRL agents. We also demonstrate generalizability at the task and skill levels by composing task structures and inducing new rules. This research contributes insights into integrated frameworks that serve as a cognitive architecture for hierarchical real-world problems. | en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-16T17:14:54Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2023-08-16T17:14:54Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | Oral Examination Committee Certification i
Acknowledgements ii
Abstract (in Chinese) iii
Abstract iv
Contents v
List of Figures ix
List of Tables xii
List of Symbols xiii
1 Introduction 1
2 Background 5
2.1 Deep Reinforcement Learning 6
2.2 Classical Planning 6
2.3 Evolutionary Computation 7
2.4 Related Works 8
2.4.1 Learning Abstraction from Demonstrations 8
2.4.2 Subtask Inference 9
3 Problem Formulation and Preliminaries 11
3.1 Markov Decision Process 11
3.2 Planning Domain Definition Language 12
3.3 MDP Problems in Planning Language 13
3.4 Critical Action 15
3.4.1 Effect and Precondition Variable Space 16
3.4.2 Action Schema 17
3.4.3 Effect Rule 19
3.4.4 Precondition Rule 20
3.5 Critical Action Graph 20
4 Inducing Hierarchical Structure and Symbolic Knowledge 23
4.1 Overview 23
4.2 Induction Module 24
4.2.1 Collecting Demonstrations 25
4.2.2 Extracting Action-Effect Linkages from Demonstrations 26
4.2.3 Determining Effect Symbolic Rules 28
4.2.4 Determining Precondition 29
4.2.5 Summary 31
4.3 Training Module 32
4.3.1 Inferring Critical Action Graph 33
4.3.2 Deep Reinforcement Learning Agent 33
4.3.3 Intrinsic Rewards 34
4.3.4 Summary 35
5 Test Environments and Experiment Settings 37
5.1 Switch 38
5.1.1 Number of Switches 40
5.1.2 Order 40
5.1.3 Distractors 41
5.1.4 Four Rooms 41
5.2 Minecraft 42
5.3 Implementation Detail 45
5.3.1 Advantage Actor-Critic 46
5.3.2 Genetic Programming 47
5.3.3 Reward Function 50
5.4 Compared Algorithms 50
5.4.1 Proximal Policy Optimization and Deep Q-Network 51
5.4.2 Rewarding Impact-Driven Exploration 51
5.4.3 Behavior Cloning 52
5.4.4 Generative Adversarial Imitation Learning 53
6 Experiments and Discussions 54
6.1 Preliminary Experiment Results 54
6.1.1 Action-Effect Linkage 55
6.1.2 Symbolic Regression 55
6.1.3 Intrinsic Reward and Pre-Training 57
6.2 Experiment Results on Performance 60
6.2.1 Switch 60
6.2.2 Minecraft 62
6.3 Generalizability 63
6.3.1 Preliminaries 64
6.3.2 Task Generalization 65
6.3.3 Skill Generalization 66
6.4 Discussions 67
6.4.1 Extracting Task Structures from Demonstrations 67
6.4.2 Comparison with Other Methods 67
6.4.3 Limitation 69
7 Conclusion and Future Work 72
Bibliography 73
A Task Structure of Test Problems 80
B Mutual Information of Action-Effect Pairs in Minecraft 82
dc.language.iso | en | -
dc.subject | 遺傳程式設計 | zh_TW
dc.subject | 經典規劃 | zh_TW
dc.subject | 決策問題 | zh_TW
dc.subject | 歸納學習 | zh_TW
dc.subject | 示範學習 | zh_TW
dc.subject | 深度強化學習 | zh_TW
dc.subject | Learning from Demonstration | en
dc.subject | Deep Reinforcement Learning | en
dc.subject | Classical Planning | en
dc.subject | Decision Making | en
dc.subject | Inductive Learning | en
dc.subject | Genetic Programming | en
dc.title | 利用遺傳程式設計從專家示範自動推論任務子結構 | zh_TW
dc.title | Automatic Induction of Task Substructures from Expert Demonstrations via Genetic Programming | en
dc.type | Thesis | -
dc.date.schoolyear | 111-2 | -
dc.description.degree | 碩士 (Master) | -
dc.contributor.oralexamcommittee | 王勝德;陳穎平 | zh_TW
dc.contributor.oralexamcommittee | Sheng-De Wang;Ying-Ping Chen | en
dc.subject.keyword | 歸納學習,示範學習,決策問題,深度強化學習,經典規劃,遺傳程式設計 | zh_TW
dc.subject.keyword | Inductive Learning,Learning from Demonstration,Decision Making,Deep Reinforcement Learning,Classical Planning,Genetic Programming | en
dc.relation.page | 83 | -
dc.identifier.doi | 10.6342/NTU202303054 | -
dc.rights.note | 同意授權(全球公開) (open access worldwide) | -
dc.date.accepted | 2023-08-11 | -
dc.contributor.author-college | 電機資訊學院 | -
dc.contributor.author-dept | 電機工程學系 | -
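The abstract in the record above describes a pipeline that induces symbolic rules from expert demonstrations, builds a task structure from them, and then shapes a DRL agent with intrinsic rewards derived from that structure. Below is a minimal Python sketch of the last step only, assuming a toy hand-written subtask graph; the names (SUBTASK_GRAPH, intrinsic_reward) and the reward scheme are illustrative placeholders, not the implementation described in the thesis.

# Hypothetical sketch: turning an induced subtask graph into an intrinsic
# reward signal. The graph maps each subtask to the subtasks it depends on
# (loosely inspired by a Minecraft-style crafting chain).
SUBTASK_GRAPH = {
    "get_wood": [],
    "make_plank": ["get_wood"],
    "make_stick": ["make_plank"],
    "make_pickaxe": ["make_plank", "make_stick"],
}

def newly_completed(prev_state, state):
    """Return subtasks whose completion flag flipped from False to True."""
    return {s for s in SUBTASK_GRAPH
            if state.get(s, False) and not prev_state.get(s, False)}

def intrinsic_reward(prev_state, state, bonus=0.5):
    """Give a bonus only for subtasks completed after all prerequisites."""
    reward = 0.0
    for subtask in newly_completed(prev_state, state):
        if all(state.get(p, False) for p in SUBTASK_GRAPH[subtask]):
            reward += bonus
    return reward

if __name__ == "__main__":
    prev = {"get_wood": True}
    curr = {"get_wood": True, "make_plank": True}
    print(intrinsic_reward(prev, curr))  # 0.5: plank made after wood collected

In the thesis itself the subtask graph is not hand-written as above but induced from demonstrations via genetic programming, per the abstract.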
Appears in Collections: 電機工程學系

Files in This Item:
File | Size | Format
ntu-111-2.pdf | 1.99 MB | Adobe PDF