NTU Theses and Dissertations Repository

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93415
Full metadata record (fields listed as DC field: value (language)):
dc.contributor.advisor: 孫紹華 (zh_TW)
dc.contributor.advisor: Shao-Hua Sun (en)
dc.contributor.author: 林辰宇 (zh_TW)
dc.contributor.author: Chen-Yu Lin (en)
dc.date.accessioned: 2024-07-31T16:12:58Z
dc.date.available: 2024-08-01
dc.date.copyright: 2024-07-31
dc.date.issued: 2024
dc.date.submitted: 2024-07-23
dc.identifier.citation[1] M. Ahn, A. Brohan, N. Brown, Y. Chebotar, O. Cortes, B. David, C. Finn, C. Fu, K. Gopalakrishnan, K. Hausman, A. Herzog, D. Ho, J. Hsu, J. Ibarz, B. Ichter, A. Irpan, E. Jang, R. J. Ruano, K. Jeffrey, S. Jesmonth, N. J. Joshi, R. Julian, D. Kalashnikov, Y. Kuang, K.-H. Lee, S. Levine, Y. Lu, L. Luu, C. Parada, P. Pastor, J. Quiambao, K. Rao, J. Rettinghouse, D. Reyes, P. Sermanet, N. Sievers, C. Tan, A. Toshev, V. Vanhoucke, F. Xia, T. Xiao, P. Xu, S. Xu, M. Yan, and A. Zeng. Do as i can, not as i say: Grounding language in robotic affordances, 2022.
[2] P.-T. de Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein. A tutorial on the cross-entropy method. Annals of Operations Research, 134:19–67, 2005.
[3] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding, 2019.
[4] C. Finn, P. Abbeel, and S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks, 2017.
[5] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor, 2018.
[6] K. Hausman, J. T. Springenberg, Z. Wang, N. Heess, and M. Riedmiller. Learning an embedding space for transferable robot skills. In International Conference on Learning Representations, 2018.
[7] P. Hua, Y. Chen, and H. Xu. Simple emergent action representations from multi-task policy training, 2023.
[8] W. Huang, P. Abbeel, D. Pathak, and I. Mordatch. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents, 2022.
[9] W. Huang, F. Xia, D. Shah, D. Driess, A. Zeng, Y. Lu, P. Florence, I. Mordatch, S. Levine, K. Hausman, and B. Ichter. Grounded decoding: Guiding text generation with grounded models for embodied agents, 2023.
[10] B. Li and L. Han. Distance weighted cosine similarity measure for text classification. In H. Yin, K. Tang, Y. Gao, F. Klawonn, M. Lee, T. Weise, B. Li, and X. Yao, editors, IDEAL, volume 8206 of Lecture Notes in Computer Science, pages 611–618. Springer, 2013.
[11] OpenAI. GPT-4 technical report. arXiv, abs/2303.08774, 2023.
[12] K. Pertsch, Y. Lee, and J. J. Lim. Accelerating reinforcement learning with learned skill priors. In Conference on Robot Learning (CoRL), 2020.
[13] K. Rakelly, A. Zhou, D. Quillen, C. Finn, and S. Levine. Efficient off-policy meta-reinforcement learning via probabilistic context variables, 2019.
[14] N. Reimers and I. Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks, 2019.
[15] G. Wang, Y. Xie, Y. Jiang, A. Mandlekar, C. Xiao, Y. Zhu, L. Fan, and A. Anandkumar. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291, 2023.
[16] M. Xu, M. Veloso, and S. Song. ASPiRe: Adaptive skill priors for reinforcement learning. In A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, editors, Advances in Neural Information Processing Systems, 2022.
[17] R. Yang, H. Xu, Y. Wu, and X. Wang. Multi-task reinforcement learning with soft modularization, 2020.
[18] J. Zhang, J. Zhang, K. Pertsch, Z. Liu, X. Ren, M. Chang, S.-H. Sun, and J. J. Lim. Bootstrap your own skills: Learning to solve new tasks with large language model guidance, 2023.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93415
dc.description.abstract (zh_TW):
In the field of reinforcement learning, acquiring new skills has long been a challenge, typically requiring extensive training and exploration. Skill-based reinforcement learning, while promising, has encountered obstacles in effectively transferring learned skills to acquire new ones. This research aims to bridge this gap by introducing a new method called "skill interpolation" and by using language models to guide the generation of new skills from prior learning experience.

Unlike traditional reinforcement learning methods that focus on planning and decision-making, our approach emphasizes the interpolation of locomotion skills. By integrating language models into the reinforcement learning framework, we aim to let agents transition seamlessly between learned skills, improving their adaptability and efficiency in diverse environments.

Inspired by the rich capabilities of language models in natural language processing tasks, we propose a method in which the agent uses semantic representations to generate intermediate skills through interpolation. For example, we explore skill interpolation in scenarios such as interpolating between running and jumping to produce jogging or galloping.

Through experimental analysis and validation, we show that the proposed method enables agents to acquire new skills more efficiently, making full use of knowledge gained from prior training. We also analyze how factors such as model architecture, training data, and natural-language phrasing affect the quality of skill interpolation.

In summary, this research contributes to skill-based reinforcement learning by introducing a new paradigm that uses language models to facilitate skill acquisition and transfer. The proposed framework opens avenues for future research on enhancing agent adaptability and autonomy in dynamic environments.
dc.description.abstract (en): Within the reinforcement learning (RL) domain, acquiring new skills has been a longstanding challenge, often requiring extensive training and exploration. Skill-based RL, while promising, has encountered hurdles in effectively transferring previously learned skills to acquire novel ones. This research endeavors to bridge this gap by introducing a novel approach termed "skill interpolation," which leverages language models to facilitate the generation of new skills from prior learning experience.

Unlike traditional RL methods, which primarily focus on planning and decision-making, our approach emphasizes the interpolation of locomotion skills. By integrating language models into the RL framework, we aim to enable agents to seamlessly transition between learned skills, thereby enhancing their adaptability and efficiency in diverse environments.
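
The core mechanic can be sketched in a few lines of Python. The sketch below is illustrative rather than the thesis implementation: it assumes a skill-conditioned policy pi(a | s, z) whose latent skill vectors were learned during multi-task pre-training, and the names z_run and z_jump are hypothetical.

import numpy as np

def blend_skill_latents(z_a, z_b, alpha):
    """Linearly interpolate two pre-trained skill latents.

    alpha = 0 reproduces skill A and alpha = 1 reproduces skill B;
    intermediate values should induce intermediate behavior when the
    latent space is smooth.
    """
    assert 0.0 <= alpha <= 1.0, "interpolation weight must lie in [0, 1]"
    return (1.0 - alpha) * z_a + alpha * z_b

# Toy example with 8-dimensional latents (the dimensionality is illustrative).
rng = np.random.default_rng(0)
z_run, z_jump = rng.normal(size=8), rng.normal(size=8)
z_jog = blend_skill_latents(z_run, z_jump, alpha=0.5)
# A skill-conditioned policy would then be queried as: action = policy(state, z_jog)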

Drawing inspiration from the rich capabilities of language models in natural language processing tasks, we propose a methodology wherein agents utilize semantic representations to generate intermediate skills through interpolation. Specifically, we explore the application of skill-interpolation in scenarios such as interpolating running and jumping into jogging or galloping.
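
One plausible way to ground the interpolation weight in language is sketched below, assuming a Sentence-BERT encoder [14] and cosine similarity [10]; the model name and the weighting rule are illustrative, not necessarily the exact scheme used in the thesis.

import numpy as np
from sentence_transformers import SentenceTransformer  # Sentence-BERT encoder [14]

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Embed the endpoint skills and the target skill, each described in natural language.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # model choice is an assumption
e_run, e_jump, e_target = encoder.encode(["running", "jumping", "jogging"])

# Position the target between the endpoints by relative semantic similarity:
# alpha near 0 keeps the blend close to running, alpha near 1 close to jumping.
sim_run, sim_jump = cosine(e_target, e_run), cosine(e_target, e_jump)
alpha = sim_jump / (sim_run + sim_jump)
# The resulting weight drives the latent blend from the previous sketch:
# z_target = blend_skill_latents(z_run, z_jump, alpha)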

Through empirical evaluations and experiments, we demonstrate the efficacy of our approach in enabling agents to acquire new skills more efficiently, leveraging the knowledge gained from previous training experiences. Furthermore, we analyze the impact of various factors such as model architecture, training data, and semantic representation on the effectiveness of skill-interpolation.

Overall, this research contributes to advancing the field of skill-based reinforcement learning by introducing a novel paradigm that harnesses the power of language models to facilitate seamless skill acquisition and transfer. The proposed framework opens avenues for future research in enhancing agent adaptability and autonomy in dynamic environments.
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-07-31T16:12:58Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2024-07-31T16:12:58Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
摘要 (Abstract in Chinese) i
Abstract iii
Contents v
List of Figures vii
Chapter 1 Introduction 1
1.1 Introduction 1
1.2 Research Purpose and Contribution 2
Chapter 2 Literature Review 3
2.1 Reinforcement Learning 3
2.1.1 Skill-based Reinforcement Learning 3
2.1.2 Multi-task Reinforcement Learning 4
2.1.3 Meta Reinforcement Learning 4
2.1.4 Skill Transfer and Skill Interpolation 4
2.2 Language Models 5
2.2.1 LLM Guidance for Embodied Agents 5
Chapter 3 Method 7
3.1 Preliminaries 7
3.1.1 Soft Actor-Critic in Reinforcement Learning 7
3.2 Skill Interpolation with Emergent Action Representation 8
3.2.1 Emergent Action Representation 8
3.2.2 Pre-training Skill Latent 9
3.3 Skill Interpolation with Language Guidance 10
3.3.1 Guiding Skill Interpolation via LLM 10
3.3.2 Adapting to Target Skill 12
Chapter 4 Experiments 15
4.1 Baselines and Experimental Settings 15
4.1.1 Environment 15
4.1.2 Variants of Our Approach 15
4.2 Evaluation 16
4.2.1 Zero-shot Interpolation 16
4.2.2 Few-shot Adaptation 17
Chapter 5 Conclusions and Future Work 19
5.1 Skill Interpolation with LLM Guidance 19
5.2 Future Work 19
References 21
dc.language.iso: en
dc.subject: 技能導向強化學習 (zh_TW)
dc.subject: 技能內插 (zh_TW)
dc.subject: 語言模型 (zh_TW)
dc.subject: 強化式學習 (zh_TW)
dc.subject: Reinforcement Learning (en)
dc.subject: Skill-Based Reinforcement Learning (en)
dc.subject: Language Model (en)
dc.subject: Skill Interpolation (en)
dc.title: 利用語言引導強化式學習中的技能內插 (zh_TW)
dc.title: Leveraging Language Guidance for Skill-Interpolation in Reinforcement Learning (en)
dc.type: Thesis
dc.date.schoolyear: 112-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 謝秉均; 李宏毅 (zh_TW)
dc.contributor.oralexamcommittee: Ping-Chun Hsieh; Hung-Yi Lee (en)
dc.subject.keyword: 強化式學習, 技能導向強化學習, 語言模型, 技能內插 (zh_TW)
dc.subject.keyword: Reinforcement Learning, Skill-Based Reinforcement Learning, Language Model, Skill Interpolation (en)
dc.relation.page: 23
dc.identifier.doi: 10.6342/NTU202402114
dc.rights.note: Authorized (public access restricted to campus)
dc.date.accepted: 2024-07-23
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 電機工程學系 (Department of Electrical Engineering)
Appears in Collections: 電機工程學系 (Department of Electrical Engineering)

Files in This Item:
File: ntu-112-2.pdf (access restricted to NTU campus IPs; use the VPN service from off campus)
Size: 567.59 kB
Format: Adobe PDF


Except where otherwise noted, all items in this repository are protected by copyright, with all rights reserved.
