NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98830
Full metadata record (DC field: value [language])
dc.contributor.advisor: 林軒田 [zh_TW]
dc.contributor.advisor: Hsuan-Tien Lin [en]
dc.contributor.author: 周玉鑫 [zh_TW]
dc.contributor.author: Yu-Hsin Chou [en]
dc.date.accessioned: 2025-08-19T16:21:53Z
dc.date.available: 2025-08-20
dc.date.copyright: 2025-08-19
dc.date.issued: 2025
dc.date.submitted: 2025-08-11
dc.identifier.citation:
A. S. Azad, I. Gur, J. Emhoff, N. Alexis, A. Faust, P. Abbeel, and I. Stoica. Clutr: Curriculum learning via unsupervised task representation learning. In International Conference on Machine Learning, pages 1361–1395. PMLR, 2023.
A. Bakhtin, L. van der Maaten, J. Johnson, L. Gustafson, and R. Girshick. Phyre: A new benchmark for physical reasoning. Advances in Neural Information Processing Systems, 32, 2019.
D. M. Bear, E. Wang, D. Mrowca, F. J. Binder, H.-Y. F. Tung, R. Pramod, C. Holdaway, S. Tao, K. Smith, F.-Y. Sun, et al. Physion: Evaluating physical prediction from vision in humans and machines. arXiv preprint arXiv:2106.08261, 2021.
M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279, June 2013.
M. Beukman, S. Coward, M. Matthews, M. Fellows, M. Jiang, M. Dennis, and J. Foerster. Refining minimax regret for unsupervised environment design. arXiv preprint arXiv:2402.12284, 2024.
T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
Z. Chen, K. Yi, Y. Li, M. Ding, A. Torralba, J. B. Tenenbaum, and C. Gan. Comphy: Compositional physical reasoning of objects and events from videos. arXiv preprint arXiv:2205.01089, 2022.
X. Cheng, K. Shi, A. Agarwal, and D. Pathak. Extreme parkour with legged robots. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 11443–11450. IEEE, 2024.
M. Chevalier-Boisvert, B. Dai, M. Towers, R. de Lazcano, L. Willems, S. Lahlou, S. Pal, P. S. Castro, and J. Terry. Minigrid & miniworld: Modular & customizable reinforcement learning environments for goal-oriented tasks. CoRR, abs/2306.13831, 2023.
H. Chung, J. Lee, M. Kim, D. Kim, and S. Oh. Adversarial environment design via regret-guided diffusion models. arXiv preprint arXiv:2410.19715, 2024.
M. Dennis, N. Jaques, E. Vinitsky, A. Bayen, S. Russell, A. Critch, and S. Levine. Emergent complexity and zero-shot transfer via unsupervised environment design. Advances in Neural Information Processing Systems, 33:13049–13061, 2020.
J. Fu, A. Kumar, O. Nachum, G. Tucker, and S. Levine. D4rl: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219, 2020.
S. Garcin, J. Doran, S. Guo, C. G. Lucas, and S. V. Albrecht. Dred: Zero-shot transfer in reinforcement learning via data-regularised environment design. arXiv preprint arXiv:2402.03479, 2024.
Q. Garrido, N. Ballas, M. Assran, A. Bardes, L. Najman, M. Rabbat, E. Dupoux, and Y. LeCun. Intuitive physics understanding emerges from self-supervised pretraining on natural videos. arXiv preprint arXiv:2502.11831, 2025.
S. Huang, R. F. J. Dossa, C. Ye, J. Braga, D. Chakraborty, K. Mehta, and J. G. Araújo. Cleanrl: High-quality single-file implementations of deep reinforcement learning algorithms. Journal of Machine Learning Research, 23(274):1–18, 2022.
S. James, Z. Ma, D. Rovick Arrojo, and A. J. Davison. Rlbench: The robot learning benchmark & learning environment. IEEE Robotics and Automation Letters, 2020.
M. Janner, Y. Du, J. B. Tenenbaum, and S. Levine. Planning with diffusion for flexible behavior synthesis. arXiv preprint arXiv:2205.09991, 2022.
M. Jiang, M. Dennis, J. Parker-Holder, J. Foerster, E. Grefenstette, and T. Rocktäschel. Replay-guided adversarial environment design. Advances in Neural Information Processing Systems, 34:1884–1897, 2021.
M. Jiang, E. Grefenstette, and T. Rocktäschel. Prioritized level replay. In International Conference on Machine Learning, pages 4940–4950. PMLR, 2021.
B. Kang, Y. Yue, R. Lu, Z. Lin, Y. Zhao, K. Wang, G. Huang, and J. Feng. How far is video generation from world model: A physical law perspective. arXiv preprint arXiv:2411.02385, 2024.
T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, and Y. Iwasawa. Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems, 35:22199–22213, 2022.
S. Li, K. Wu, C. Zhang, and Y. Zhu. I-phyre: Interactive physical reasoning. arXiv preprint arXiv:2312.03009, 2023.
W. Liang, S. Wang, H.-J. Wang, O. Bastani, D. Jayaraman, and Y. J. Ma. Environment curriculum generation via large language models. In 8th Annual Conference on Robot Learning, 2024.
Z. Lin, Y.-F. Wu, S. Peri, B. Fu, J. Jiang, and S. Ahn. Improving generative imagination in object-centric world models. In International Conference on Machine Learning, pages 6140–6149. PMLR, 2020.
M. Matthews, M. Beukman, C. Lu, and J. Foerster. Kinetix: Investigating the training of general agents through open-ended physics-based control tasks. arXiv preprint arXiv:2410.23208, 2024.
L. Metz, C. D. Freeman, S. S. Schoenholz, and T. Kachman. Gradients are not all you need. arXiv preprint arXiv:2111.05803, 2021.
J. Parker-Holder, M. Jiang, M. Dennis, M. Samvelyan, J. Foerster, E. Grefenstette, and T. Rocktäschel. Evolving curricula with regret-based environment design. In International Conference on Machine Learning, pages 17473–17498. PMLR, 2022.
A. Rutherford, M. Beukman, T. Willi, B. Lacerda, N. Hawes, and J. Foerster. No regrets: Investigating and improving regret approximations for curriculum discovery. arXiv preprint arXiv:2408.15099, 2024.
J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
H. J. Suh, M. Simchowitz, K. Zhang, and R. Tedrake. Do differentiable simulators give better policy gradients? In International Conference on Machine Learning, pages 20668–20696. PMLR, 2022.
F.-Y. Sun, S. Harini, A. Yi, Y. Zhou, A. Zook, J. Tremblay, L. Cross, J. Wu, and N. Haber. Factorsim: Generative simulation via factorized representation. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 23–30. IEEE, 2017.
M. Towers, A. Kwiatkowski, J. Terry, J. U. Balis, G. D. Cola, T. Deleu, M. Goulão, A. Kallinteris, M. Krimmel, A. KG, R. Perez-Vicente, A. Pierré, S. Schulhoff, J. J. Tai, H. Tan, and O. G. Younis. Gymnasium: A standard interface for reinforcement learning environments, 2024.
L. Wang, Y. Ling, Z. Yuan, M. Shridhar, C. Bao, Y. Qin, B. Wang, H. Xu, and X. Wang. Gensim: Generating robotic simulation tasks via large language models. arXiv preprint arXiv:2310.01361, 2023.
Y. Wang, Z. Xian, F. Chen, T.-H. Wang, Y. Wang, K. Fragkiadaki, Z. Erickson, D. Held, and C. Gan. Robogen: Towards unleashing infinite data for automated robot learning via generative simulation. arXiv preprint arXiv:2311.01455, 2023.
M. Wołczyk, B. Cupiał, M. Ostaszewski, M. Bortkiewicz, M. Zając, R. Pascanu, Ł. Kuciński, and P. Miłoś. Fine-tuning reinforcement learning models is secretly a forgetting mitigation problem. arXiv preprint arXiv:2402.02868, 2024.
Z. Wu, N. Dvornik, K. Greff, T. Kipf, and A. Garg. Slotformer: Unsupervised visual dynamics simulation with object-centric models. arXiv preprint arXiv:2210.05861, 2022.
Z. Wu, J. Hu, W. Lu, I. Gilitschenski, and A. Garg. Slotdiffusion: Object-centric generative modeling with diffusion models. Advances in Neural Information Processing Systems, 36:50932–50958, 2023.
K. Yi, C. Gan, Y. Li, P. Kohli, J. Wu, A. Torralba, and J. B. Tenenbaum. Clevrer: Collision events for video representation and reasoning. arXiv preprint arXiv:1910.01442, 2019.
A. Yu, G. Yang, R. Choi, Y. Ravan, J. Leonard, and P. Isola. Learning visual parkour from generated images. In 8th Annual Conference on Robot Learning, 2024.
A. Zook, F.-Y. Sun, J. Spjut, V. Blukis, S. Birchfield, and J. Tremblay. Grs: Generating robotic simulation tasks from real-world images. arXiv preprint arXiv:2410.15536, 2024.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98830
dc.description.abstract [zh_TW]: 基於物理的控制任務需要具備良好的泛化能力,因為違反物理定律(例如重力、碰撞等)可能帶來嚴重的安全風險。我們探討如何透過產生訓練環境課程來提升在此類任務的泛化能力。基於無監督環境設計框架,我們發現既有方法中所採用的隨機環境產生器,可能削弱零樣本泛化能力。透過檢查其產生的環境,我們發現這些環境往往過於複雜。為了解決這個問題,我們利用現成的視覺語言模型來產生與現實對應的訓練環境,因為視覺語言模型無需額外訓練,且可在零樣本的情況下進行條件化控制。我們進一步以語意對應性與樣本複雜度對應性兩項指標,衡量所產生的環境與參考環境及策略的對應程度,並提出多項重要的設計決策以提升這兩項指標。實驗結果顯示,僅使用具現實對應的環境產生器即可顯著提升泛化能力,並可透過結合互補的無監督環境設計方法進一步增強。我們提出的方法 V-SFL,在所研究的基於物理的控制任務中達到最佳表現。
dc.description.abstract [en]: Physics-based control tasks demand robust generalization because violations of physical laws, such as those involving gravity or collisions, can pose severe safety risks. We investigate how to improve generalization in such tasks by generating a curriculum of training environments. Building on the Unsupervised Environment Design (UED) framework, we identify that the random environment generators adopted by several prior UED works can hinder zero-shot generalization. By examining the generated environments, we find that they are often overly complex. To address this, we use off-the-shelf Vision-Language Models (VLMs) to produce environments with grounded complexity, exploiting the fact that VLMs require no additional training and can be conditioned in a zero-shot manner. We further measure grounded complexity with two metrics, semantic groundedness and sample-complexity groundedness, which reflect how well the generated environments correspond to a reference environment and policy, and we outline several design choices that improve both metrics. Experimental results demonstrate that a grounded environment generator alone significantly improves generalization, and that performance can be boosted further by incorporating a complementary UED method. Our proposed method, VLM-based Sampling For Learnability (V-SFL), achieves state-of-the-art performance on the studied physics-based control benchmark.
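As background for the sampling component in the method's name: the cited SFL paper (Rutherford et al., 2024) scores a level's learnability as p(1 - p), where p is the policy's current success rate on that level, so training concentrates on levels that are neither already mastered nor currently unsolvable. Below is a minimal Python sketch of learnability-weighted level sampling under that assumption; the function names and toy data are illustrative only, not code from the thesis.

    import numpy as np

    def learnability(p: np.ndarray) -> np.ndarray:
        # p * (1 - p): maximal at p = 0.5, zero when the policy always
        # succeeds (p = 1) or always fails (p = 0) on a level.
        return p * (1.0 - p)

    def sample_levels(success_rates, n, rng=None):
        # Draw n level indices with probability proportional to learnability;
        # fall back to uniform sampling if no level is partially solvable yet.
        rng = rng or np.random.default_rng(0)
        scores = learnability(np.asarray(success_rates, dtype=float))
        total = scores.sum()
        if total > 0:
            probs = scores / total
        else:
            probs = np.full(len(scores), 1.0 / len(scores))
        return rng.choice(len(scores), size=n, p=probs)

    # Toy success rates for four candidate levels:
    rates = [0.0, 0.5, 1.0, 0.25]
    print(learnability(np.array(rates)))  # [0. 0.25 0. 0.1875]
    print(sample_levels(rates, n=8))      # mostly indices 1 and 3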
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-19T16:21:53Z. No. of bitstreams: 0 [en]
dc.description.provenance: Made available in DSpace on 2025-08-19T16:21:53Z (GMT). No. of bitstreams: 0 [en]
dc.description.tableofcontents:
Acknowledgements i
摘要 iii
Abstract v
Contents vii
List of Figures ix
List of Tables xi
Denotation xiii
Chapter 1 Introduction 1
Chapter 2 Related works 7
Chapter 3 Background 9
Chapter 4 Methodology 11
4.1 Why do we need grounded complexity? 11
4.2 Grounded environment generation with VLM 13
4.3 Learnability-based regret approximation 14
4.4 Testing environments for evaluating generalization performance 15
Chapter 5 Experiments 19
5.1 How important is each design choice made in our method? 19
5.2 How does the proposed method compare to prior work on zero-shot performance? 22
5.3 How fast does the proposed method adapt to unseen environment(s)? 25
5.4 Can a VLM be used to score the environments and generate a curriculum based on the scores? 27
Chapter 6 Conclusion 29
References 31
Appendix A — Environment generation prompts 37
Appendix B — Details of procedurally generated environments 41
Appendix C — Experiment details 45
Appendix D — On the impact of different α 49
Appendix E — Hand-designed test environments 51
dc.language.iso: en
dc.subject: 視覺語言模型 [zh_TW]
dc.subject: 無監督環境生成 [zh_TW]
dc.subject: 基於物理的控制 [zh_TW]
dc.subject: 強化學習 [zh_TW]
dc.subject: Reinforcement Learning [en]
dc.subject: Physics-based control [en]
dc.subject: Vision-Language Models [en]
dc.subject: Unsupervised Environment Design [en]
dc.title: 利用視覺語言模型生成與現實對應的訓練環境課程以提升具物理泛化能力的控制策略 [zh_TW]
dc.title: Improving Physics-Based Control with Grounded Environment Curriculum Generation via Vision-Language Models [en]
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 孫紹華; 柯宗瑋; 王鈺強 [zh_TW]
dc.contributor.oralexamcommittee: Shao-Hua Sun; Tsung-Wei Ke; Yu-Chiang Wang [en]
dc.subject.keyword: 無監督環境生成, 視覺語言模型, 強化學習, 基於物理的控制 [zh_TW]
dc.subject.keyword: Unsupervised Environment Design, Vision-Language Models, Reinforcement Learning, Physics-based control [en]
dc.relation.page: 52
dc.identifier.doi: 10.6342/NTU202501834
dc.rights.note: Authorization granted (open access worldwide)
dc.date.accepted: 2025-08-14
dc.contributor.author-college: College of Electrical Engineering and Computer Science (電機資訊學院)
dc.contributor.author-dept: Department of Computer Science and Information Engineering (資訊工程學系)
dc.date.embargo-lift: 2025-08-20
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
File: ntu-113-2.pdf (4.32 MB, Adobe PDF)


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
