Improving Physics-Based Control with Grounded Environment Curriculum Generation via Vision-Language Models

周玉鑫; Yu-Hsin Chou

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98830

Title:	Improving Physics-Based Control with Grounded Environment Curriculum Generation via Vision-Language Models (original title: 利用視覺語言模型生成與現實對應的訓練環境課程以提升具物理泛化能力的控制策略)
Author:	周玉鑫 Yu-Hsin Chou
Advisor:	林軒田 Hsuan-Tien Lin
Keywords:	Unsupervised Environment Design, Vision-Language Models, Reinforcement Learning, Physics-based control
Publication Year:	2025
Degree:	Master's
Abstract:	Physics-based control tasks demand robust generalization because violations of physical laws, such as those involving gravity or collisions, can cause severe safety risks. We investigate how to improve generalization in such tasks by generating a curriculum of training environments. Building on the Unsupervised Environment Design (UED) framework, we find that the random environment generators adopted by several prior UED works can hinder zero-shot generalization, and inspecting their output shows that the generated environments are often overly complex. To address this, we use off-the-shelf Vision-Language Models (VLMs) to produce environments with grounded complexity, taking advantage of the fact that VLMs require no additional training and can be conditioned in a zero-shot manner. We define grounded complexity through two metrics, semantic groundedness and sample-complexity groundedness, which measure how grounded the generated environments are with respect to a reference environment and policy, and we outline several design choices that improve both metrics. Experimental results show that a grounded environment generator alone already improves generalization, and that performance improves further when it is combined with a complementary UED method. Our proposed method, VLM-based Sampling For Learnability (V-SFL), achieves state-of-the-art performance on the studied physics-based control benchmark.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98830
DOI:	10.6342/NTU202501834
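Full-text authorization:	Authorized (open to the public worldwide)
Electronic full-text release date:	2025-08-20
Appears in collections:	Department of Computer Science and Information Engineering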
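
This record does not describe how V-SFL selects training environments. The following is only a rough sketch of the general idea behind "sampling for learnability", under two assumptions that are not stated here: that learnability is scored as p(1 - p) for a policy success rate p, as in prior Sampling For Learnability work, and that the candidate environments come from some grounded generator. The function names and the example success rates are hypothetical.

```python
from typing import Dict, List


def learnability(success_rate: float) -> float:
    """Assumed score p * (1 - p): largest at p = 0.5, zero if always or never solved."""
    return success_rate * (1.0 - success_rate)


def select_training_levels(success_rates: Dict[str, float], k: int) -> List[str]:
    """Return the k candidate environments with the highest assumed learnability."""
    ranked = sorted(
        success_rates,
        key=lambda name: learnability(success_rates[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical success rates of the current policy on candidate environments.
candidate_success = {
    "flat_ground": 1.0,   # already solved, nothing left to learn
    "small_gap": 0.6,     # sometimes solved, informative
    "tall_wall": 0.1,     # rarely solved, mildly informative
    "chaotic_pile": 0.0,  # never solved, likely too complex
}
training_buffer = select_training_levels(candidate_success, k=2)
```

Under this scoring, environments that the policy always or never solves receive a score of zero, which is one way a curriculum could avoid the overly complex environments the abstract attributes to random generators.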

Files in this item:

File	Size	Format
ntu-113-2.pdf	4.32 MB	Adobe PDF

Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.
