Stable Video-to-Action Mapping via Adaptive Mode Selection and Point Tracking

游一心; Yi-Hsin Yu

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101330

Title:	Stable Video-to-Action Mapping via Adaptive Mode Selection and Point Tracking
Author:	Yi-Hsin Yu (游一心)
Advisor:	Jane Yung-jen Hsu (許永真)
Co-advisor:	Chun-Yi Lee (李濬屹)
Keywords:	Computer Vision, Reinforcement Learning, Video-based Policy Learning
Publication year:	2026
Degree:	Master's
Abstract:	Recent progress in video-based robot learning has enabled diffusion models to generate visual plans without any action annotations, as exemplified by AVDC. In practice, however, task success is often limited not by the quality of the generated videos but by flaws in the video-to-action mapping stage: rigid mode classification systematically confuses grasping with pushing, and optical flow chained frame by frame accumulates drift over long horizons. To address these limitations, we propose an improved video-to-action framework that augments AVDC with an adaptive mode selection mechanism and a more stable 3D motion reconstruction pipeline based on point tracking and temporally consistent depth estimation. Evaluated on the 11 Meta-World tasks used in AVDC, our method improves overall task success rates and executes the visual plans generated by the diffusion model more faithfully, narrowing the gap between visual planning quality and actual robot control.
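The abstract names a 3D motion reconstruction step built on point tracking and depth but does not spell out its details. As a hedged illustration only, one common way to turn tracked 2D points plus per-point depth into object motion is to back-project each point through a pinhole camera model and fit a rigid transform with the Kabsch (SVD) algorithm. The sketch anchors every frame to the first frame instead of chaining frame-to-frame estimates, mirroring the abstract's contrast between point tracking and sequentially chained optical flow. The function names (`backproject`, `fit_rigid_transform`, `object_poses`), the intrinsics tuple `(fx, fy, cx, cy)`, and the use of NumPy are assumptions for illustration; this is not the thesis's actual implementation.

```python
import numpy as np


def backproject(uv, depth, fx, fy, cx, cy):
    """Lift pixel coordinates uv (N, 2) with per-point depth (N,) to
    camera-frame 3D points (N, 3) using a pinhole camera model."""
    x = (uv[:, 0] - cx) * depth / fx
    y = (uv[:, 1] - cy) * depth / fy
    return np.stack([x, y, depth], axis=1)


def fit_rigid_transform(src, dst):
    """Kabsch algorithm: least-squares rotation R and translation t
    such that dst ≈ src @ R.T + t for corresponding rows of src and dst."""
    src_centroid = src.mean(axis=0)
    dst_centroid = dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_centroid).T @ (dst - dst_centroid)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_centroid - R @ src_centroid
    return R, t


def object_poses(tracks, depths, intrinsics):
    """Hypothetical helper: estimate the object's rigid motion from frame 0
    to every frame t. tracks is (T, N, 2) pixel tracks, depths is (T, N).
    Each frame is compared with frame 0 directly, never with frame t - 1."""
    fx, fy, cx, cy = intrinsics
    reference = backproject(tracks[0], depths[0], fx, fy, cx, cy)
    poses = []
    for uv, z in zip(tracks, depths):
        current = backproject(uv, z, fx, fy, cx, cy)
        poses.append(fit_rigid_transform(reference, current))
    return poses
```

Because each pose is estimated directly against frame 0, an error in one frame does not propagate to later frames, whereas composing per-frame transforms would carry every earlier error forward.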
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101330
DOI:	10.6342/NTU202504713
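Full-text license:	Authorized (publicly available worldwide)
Full-text release date:	2026-01-17
Department:	Department of Computer Science and Information Engineering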

Files in this item:

File	Size	Format
ntu-114-1.pdf	10.22 MB	Adobe PDF

