DSpace JSPUI

  1. NTU Theses and Dissertations Repository
  2. 電機資訊學院 (College of Electrical Engineering and Computer Science)
  3. 資訊工程學系 (Department of Computer Science and Information Engineering)
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101330
Title: 透過自適應模式選擇與點追蹤實現穩定的影片轉動作
Stable Video-to-Action Mapping via Adaptive Mode Selection and Point Tracking
Authors: 游一心
Yi-Hsin Yu
Advisor: 許永真
Jane Yung-jen Hsu
Co-Advisor: 李濬屹
Chun-Yi Lee
Keyword: 電腦視覺,強化學習,基於影片的策略學習
Computer Vision,Reinforcement Learning,Video-based Policy Learning
Publication Year: 2026
Degree: 碩士 (Master's)
Abstract: 近期以影片為基礎的機器人學習取得了快速進展,使得擴散模型能夠在完全不依賴動作標註的情況下生成視覺計畫,AVDC 即是一個代表性成果。然而,實際執行時的成功率往往並非受限於生成影片的品質,而是受到影片轉動作(video-to-action mapping)階段的缺陷所瓶頸:僵化的模式分類會導致系統性地將抓取與推動誤判,而逐幀串接的光流估計則會在長視野下累積誤差。為了解決這些限制,我們提出一個改進的影片轉動作框架,透過自適應的模式選擇機制,以及基於點追蹤與深度估計的 3D 重建流程來強化 AVDC。於 AVDC 所採用的 11 個 Meta-World 任務進行評估後,我們的方法在整體上提升了任務成功率,並能更忠實地執行擴散模型所生成的視覺計畫,從而縮小視覺規劃品質與實際機器人控制之間的落差。
Recent progress in video-based robotic learning has enabled diffusion models to generate visual plans without requiring any action annotations, as exemplified by AVDC. However, in practice, the final performance is often bottlenecked not by the quality of the generated videos but by the imperfections in the video-to-action mapping: rigid mode classification causes systematic grasp/push errors, and sequential optical-flow estimation accumulates drift over long horizons. To address these limitations, we propose an improved video-to-action framework that augments AVDC with an adaptive mode selection mechanism and a more stable 3D motion reconstruction pipeline based on point tracking and temporally consistent depth estimation. Evaluated across the 11 Meta-World tasks used in AVDC, our method consistently increases task success rates and more faithfully executes the visual plans produced by the diffusion model, thereby narrowing the gap between visual planning quality and real robotic control.
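The abstract describes replacing chained frame-to-frame optical flow with point tracking plus depth estimation, lifting tracked pixels into 3D to recover motion. As a minimal illustrative sketch only (not the thesis's actual implementation — the `backproject` helper, the intrinsics `K`, and all numeric values are hypothetical), standard pinhole back-projection turns a tracked point observed in two frames into a 3D displacement that could serve as an action target:

```python
import numpy as np

def backproject(points_2d, depths, K):
    """Lift 2D pixel coordinates to 3D camera coordinates via the pinhole model.

    points_2d: (N, 2) pixel coordinates (u, v)
    depths:    (N,)   estimated metric depth at each pixel
    K:         (3, 3) camera intrinsic matrix
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    u, v = points_2d[:, 0], points_2d[:, 1]
    x = (u - cx) * depths / fx
    y = (v - cy) * depths / fy
    return np.stack([x, y, depths], axis=1)  # (N, 3) points in camera frame

# Hypothetical intrinsics and a single tracked point across two frames.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
p_t  = np.array([[320.0, 240.0]])   # pixel location at frame t
p_t1 = np.array([[330.0, 240.0]])   # same tracked point at frame t+1
d_t  = np.array([1.0])              # estimated depth at frame t (meters)
d_t1 = np.array([1.0])              # estimated depth at frame t+1

# 3D displacement of the tracked point between the two frames.
delta = backproject(p_t1, d_t1, K) - backproject(p_t, d_t, K)
```

Because each frame's point is lifted independently from its own depth estimate, errors do not compound across the horizon the way chained per-frame flow estimates do — which is the stability argument the abstract makes.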
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101330
DOI: 10.6342/NTU202504713
Fulltext Rights: 同意授權(全球公開) (Authorized for open access, worldwide)
Embargo Lift Date: 2026-01-17
Appears in Collections: 資訊工程學系 (Computer Science and Information Engineering)

Files in This Item:
File: ntu-114-1.pdf
Size: 10.22 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
