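Reinforcement Learning-Based Quadrupedal Locomotion Control Using Explicit and Implicit State Estimation on Structured Terrains with Height Discontinuities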

張祐誠; Yu-Cheng Chang

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101839

Title:	Reinforcement Learning-Based Quadrupedal Locomotion Control Using Explicit and Implicit State Estimation on Structured Terrains with Height Discontinuities (基於強化學習之四足機器人運動控制結合顯式隱式狀態估測應用於高度落差之結構化地形)
Author:	Yu-Cheng Chang (張祐誠)
Advisor:	Feng-Li Lian (連豊力)
Keywords:	Reinforcement Learning, Locomotion Control, State Estimation, Quadrupedal Robot
Publication year:	2026
Degree:	Master's
Abstract:	With the rapid development of robotics, quadrupedal robots are becoming increasingly prevalent and show strong potential in applications such as disaster relief and inspection. However, traversing uneven terrain such as stairs remains a significant challenge. Traditional control methods, which often rely on precise dynamic models and hand-engineered strategies, may underperform in complex and dynamic environments. Recently, reinforcement learning (RL) has proven to be a feasible approach, enabling robots to learn optimal control strategies through interaction with the environment. Nevertheless, existing RL methods still have difficulty traversing stairs and uneven terrain, primarily because of sensor noise, partial state observability, and the sim-to-real gap. This thesis proposes an RL-based control method for quadrupedal robots that achieves stable velocity tracking on structured terrains with height discontinuities. We narrow the sim-to-real gap by designing the reward functions, state space, and action space, and by incorporating domain randomization. To address sensor noise and incomplete state information, we design a hybrid state estimator that combines explicit and implicit estimation. The explicit estimator estimates states with clear physical meaning, while the implicit estimator uses self-supervised learning to capture the robot's kinematic characteristics across different terrains. In simulation, extensive experiments on various structured terrains with height discontinuities show that the proposed method maintains stable velocity tracking and outperforms existing baselines on metrics such as success rate and tracking error. Notably, the method achieves a 97.5% success rate on stairs with a step height of 0.2 m. Furthermore, we deployed the trained policy on a physical quadrupedal robot and tested it on stairs built to the maximum step height and minimum step width permitted by building regulations. The experiments show a 100% success rate, demonstrating the feasibility and effectiveness of the proposed method in real-world stair environments.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101839
DOI:	10.6342/NTU202600602
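Full-text authorization:	Authorized (on-campus access only)
Full-text release date:	2031-02-02
Appears in collections:	Department of Electrical Engineering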

Files in this item:

File	Size	Format
ntu-114-1.pdf (not authorized for public access)	38.85 MB	Adobe PDF


Unless their copyright terms are specifically indicated, all items in this system are protected by copyright, with all rights reserved.
