NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101839
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 連豊力 [zh_TW]
dc.contributor.advisor: Feng-Li Lian [en]
dc.contributor.author: 張祐誠 [zh_TW]
dc.contributor.author: Yu-Cheng Chang [en]
dc.date.accessioned: 2026-03-04T16:59:50Z
dc.date.available: 2026-03-05
dc.date.copyright: 2026-03-04
dc.date.issued: 2026
dc.date.submitted: 2026-02-07
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101839
dc.description.abstract [zh_TW]: With advances in robotics, quadrupedal robots have become increasingly common across applications, showing great potential in fields such as disaster relief and inspection. However, walking on non-flat terrain such as stairs remains a challenging problem for quadrupedal robots. Traditional control methods usually rely on control strategies built on precise dynamic models, which may perform poorly in complex and dynamic environments. In recent years, reinforcement learning has proven to be a viable approach that learns optimal control strategies through interaction with the environment, but when applied to complex terrain it is still limited by sensor noise, incomplete state information, and the sim-to-real gap between simulation and reality.

This study proposes a reinforcement-learning-based control method for quadrupedal robots that achieves stable velocity-tracking control on structured terrain with height discontinuities. It designs the reward function, state space, and action space, and incorporates domain randomization to narrow the gap between simulation and reality. To overcome sensor noise and missing information, a state estimator combining explicit and implicit components is designed to estimate the robot's key states: the explicit estimator estimates states with clear physical meaning, while the implicit estimator uses self-supervised learning to estimate the robot's kinematic features across different terrains.

In simulation, we conduct extensive experiments on a variety of structured terrains with height discontinuities. The results show that the proposed method maintains stable velocity tracking across terrains and outperforms existing baseline methods on metrics such as success rate and tracking error; on stairs with a step height of 0.2 m, it achieves a 97.5% success rate. We further deploy the trained policy on a physical quadrupedal robot and test it on stairs that meet the building code's maximum step height and minimum tread width, achieving a 100% success rate and demonstrating the feasibility and effectiveness of the proposed method in real-world stair environments.
dc.description.abstract [en]: With the rapid development of robotics, quadrupedal robots are becoming increasingly prevalent, demonstrating immense potential in applications such as disaster relief and inspection. However, traversing uneven terrain remains a significant challenge. Traditional control methods, often relying on precise dynamic models and hand-engineered strategies, may underperform in complex and dynamic environments. Recently, reinforcement learning (RL) has proven to be a feasible approach, enabling robots to learn optimal control strategies through continuous interaction with the environment. Nevertheless, existing RL methods still face difficulties in traversing stairs and uneven terrain, primarily due to sensor noise, partial state observability, and the sim-to-real gap.

This thesis proposes an RL-based control method for quadruped robots to achieve stable velocity tracking on structured terrain with discrete height changes. We bridge the sim-to-real gap by designing specific reward functions, state and action spaces, and incorporating domain randomization techniques. To address sensor noise and incomplete state information, we design a hybrid state estimator combining explicit and implicit estimation. The explicit estimator estimates states with clear physical significance, while the implicit estimator utilizes self-supervised learning to capture the robot's kinematic characteristics across different terrains.

In simulation, extensive experiments on various structured terrains demonstrate that the proposed method maintains stable velocity tracking and outperforms existing baselines in metrics such as success rate and tracking error. Notably, the method achieves a 97.5% success rate on stairs with a step height of 0.2 m. Furthermore, we deployed the trained policy on a physical quadruped robot and tested it on stairs that comply with building regulations for maximum step height and minimum step width. The experimental results show a 100% success rate, verifying the feasibility and effectiveness of the proposed method in real-world stair environments.
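The abstract describes the implicit estimator only at a high level; the table of contents (Section 3.3.2) indicates its self-supervised training uses Barlow Twins. As an illustration only, here is a minimal NumPy sketch of the Barlow Twins redundancy-reduction loss; the array names, batch size, and embedding width are hypothetical and not taken from the thesis:

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Barlow Twins loss: drive the cross-correlation matrix of two
    embedded views toward the identity (illustrative sketch)."""
    # standardize each embedding dimension across the batch
    z1 = (z1 - z1.mean(axis=0)) / (z1.std(axis=0) + 1e-8)
    z2 = (z2 - z2.mean(axis=0)) / (z2.std(axis=0) + 1e-8)
    n, d = z1.shape
    c = z1.T @ z2 / n                                    # d x d cross-correlation
    on_diag = np.sum((1.0 - np.diag(c)) ** 2)            # diagonal -> 1: views agree
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)  # off-diagonal -> 0: decorrelate
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
# stand-in for embeddings of two views of the same proprioceptive history,
# e.g. the same joint-state window under two different noise randomizations
h = rng.normal(size=(64, 16))
z_a = h + 0.01 * rng.normal(size=h.shape)
z_b = h + 0.01 * rng.normal(size=h.shape)
loss_same = barlow_twins_loss(z_a, z_b)
loss_unrelated = barlow_twins_loss(z_a, rng.normal(size=h.shape))
```

Matched views of the same underlying state yield a much smaller loss than unrelated embeddings, which is the property that lets a self-supervised estimator learn terrain-dependent features without labels.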
dc.description.provenance [en]: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-03-04T16:59:50Z. No. of bitstreams: 0
dc.description.provenance [en]: Made available in DSpace on 2026-03-04T16:59:50Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Acknowledgements i
Abstract (Chinese) iii
ABSTRACT v
CONTENTS vii
LIST OF FIGURES xi
LIST OF TABLES xiii
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Problem Formulation 4
1.3 Contribution of The Proposed Method 7
1.4 Organization of the Thesis 7
Chapter 2 Background and Literature Survey 9
2.1 Quadrupedal Robot 9
2.2 Locomotion Controller 11
2.2.1 Model-based Control 12
2.2.2 Model-Free Control 17
2.2.2.1 CPG Locomotion 17
2.2.2.2 Proprioceptive Locomotion 17
2.2.2.3 Exteroceptive Locomotion 18
2.2.2.4 Fault Tolerant Locomotion 19
2.2.3 Hybrid Control 19
2.3 State Estimation 22
Chapter 3 Preliminaries 25
3.1 Quadruped Robot Basics 25
3.1.1 Coordinate System 25
3.2 Reinforcement Learning 26
3.2.1 Partially Observable Markov Decision Process 26
3.2.2 Proximal Policy Optimization (PPO) 28
3.3 State Estimation 30
3.3.1 Kalman Filter 30
3.3.2 Self-supervised Learning with Barlow Twins 34
Chapter 4 System Overview 37
4.1 Control System Overview 37
4.2 Hardware System 38
4.3 Simulation Environment 40
Chapter 5 Proposed Method 41
5.1 Locomotion Controller 41
5.1.1 Observation Space 41
5.1.2 Reward Function 43
5.1.3 Action Space 44
5.1.4 Asymmetric Actor Critic Network Architecture 45
5.1.4.1 Actor Network 45
5.1.4.2 Critic Network 45
5.2 State Estimator 46
5.2.1 Explicit Estimator 47
5.2.2 Implicit Estimator 50
5.3 Training Techniques 51
5.3.1 Terrain Curriculum 51
5.3.2 Domain Randomization 52
Chapter 6 Simulation and Experiments 53
6.1 Training Evaluation 53
6.1.1 Evaluation Setup 53
6.1.2 Comparison with Other Methods 54
6.1.3 Estimator Analysis 58
6.2 Simulation Experiments 60
6.2.1 Velocity Performance Evaluation 61
6.2.2 Flat Terrain 69
6.2.2.1 Gait Characteristics 69
6.2.2.2 Velocity Tracking Performance 72
6.2.3 Stairs 74
6.2.3.1 Success Rate 75
6.2.3.2 Velocity Tracking Performance 75
6.2.3.3 Analysis and Comparison 79
6.2.4 Discrete Step 89
6.3 Real-World Experiments 90
6.3.1 Stair Climbing 90
Chapter 7 Conclusion and Future Works 95
7.1 Conclusion 95
7.2 Future Works 96
References 97
Appendix A Training Details 109
A.1 Hyperparameters 109
A.1.1 PPO 109
A.1.2 Implicit Estimator 109
A.1.3 Explicit Estimator 110
A.1.4 Height Scan Encoder 110
A.2 Curriculum Learning 110
A.3 Domain Randomization 112
Appendix B Slope Terrain 113
B.1 Slope Terrain 113
Appendix C Supplementary Experiments 117
dc.language.iso: en
dc.subject: 強化學習
dc.subject: 運動控制
dc.subject: 狀態估測
dc.subject: 四足機器人
dc.subject: Reinforcement Learning
dc.subject: Locomotion Control
dc.subject: State Estimation
dc.subject: Quadrupedal Robot
dc.title: 基於強化學習之四足機器人運動控制結合顯式隱式狀態估測應用於高度落差之結構化地形 [zh_TW]
dc.title: Reinforcement Learning-Based Quadrupedal Locomotion Control Using Explicit and Implicit State Estimation on Structured Terrains with Height Discontinuities [en]
dc.type: Thesis
dc.date.schoolyear: 114-1
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 簡忠漢;李後燦;黃正民;江明理 [zh_TW]
dc.contributor.oralexamcommittee: Jong-Hann Jean;Hou-Tsan Lee;Cheng-Ming Huang;Ming-Li Chiang [en]
dc.subject.keyword: 強化學習, 運動控制, 狀態估測, 四足機器人 [zh_TW]
dc.subject.keyword: Reinforcement Learning, Locomotion Control, State Estimation, Quadrupedal Robot [en]
dc.relation.page: 119
dc.identifier.doi: 10.6342/NTU202600602
dc.rights.note: 同意授權(限校園內公開) (authorized for release; campus-only access)
dc.date.accepted: 2026-02-09
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 電機工程學系 (Department of Electrical Engineering)
dc.date.embargo-lift: 2031-02-02
Appears in collections: 電機工程學系 (Department of Electrical Engineering)

Files in this item:
ntu-114-1.pdf: 38.85 MB, Adobe PDF (not authorized for public access)

Items in this repository are protected by copyright, with all rights reserved, unless their copyright terms state otherwise.
