{"title":"SYNLOCO‐VE: Synthesizing central pattern generator with reinforcement learning and velocity estimator for quadruped locomotion","authors":"Xinyu Zhang, Zhiyuan Xiao, Xiang Zhou, Qingrui Zhang","doi":"10.1002/oca.3181","DOIUrl":null,"url":null,"abstract":"Learning a robust and natural locomotion controller for quadruped robots across different terrains and velocities is a challenging task, and it becomes even more difficult when no exteroceptive sensors are available. In this article, learning-based locomotion control is therefore investigated for quadruped robots using only proprioceptive sensors. A new framework called SYNLOCO-VE is proposed that synthesizes a feedforward gait planner, a trunk velocity estimator, and reinforcement learning (RL). The feedforward gait planner builds on the well-known central pattern generator but can adjust the foot length to improve velocity-tracking performance. The trunk velocity estimator is a deep-learning model that estimates the trunk velocity from historical proprioceptive sensor data; introducing it mitigates the partial-observability issue caused by the lack of exteroceptive sensors. RL is employed to learn a feedback controller that regulates the robot's gaits using proprioceptive feedback and the trunk velocity estimate. In the proposed framework, the feedforward gait planner also guides the RL training process, resulting in more stable and faster policy learning. Ablation studies demonstrate the contribution of each module in the proposed design. Extensive experiments are performed on the quadruped robot Go1, which has only proprioceptive sensors. The proposed framework learns robust and stable locomotion across different terrains and tasks. Experimental comparisons also illustrate the advantages of the proposed design over state-of-the-art methods.","PeriodicalId":501055,"journal":{"name":"Optimal Control Applications and Methods","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Optimal Control Applications and Methods","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1002/oca.3181","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Learning a robust and natural locomotion controller for quadruped robots across different terrains and velocities is a challenging task, and it becomes even more difficult when no exteroceptive sensors are available. In this article, learning-based locomotion control is therefore investigated for quadruped robots using only proprioceptive sensors. A new framework called SYNLOCO‐VE is proposed that synthesizes a feedforward gait planner, a trunk velocity estimator, and reinforcement learning (RL). The feedforward gait planner builds on the well-known central pattern generator but can adjust the foot length to improve velocity-tracking performance. The trunk velocity estimator is a deep-learning model that estimates the trunk velocity from historical proprioceptive sensor data; introducing it mitigates the partial-observability issue caused by the lack of exteroceptive sensors. RL is employed to learn a feedback controller that regulates the robot's gaits using proprioceptive feedback and the trunk velocity estimate. In the proposed framework, the feedforward gait planner also guides the RL training process, resulting in more stable and faster policy learning. Ablation studies demonstrate the contribution of each module in the proposed design. Extensive experiments are performed on the quadruped robot Go1, which has only proprioceptive sensors. The proposed framework learns robust and stable locomotion across different terrains and tasks. Experimental comparisons also illustrate the advantages of the proposed design over state-of-the-art methods.
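To give a rough sense of the feedforward component the abstract describes, a central-pattern-generator-style gait planner can be sketched as a set of per-leg phase oscillators with fixed gait offsets. This is a minimal illustrative sketch only: the function names, the trot phase offsets, and the swing-height profile below are assumptions, not the paper's actual SYNLOCO‐VE formulation (which additionally adapts the foot length for velocity tracking).

```python
import numpy as np

# Illustrative sketch of a CPG-style feedforward gait planner.
# Four phase oscillators (one per leg) share a common frequency and
# differ only by fixed phase offsets; a trot pairs diagonal legs.
# Leg order and offsets are assumed: FL, FR, RL, RR (fraction of cycle).
TROT_OFFSETS = np.array([0.0, 0.5, 0.5, 0.0])

def cpg_phases(t, frequency=2.0):
    """Gait phase in [0, 1) for each leg at time t (seconds)."""
    return (frequency * t + TROT_OFFSETS) % 1.0

def foot_height(phase, swing_height=0.08):
    """Simple swing/stance profile: the foot is lifted on a sine arc
    during the first half of the cycle and stays on the ground otherwise."""
    swing = phase < 0.5
    return np.where(swing, swing_height * np.sin(2.0 * np.pi * phase), 0.0)

# At t = 0.125 s with a 2 Hz gait, the FL/RR pair is mid-swing while
# the FR/RL pair is in stance.
phases = cpg_phases(0.125)
heights = foot_height(phases)
```

In a full controller, an RL feedback policy would then add residual corrections to such feedforward foot targets based on proprioceptive feedback and the estimated trunk velocity.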