Fangzhou Yu, Ryan Batke, Jeremy Dao, J. Hurst, Kevin R. Green, Alan Fern
{"title":"Dynamic Bipedal Turning through Sim-to-Real Reinforcement Learning","authors":"Fangzhou Yu, Ryan Batke, Jeremy Dao, J. Hurst, Kevin R. Green, Alan Fern","doi":"10.1109/Humanoids53995.2022.10000225","DOIUrl":null,"url":null,"abstract":"For legged robots to match the athletic capabilities of humans and animals, they must not only produce robust periodic walking and running, but also seamlessly switch between nominal locomotion gaits and more specialized transient maneuvers. Despite recent advancements in controls of bipedal robots, there has been little focus on producing highly dynamic behaviors. Recent work utilizing reinforcement learning to produce policies for control of legged robots have demonstrated success in producing robust walking behaviors. However, these learned policies have difficulty expressing a multitude of different behaviors on a single network. Inspired by conventional optimization-based control techniques for legged robots, this work applies a recurrent policy to execute four-step, 90° turns trained using reference data generated from optimized single rigid body model trajectories. We present a training framework using epilogue terminal rewards for learning specific behaviors from pre-computed trajectory data and demonstrate a successful transfer to hardware on the bipedal robot Cassie.","PeriodicalId":180816,"journal":{"name":"2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/Humanoids53995.2022.10000225","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
For legged robots to match the athletic capabilities of humans and animals, they must not only produce robust periodic walking and running, but also seamlessly switch between nominal locomotion gaits and more specialized transient maneuvers. Despite recent advancements in controls of bipedal robots, there has been little focus on producing highly dynamic behaviors. Recent work utilizing reinforcement learning to produce policies for control of legged robots have demonstrated success in producing robust walking behaviors. However, these learned policies have difficulty expressing a multitude of different behaviors on a single network. Inspired by conventional optimization-based control techniques for legged robots, this work applies a recurrent policy to execute four-step, 90° turns trained using reference data generated from optimized single rigid body model trajectories. We present a training framework using epilogue terminal rewards for learning specific behaviors from pre-computed trajectory data and demonstrate a successful transfer to hardware on the bipedal robot Cassie.