{"title":"Walk These Ways: Tuning Robot Control for Generalization with Multiplicity of Behavior","authors":"G. Margolis","doi":"10.48550/arXiv.2212.03238","DOIUrl":null,"url":null,"abstract":"Learned locomotion policies can rapidly adapt to diverse environments similar to those experienced during training but lack a mechanism for fast tuning when they fail in an out-of-distribution test environment. This necessitates a slow and iterative cycle of reward and environment redesign to achieve good performance on a new task. As an alternative, we propose learning a single policy that encodes a structured family of locomotion strategies that solve training tasks in different ways, resulting in Multiplicity of Behavior (MoB). Different strategies generalize differently and can be chosen in real-time for new tasks or environments, bypassing the need for time-consuming retraining. We release a fast, robust open-source MoB locomotion controller, Walk These Ways, that can execute diverse gaits with variable footswing, posture, and speed, unlocking diverse downstream tasks: crouching, hopping, high-speed running, stair traversal, bracing against shoves, rhythmic dance, and more. Video and code release: https://gmargo11.github.io/walk-these-ways/","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"53 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference on Robot Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2212.03238","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 29
Abstract
Learned locomotion policies can rapidly adapt to diverse environments similar to those experienced during training but lack a mechanism for fast tuning when they fail in an out-of-distribution test environment. This necessitates a slow and iterative cycle of reward and environment redesign to achieve good performance on a new task. As an alternative, we propose learning a single policy that encodes a structured family of locomotion strategies that solve training tasks in different ways, resulting in Multiplicity of Behavior (MoB). Different strategies generalize differently and can be chosen in real time for new tasks or environments, bypassing the need for time-consuming retraining. We release a fast, robust open-source MoB locomotion controller, Walk These Ways, that can execute diverse gaits with variable footswing, posture, and speed, unlocking diverse downstream tasks: crouching, hopping, high-speed running, stair traversal, bracing against shoves, rhythmic dance, and more. Video and code release: https://gmargo11.github.io/walk-these-ways/
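To make the MoB idea concrete, below is a minimal sketch of a behavior-conditioned policy: a single network takes both the robot's observation and a behavior command, so the operator can switch locomotion strategies at test time by changing the command vector instead of retraining. This is an illustrative sketch only, not the released Walk These Ways controller; the class name, command layout, parameter names, and dimensions are all assumptions for illustration.

```python
import torch
import torch.nn as nn


class MoBPolicy(nn.Module):
    """Sketch of a behavior-conditioned locomotion policy.

    One network maps (observation, behavior command) -> joint targets,
    so different strategies are selected in real time by changing the
    command, rather than by reward redesign and retraining.
    """

    def __init__(self, obs_dim: int, cmd_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + cmd_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs: torch.Tensor, cmd: torch.Tensor) -> torch.Tensor:
        # Condition the policy on the behavior command by concatenation.
        return self.net(torch.cat([obs, cmd], dim=-1))


# Hypothetical behavior command; entries and values are illustrative only.
cmd = torch.tensor([[1.5,    # forward velocity (m/s)
                     0.0,    # lateral velocity (m/s)
                     0.0,    # yaw rate (rad/s)
                     3.0,    # gait frequency (Hz)
                     0.5,    # gait phase offset (e.g., trot vs. pronk)
                     0.10,   # footswing height (m)
                     0.30]]) # body height (m)

policy = MoBPolicy(obs_dim=48, cmd_dim=cmd.shape[-1], act_dim=12)
obs = torch.zeros(1, 48)          # placeholder proprioceptive observation
joint_targets = policy(obs, cmd)  # 12 joint targets for a quadruped
```

In this setup, tuning the controller for an out-of-distribution task (say, raising footswing height for stairs, or lowering body height to crouch) amounts to editing `cmd` online, which is the real-time strategy selection the abstract describes.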