走这些路:具有行为多样性的机器人泛化调谐控制

Conference on Robot Learning Pub Date : 2022-12-06 DOI:10.48550/arXiv.2212.03238

G. Margolis

{"title":"走这些路:具有行为多样性的机器人泛化调谐控制","authors":"G. Margolis","doi":"10.48550/arXiv.2212.03238","DOIUrl":null,"url":null,"abstract":"Learned locomotion policies can rapidly adapt to diverse environments similar to those experienced during training but lack a mechanism for fast tuning when they fail in an out-of-distribution test environment. This necessitates a slow and iterative cycle of reward and environment redesign to achieve good performance on a new task. As an alternative, we propose learning a single policy that encodes a structured family of locomotion strategies that solve training tasks in different ways, resulting in Multiplicity of Behavior (MoB). Different strategies generalize differently and can be chosen in real-time for new tasks or environments, bypassing the need for time-consuming retraining. We release a fast, robust open-source MoB locomotion controller, Walk These Ways, that can execute diverse gaits with variable footswing, posture, and speed, unlocking diverse downstream tasks: crouching, hopping, high-speed running, stair traversal, bracing against shoves, rhythmic dance, and more. Video and code release: https://gmargo11.github.io/walk-these-ways/","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"53 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":"{\"title\":\"Walk These Ways: Tuning Robot Control for Generalization with Multiplicity of Behavior\",\"authors\":\"G. Margolis\",\"doi\":\"10.48550/arXiv.2212.03238\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Learned locomotion policies can rapidly adapt to diverse environments similar to those experienced during training but lack a mechanism for fast tuning when they fail in an out-of-distribution test environment. This necessitates a slow and iterative cycle of reward and environment redesign to achieve good performance on a new task. As an alternative, we propose learning a single policy that encodes a structured family of locomotion strategies that solve training tasks in different ways, resulting in Multiplicity of Behavior (MoB). Different strategies generalize differently and can be chosen in real-time for new tasks or environments, bypassing the need for time-consuming retraining. We release a fast, robust open-source MoB locomotion controller, Walk These Ways, that can execute diverse gaits with variable footswing, posture, and speed, unlocking diverse downstream tasks: crouching, hopping, high-speed running, stair traversal, bracing against shoves, rhythmic dance, and more. Video and code release: https://gmargo11.github.io/walk-these-ways/\",\"PeriodicalId\":273870,\"journal\":{\"name\":\"Conference on Robot Learning\",\"volume\":\"53 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"29\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Conference on Robot Learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2212.03238\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference on Robot Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2212.03238","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 29

摘要

习得的运动策略可以快速适应不同的环境，类似于训练期间的经验，但缺乏在分布外测试环境中失败时快速调整的机制。这就需要一个缓慢而迭代的奖励和环境重新设计周期，以实现新任务的良好表现。作为替代方案，我们建议学习一个单一的策略，该策略编码一个结构化的运动策略家族，以不同的方式解决训练任务，从而产生行为多样性(MoB)。不同的策略有不同的概括方式，可以在新的任务或环境中实时选择，而无需耗时的再培训。我们发布了一个快速，强大的开源MoB运动控制器，Walk These Ways，可以执行不同的步态与可变的脚摆，姿势和速度，解锁不同的下游任务:蹲伏，跳跃，高速跑步，楼梯穿越，支撑对推搡，有节奏的舞蹈，和更多。视频及代码发布:https://gmargo11.github.io/walk-these-ways/

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Walk These Ways: Tuning Robot Control for Generalization with Multiplicity of Behavior

Learned locomotion policies can rapidly adapt to diverse environments similar to those experienced during training but lack a mechanism for fast tuning when they fail in an out-of-distribution test environment. This necessitates a slow and iterative cycle of reward and environment redesign to achieve good performance on a new task. As an alternative, we propose learning a single policy that encodes a structured family of locomotion strategies that solve training tasks in different ways, resulting in Multiplicity of Behavior (MoB). Different strategies generalize differently and can be chosen in real-time for new tasks or environments, bypassing the need for time-consuming retraining. We release a fast, robust open-source MoB locomotion controller, Walk These Ways, that can execute diverse gaits with variable footswing, posture, and speed, unlocking diverse downstream tasks: crouching, hopping, high-speed running, stair traversal, bracing against shoves, rhythmic dance, and more. Video and code release: https://gmargo11.github.io/walk-these-ways/

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Conference on Robot Learning

自引率

0.00%

发文量

期刊最新文献

MResT: Multi-Resolution Sensing for Real-Time Control with Vision-Language Models Lidar Line Selection with Spatially-Aware Shapley Value for Cost-Efficient Depth Completion Safe Robot Learning in Assistive Devices through Neural Network Repair COACH: Cooperative Robot Teaching Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping