Towards zero-shot cross-agent transfer learning via latent-space universal notice network

IF 5.2 | Zone 2 (Computer Science) | JCR Q1, Automation & Control Systems | Robotics and Autonomous Systems | Pub Date: 2025-02-01 | Epub Date: 2024-11-22 | DOI: 10.1016/j.robot.2024.104862

Samuel Beaussant, Sebastien Lengagne, Benoit Thuilot, Olivier Stasse

Robotics and Autonomous Systems, vol. 184, Article 104862. Full text: https://www.sciencedirect.com/science/article/pii/S092188902400246X

Citations: 0

Abstract

Despite numerous improvements in the sample efficiency of Reinforcement Learning (RL) methods, learning from scratch still requires millions (even tens of millions) of interactions with the environment to converge to a high-reward policy. This is usually because the agent has no prior information about the task or its own physical embodiment. One way to mitigate this data hunger is Transfer Learning (TL). In this paper, we explore TL in the context of RL with the specific purpose of transferring policies from one agent to another, even in the presence of morphology discrepancies or different state–action spaces. We propose a process that leverages past knowledge from one agent (the source) to speed up or even bypass the learning phase for a different agent (the target) tackling the same task. Our method first uses Variational Auto-Encoders (VAEs) to learn an agent-agnostic latent space from paired, time-aligned trajectories collected on a set of agents. We then train a policy inside this agent-invariant latent space to solve a given task, yielding a task module reusable by any agent sharing this common feature space. Across several robotic tasks and heterogeneous hardware platforms, both in simulation and on physical robots, we show that our approach improves sample efficiency. More specifically, we report zero-shot generalization in some instances, where performance is recovered immediately after transfer. In the worst case, performance is recovered after fine-tuning on the target robot for a fraction of the cost of training a policy with similar performance from scratch.
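The pipeline described in the abstract can be pictured with a minimal sketch. The abstract gives no architectural details, so everything below is an assumption made for illustration: the linear layers, the squared-distance alignment loss on paired states, the agent-specific action decoder, and all names (`make_agent`, `alignment_loss`, `act`, and so on) are hypothetical and not the authors' implementation. The sketch only shows how agent-specific encoders could feed a single policy that lives in a shared latent space, and it performs no training.

```python
import math
import random

# Illustrative sketch only: layer shapes, losses, and names are assumptions,
# not the method's actual implementation.

def linear(weights, x):
    """Apply a bias-free dense layer; `weights` is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def random_matrix(rows, cols, rng):
    """Small random weight matrix standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def make_agent(state_dim, action_dim, latent_dim, rng):
    """Agent-specific modules: a VAE-style encoder and an (assumed) action decoder."""
    return {
        "enc": {"w_mu": random_matrix(latent_dim, state_dim, rng),
                "w_logvar": random_matrix(latent_dim, state_dim, rng)},
        "dec": random_matrix(action_dim, latent_dim, rng),
    }

def encode(enc, state, rng):
    """Return a sampled latent together with its mean and log-variance."""
    mu = linear(enc["w_mu"], state)
    logvar = linear(enc["w_logvar"], state)
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1).
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
    return z, mu, logvar

def kl_to_standard_normal(mu, logvar):
    """KL divergence from a diagonal Gaussian N(mu, sigma^2) to N(0, I)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))

def alignment_loss(enc_a, enc_b, traj_a, traj_b):
    """Assumed pairing loss: mean squared distance between the latent means
    of two agents' states taken at the same time step."""
    total = 0.0
    for s_a, s_b in zip(traj_a, traj_b):
        mu_a = linear(enc_a["w_mu"], s_a)
        mu_b = linear(enc_b["w_mu"], s_b)
        total += sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    return total / len(traj_a)

def act(agent, latent_policy, state, rng):
    """Agent-specific encoder -> shared latent policy -> agent-specific decoder."""
    z, _, _ = encode(agent["enc"], state, rng)
    latent_action = linear(latent_policy, z)
    return linear(agent["dec"], latent_action)

rng = random.Random(0)
LATENT_DIM = 2
source = make_agent(state_dim=3, action_dim=2, latent_dim=LATENT_DIM, rng=rng)
target = make_agent(state_dim=5, action_dim=4, latent_dim=LATENT_DIM, rng=rng)

# One task module, defined purely in latent space and shared by both agents.
latent_policy = random_matrix(LATENT_DIM, LATENT_DIM, rng)

action_src = act(source, latent_policy, [0.1, -0.2, 0.3], rng)
action_tgt = act(target, latent_policy, [0.0, 0.1, 0.2, 0.3, 0.4], rng)
print(len(action_src), len(action_tgt))  # prints: 2 4
```

In this sketch the two agents have different state and action dimensions, yet `latent_policy` is untouched when switching from `source` to `target`; only the encoder and decoder change. That is the sense in which a latent-space task module could be reused across agents, with fine-tuning on the target as a fallback when the direct transfer underperforms.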




Source journal

Robotics and Autonomous Systems (Engineering & Technology: Robotics)

CiteScore: 9.00
Self-citation rate: 7.00%
Articles published: 164
Review time: 4.5 months

About the journal: Robotics and Autonomous Systems will carry articles describing fundamental developments in the field of robotics, with special emphasis on autonomous systems. An important goal of this journal is to extend the state of the art in both symbolic and sensory based robot control and learning in the context of autonomous systems. Robotics and Autonomous Systems will carry articles on the theoretical, computational and experimental aspects of autonomous systems, or modules of such systems.