Towards zero-shot cross-agent transfer learning via latent-space universal notice network

IF 4.3 | Zone 2 (Computer Science) | JCR Q1: Automation & Control Systems | Robotics and Autonomous Systems | Pub Date: 2024-11-22 | DOI: 10.1016/j.robot.2024.104862
Samuel Beaussant, Sebastien Lengagne, Benoit Thuilot, Olivier Stasse
Robotics and Autonomous Systems, vol. 184, Article 104862. https://www.sciencedirect.com/science/article/pii/S092188902400246X
Citations: 0

Abstract

Despite numerous improvements in the sample efficiency of Reinforcement Learning (RL) methods, learning from scratch still requires millions (even tens of millions) of interactions with the environment to converge to a high-reward policy. This is usually because the agent has no prior information about the task or its own physical embodiment. One way to mitigate this data hunger is to use Transfer Learning (TL). In this paper, we explore TL in the context of RL with the specific purpose of transferring policies from one agent to another, even in the presence of morphology discrepancies or different state–action spaces. We propose a process that leverages past knowledge from one agent (source) to speed up or even bypass the learning phase for a different agent (target) tackling the same task. Our method first uses Variational Auto-Encoders (VAEs) to learn an agent-agnostic latent space from paired, time-aligned trajectories collected on a set of agents. Then, we train a policy inside this agent-invariant latent space to solve a given task, yielding a task module reusable by any of the agents sharing this common feature space. Through several robotic tasks and heterogeneous hardware platforms, both in simulation and on physical robots, we show the benefits of our approach in terms of improved sample efficiency. More specifically, we report zero-shot generalization in some instances, where performance is recovered immediately after transfer. In the worst cases, performance is recovered by fine-tuning on the target robot at a fraction of the cost of training a policy of similar performance from scratch.
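
The two-stage structure described in the abstract (per-agent mappings into a shared latent space, plus one task policy that lives in that space) can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `AgentInterface` and `LatentPolicy`, the latent size, the state and action dimensions, and the use of random linear maps in place of trained VAE encoders and decoders are all assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 4  # hypothetical size of the shared, agent-agnostic latent space


class AgentInterface:
    """Hypothetical per-agent encoder/decoder pair.

    Random linear maps stand in for the trained VAE halves; only the
    data flow is meant to be representative.
    """

    def __init__(self, state_dim, action_dim):
        self.enc = rng.normal(size=(LATENT_DIM, state_dim)) * 0.1
        self.dec = rng.normal(size=(action_dim, LATENT_DIM)) * 0.1

    def encode(self, state):
        # Agent-specific state -> shared latent state
        return self.enc @ state

    def decode(self, latent_action):
        # Shared latent action -> agent-specific action, bounded to [-1, 1]
        return np.tanh(self.dec @ latent_action)


class LatentPolicy:
    """Hypothetical task module: it only ever sees latent vectors."""

    def __init__(self):
        self.w = rng.normal(size=(LATENT_DIM, LATENT_DIM)) * 0.1

    def act(self, latent_state):
        return np.tanh(self.w @ latent_state)


def control_step(agent, policy, state):
    """Encode the agent's state, query the shared policy, decode an action."""
    return agent.decode(policy.act(agent.encode(state)))


# Two agents with different state-action spaces reuse the same task policy.
source = AgentInterface(state_dim=7, action_dim=7)
target = AgentInterface(state_dim=5, action_dim=4)
policy = LatentPolicy()

a_src = control_step(source, policy, np.zeros(7))
a_tgt = control_step(target, policy, np.ones(5))
```

In this sketch, moving the task to a new agent means swapping the `AgentInterface` while leaving `LatentPolicy` untouched, which is the property the abstract relies on for zero-shot transfer or fine-tuning.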
Source journal
Robotics and Autonomous Systems (Engineering & Technology: Robotics)
CiteScore: 9.00
Self-citation rate: 7.00%
Articles published: 164
Review time: 4.5 months
Journal description: Robotics and Autonomous Systems will carry articles describing fundamental developments in the field of robotics, with special emphasis on autonomous systems. An important goal of this journal is to extend the state of the art in both symbolic and sensory based robot control and learning in the context of autonomous systems. Robotics and Autonomous Systems will carry articles on the theoretical, computational and experimental aspects of autonomous systems, or modules of such systems.
Latest articles from this journal
MOVRO2: Loosely coupled monocular visual radar odometry using factor graph optimization
Learning temporal maps of dynamics for mobile robots
Towards zero-shot cross-agent transfer learning via latent-space universal notice network
Delta- and Kalman-filter designs for multi-sensor pose estimation on spherical mobile mapping systems
Safe tracking control for free-flying space robots via control barrier functions