Towards zero-shot cross-agent transfer learning via latent-space universal notice network
Authors: Samuel Beaussant, Sebastien Lengagne, Benoit Thuilot, Olivier Stasse
Journal: Robotics and Autonomous Systems, volume 184, Article 104862 (published 2024-11-22)
Impact factor: 4.3; JCR: Q1 (Automation & Control Systems)
DOI: 10.1016/j.robot.2024.104862
URL: https://www.sciencedirect.com/science/article/pii/S092188902400246X
Despite numerous improvements in the sample efficiency of Reinforcement Learning (RL) methods, learning from scratch still requires millions (even tens of millions) of interactions with the environment to converge to a high-reward policy. This is usually because the agent has no prior information about the task or its own physical embodiment. One way to mitigate this data hunger is Transfer Learning (TL). In this paper, we explore TL in the context of RL with the specific purpose of transferring policies from one agent to another, even in the presence of morphology discrepancies or different state–action spaces. We propose a process that leverages past knowledge from one agent (the source) to speed up or even bypass the learning phase for a different agent (the target) tackling the same task. Our method first uses Variational Auto-Encoders (VAEs) to learn an agent-agnostic latent space from paired, time-aligned trajectories collected on a set of agents. We then train a policy inside this agent-invariant latent space to solve a given task, yielding a task module reusable by any agent sharing this common feature space. Across several robotic tasks and heterogeneous hardware platforms, both in simulation and on physical robots, we show that our approach improves sample efficiency. More specifically, we report zero-shot generalization in some instances, where performance is recovered immediately after transfer. In the worst cases, performance is recovered after fine-tuning on the target robot at a fraction of the cost of training a policy with similar performance from scratch.
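The modular structure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no architectural details, so the class names (`AgentModule`, `LatentPolicy`), the linear layers, the dimensions, the action decoder, and the squared-distance alignment term are all assumptions chosen for illustration, and random weights stand in for trained parameters. The sketch only shows how agents with different state and action dimensions might share one latent-space policy.

```python
import math
import random


def linear(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class AgentModule:
    """Hypothetical agent-specific encoder/decoder around a shared latent space."""

    def __init__(self, state_dim, action_dim, latent_dim, rng):
        self.enc_mean = random_matrix(latent_dim, state_dim, rng)
        self.enc_logvar = random_matrix(latent_dim, state_dim, rng)
        self.dec_action = random_matrix(action_dim, latent_dim, rng)

    def encode(self, state, rng=None):
        """Map an agent-specific state to the latent space."""
        mean = linear(self.enc_mean, state)
        if rng is None:
            return mean  # deterministic encoding, e.g. at deployment
        logvar = linear(self.enc_logvar, state)
        # VAE reparameterization: z = mean + sigma * eps, eps ~ N(0, 1)
        return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
                for m, lv in zip(mean, logvar)]

    def decode_action(self, latent_action):
        """Map a latent action back to this agent's action space."""
        return linear(self.dec_action, latent_action)


class LatentPolicy:
    """Hypothetical task module that only ever sees latent vectors."""

    def __init__(self, latent_dim, rng):
        self.weights = random_matrix(latent_dim, latent_dim, rng)

    def act(self, z):
        return [math.tanh(v) for v in linear(self.weights, z)]


def alignment_loss(z_a, z_b):
    """Assumed alignment term: squared distance between the latents of
    paired, time-aligned states from two agents."""
    return sum((a - b) ** 2 for a, b in zip(z_a, z_b))


def run_policy(agent, policy, state):
    """Encode the state, act in latent space, decode to the agent's actions."""
    return agent.decode_action(policy.act(agent.encode(state)))


rng = random.Random(0)
LATENT_DIM = 4
source = AgentModule(state_dim=6, action_dim=3, latent_dim=LATENT_DIM, rng=rng)
target = AgentModule(state_dim=10, action_dim=5, latent_dim=LATENT_DIM, rng=rng)
policy = LatentPolicy(LATENT_DIM, rng)

# The same policy object drives both agents; only the agent module is swapped.
source_action = run_policy(source, policy, [0.1] * 6)
target_action = run_policy(target, policy, [0.1] * 10)
print(len(source_action), len(target_action))  # prints "3 5"
```

In this sketch, transferring to the target amounts to reusing `policy` unchanged with the target's own `AgentModule`. Whether that yields zero-shot performance would depend on how well the latent spaces of the two agents are aligned during training, which the random weights here do not model.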
About the journal:
Robotics and Autonomous Systems will carry articles describing fundamental developments in the field of robotics, with special emphasis on autonomous systems. An important goal of this journal is to extend the state of the art in both symbolic and sensory-based robot control and learning in the context of autonomous systems.
Robotics and Autonomous Systems will carry articles on the theoretical, computational and experimental aspects of autonomous systems, or modules of such systems.