Celebrating Robustness in Efficient Off-Policy Meta-Reinforcement Learning

Ziyi Liu, Zongyuan Li, Qianqian Cao, Yuan Wan, Xian Guo
{"title":"在有效的非策略元强化学习中庆祝鲁棒性","authors":"Ziyi Liu, Zongyuan Li, Qianqian Cao, Yuan Wan, Xian Guo","doi":"10.1109/RCAR54675.2022.9872291","DOIUrl":null,"url":null,"abstract":"Deep reinforcement learning algorithms can enable agents to learn policies for complex tasks without expert knowledge. However, the learned policies are typically specialized to one specific task and can not generalize to new tasks. While meta-reinforcement learning (meta-RL) algorithms can enable agents to solve new tasks based on prior experience, most of them build on on-policy reinforcement learning algorithms which require large amounts of samples during meta-training and do not consider task-specific features across different tasks and thus make it very difficult to train an agent with high performance. To address these challenges, in this paper, we propose an off-policy meta-RL algorithm abbreviated as CRL (Celebrating Robustness Learning) that disentangles task-specific policy parameters by an adapter network to shared low-level parameters, learns a probabilistic latent space to extract universal information across different tasks and perform temporal-extended exploration. Our approach outperforms baseline methods both in sample efficiency and asymptotic performance on several meta-RL benchmarks.","PeriodicalId":304963,"journal":{"name":"2022 IEEE International Conference on Real-time Computing and Robotics (RCAR)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Celebrating Robustness in Efficient Off-Policy Meta-Reinforcement Learning\",\"authors\":\"Ziyi Liu, Zongyuan Li, Qianqian Cao, Yuan Wan, Xian Guo\",\"doi\":\"10.1109/RCAR54675.2022.9872291\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep reinforcement learning algorithms can enable agents to learn policies for complex tasks without expert knowledge. However, the learned policies are typically specialized to one specific task and can not generalize to new tasks. While meta-reinforcement learning (meta-RL) algorithms can enable agents to solve new tasks based on prior experience, most of them build on on-policy reinforcement learning algorithms which require large amounts of samples during meta-training and do not consider task-specific features across different tasks and thus make it very difficult to train an agent with high performance. To address these challenges, in this paper, we propose an off-policy meta-RL algorithm abbreviated as CRL (Celebrating Robustness Learning) that disentangles task-specific policy parameters by an adapter network to shared low-level parameters, learns a probabilistic latent space to extract universal information across different tasks and perform temporal-extended exploration. 
Our approach outperforms baseline methods both in sample efficiency and asymptotic performance on several meta-RL benchmarks.\",\"PeriodicalId\":304963,\"journal\":{\"name\":\"2022 IEEE International Conference on Real-time Computing and Robotics (RCAR)\",\"volume\":\"22 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-07-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Conference on Real-time Computing and Robotics (RCAR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/RCAR54675.2022.9872291\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Real-time Computing and Robotics (RCAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RCAR54675.2022.9872291","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

Deep reinforcement learning algorithms enable agents to learn policies for complex tasks without expert knowledge. However, the learned policies are typically specialized to one specific task and cannot generalize to new tasks. While meta-reinforcement learning (meta-RL) algorithms enable agents to solve new tasks based on prior experience, most build on on-policy reinforcement learning algorithms, which require large amounts of samples during meta-training and do not consider task-specific features across different tasks, making it very difficult to train a high-performing agent. To address these challenges, in this paper we propose an off-policy meta-RL algorithm, CRL (Celebrating Robustness Learning), which disentangles task-specific policy parameters from shared low-level parameters via an adapter network, and learns a probabilistic latent space to extract universal information across different tasks and perform temporally-extended exploration. Our approach outperforms baseline methods in both sample efficiency and asymptotic performance on several meta-RL benchmarks.
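
The abstract describes two components: a probabilistic task encoder that infers a latent task variable z from recent experience, and an adapter network that maps z to task-specific modulation of shared low-level policy parameters, with z held fixed within an episode to drive temporally-extended exploration. The sketch below illustrates one plausible reading of that design; the module names (TaskEncoder, AdapterPolicy), layer sizes, and the FiLM-style scale-and-shift adapter are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of a CRL-style architecture as described in the abstract.
# All names, sizes, and the FiLM-style adapter are illustrative assumptions;
# the paper's exact design may differ.
import torch
import torch.nn as nn

class TaskEncoder(nn.Module):
    """Probabilistic encoder: maps a batch of transitions to a Gaussian
    posterior over a latent task variable z (PEARL-style inference is
    one assumed reading of the 'probabilistic latent space')."""
    def __init__(self, transition_dim, latent_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),  # mean and log-variance
        )

    def forward(self, context):  # context: (num_transitions, transition_dim)
        mu, logvar = self.net(context).chunk(2, dim=-1)
        # Aggregate per-transition estimates into one posterior per task;
        # a simple mean is used here for brevity.
        mu, logvar = mu.mean(0), logvar.mean(0)
        return torch.distributions.Normal(mu, (0.5 * logvar).exp())

class AdapterPolicy(nn.Module):
    """Shared low-level layers plus an adapter that generates task-specific
    modulation (assumed here to be a FiLM-style scale/shift) from z,
    disentangling task-specific parameters from shared ones."""
    def __init__(self, obs_dim, act_dim, latent_dim, hidden=256):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.adapter = nn.Linear(latent_dim, 2 * hidden)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs, z):
        h = self.shared(obs)
        scale, shift = self.adapter(z).chunk(2, dim=-1)
        h = h * (1 + scale) + shift  # task-conditioned modulation
        return torch.tanh(self.head(h))

# Usage: sample z once per episode and keep it fixed, so exploration is
# temporally extended rather than decided independently at every step.
encoder = TaskEncoder(transition_dim=14, latent_dim=5)
policy = AdapterPolicy(obs_dim=8, act_dim=2, latent_dim=5)
context = torch.randn(32, 14)      # recent transitions from the current task
z = encoder(context).rsample()     # held fixed within the episode
action = policy(torch.randn(8), z)
```

Holding z fixed for a whole episode is what yields temporally-extended exploration: the agent commits to one hypothesis about the task and, as more context is collected, the posterior over z tightens toward the task's true features.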