Reinforcement learning with Demonstrations from Mismatched Task under Sparse Reward

Yanjiang Guo, Jingyue Gao, Zheng Wu, Chengming Shi, Jianyu Chen
{"title":"Reinforcement learning with Demonstrations from Mismatched Task under Sparse Reward","authors":"Yanjiang Guo, Jingyue Gao, Zheng Wu, Chengming Shi, Jianyu Chen","doi":"10.48550/arXiv.2212.01509","DOIUrl":null,"url":null,"abstract":"Reinforcement learning often suffer from the sparse reward issue in real-world robotics problems. Learning from demonstration (LfD) is an effective way to eliminate this problem, which leverages collected expert data to aid online learning. Prior works often assume that the learning agent and the expert aim to accomplish the same task, which requires collecting new data for every new task. In this paper, we consider the case where the target task is mismatched from but similar with that of the expert. Such setting can be challenging and we found existing LfD methods can not effectively guide learning in mismatched new tasks with sparse rewards. We propose conservative reward shaping from demonstration (CRSfD), which shapes the sparse rewards using estimated expert value function. To accelerate learning processes, CRSfD guides the agent to conservatively explore around demonstrations. Experimental results of robot manipulation tasks show that our approach outperforms baseline LfD methods when transferring demonstrations collected in a single task to other different but similar tasks.","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"258 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference on Robot Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2212.01509","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Reinforcement learning often suffer from the sparse reward issue in real-world robotics problems. Learning from demonstration (LfD) is an effective way to eliminate this problem, which leverages collected expert data to aid online learning. Prior works often assume that the learning agent and the expert aim to accomplish the same task, which requires collecting new data for every new task. In this paper, we consider the case where the target task is mismatched from but similar with that of the expert. Such setting can be challenging and we found existing LfD methods can not effectively guide learning in mismatched new tasks with sparse rewards. We propose conservative reward shaping from demonstration (CRSfD), which shapes the sparse rewards using estimated expert value function. To accelerate learning processes, CRSfD guides the agent to conservatively explore around demonstrations. Experimental results of robot manipulation tasks show that our approach outperforms baseline LfD methods when transferring demonstrations collected in a single task to other different but similar tasks.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
稀疏奖励下不匹配任务的强化学习演示
在现实世界的机器人问题中,强化学习经常受到稀疏奖励问题的困扰。从演示中学习(LfD)是消除这一问题的有效方法,它利用收集的专家数据来帮助在线学习。以前的工作通常假设学习代理和专家的目标是完成相同的任务,这需要为每个新任务收集新的数据。本文考虑了目标任务与专家任务不匹配但与专家任务相似的情况。这样的设置是具有挑战性的,我们发现现有的LfD方法不能有效地指导在稀疏奖励的不匹配新任务中的学习。我们提出了基于证明的保守奖励塑造(CRSfD),它使用估计的专家值函数来塑造稀疏奖励。为了加速学习过程,CRSfD引导agent保守地探索周围的演示。机器人操作任务的实验结果表明,当将单个任务中收集的演示转移到其他不同但相似的任务时,我们的方法优于基线LfD方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
MResT: Multi-Resolution Sensing for Real-Time Control with Vision-Language Models Lidar Line Selection with Spatially-Aware Shapley Value for Cost-Efficient Depth Completion Safe Robot Learning in Assistive Devices through Neural Network Repair COACH: Cooperative Robot Teaching Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1