Distance Minimization for Reward Learning from Scored Trajectories

Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence. Pub Date: 2016-02-12. DOI: 10.1609/aaai.v30i1.10411

B. Burchfiel, Carlo Tomasi, Ronald E. Parr


Citations: 32

Abstract

Many planning methods rely on the use of an immediate reward function as a portable and succinct representation of desired behavior. Rewards are often inferred from demonstrated behavior that is assumed to be near-optimal. We examine a framework, Distance Minimization IRL (DM-IRL), for learning reward functions from scores an expert assigns to possibly suboptimal demonstrations. By changing the expert’s role from a demonstrator to a judge, DM-IRL relaxes some of the assumptions present in IRL, enabling learning from the scoring of arbitrary demonstration trajectories with unknown transition functions. DM-IRL complements existing IRL approaches by addressing different assumptions about the expert. We show that DM-IRL is robust to expert scoring error and prove that finding a policy that produces maximally informative trajectories for an expert to score is strongly NP-hard. Experimentally, we demonstrate that the reward function DM-IRL learns from an MDP with an unknown transition model can transfer to an agent with known characteristics in a novel environment, and we achieve successful learning with limited available training data.
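The idea of learning a reward from expert scores can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: the reward is taken to be linear in per-state features, a trajectory's score is taken to be its discounted sum of rewards, and the "distance" is taken to be squared error minimized by plain gradient descent. All names (`discounted_features`, `fit_reward_weights`, `true_weights`) and the synthetic data are hypothetical and are not the authors' implementation. Note that the fit uses only the observed trajectories and their scores, never a transition model.

```python
import random


def discounted_features(trajectory, gamma):
    """Return sum_t gamma**t * phi(s_t) for a list of per-step feature vectors."""
    dim = len(trajectory[0])
    total = [0.0] * dim
    for t, phi in enumerate(trajectory):
        discount = gamma ** t
        for k in range(dim):
            total[k] += discount * phi[k]
    return total


def predict_score(weights, mu):
    """Score implied by a linear reward: dot(weights, discounted features)."""
    return sum(w * m for w, m in zip(weights, mu))


def fit_reward_weights(trajectories, scores, gamma=0.9, lr=0.01, steps=5000):
    """Gradient descent on the mean squared gap between expert and predicted scores.

    Assumption: linear reward and squared-error distance (illustrative choices).
    """
    mus = [discounted_features(traj, gamma) for traj in trajectories]
    dim = len(mus[0])
    n = len(mus)
    weights = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for mu, score in zip(mus, scores):
            err = predict_score(weights, mu) - score
            for k in range(dim):
                grad[k] += 2.0 * err * mu[k] / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Synthetic demo: arbitrary (not optimal) trajectories scored by a hidden linear reward.
rng = random.Random(0)
true_weights = [1.0, -0.5, 0.25]  # hypothetical "expert" reward weights
demo_trajectories = [
    [[rng.random() for _ in range(3)] for _ in range(10)]  # 10 steps, 3 features
    for _ in range(50)
]
demo_scores = [
    predict_score(true_weights, discounted_features(traj, 0.9))
    for traj in demo_trajectories
]
learned_weights = fit_reward_weights(demo_trajectories, demo_scores, gamma=0.9)
```

In this noise-free toy setting the learned weights should approach the hidden ones; with noisy expert scores the same procedure amounts to least-squares regression, which is one plausible reading of why scoring error can be tolerated.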




Source journal

Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence

Self-citation rate

0.00%

Articles published