Learning from Interventions: Human-robot interaction as both explicit and implicit feedback

Jonathan Spencer, Sanjiban Choudhury, Matt Barnes, Matt Schmittle, M. Chiang, P. Ramadge, S. Srinivasa
Published in: Robotics: Science and Systems XVI, 2020-07-12
DOI: 10.15607/rss.2020.xvi.055 (https://doi.org/10.15607/rss.2020.xvi.055)
Citations: 33

Abstract

Scalable robot learning from seamless human-robot interaction is critical if robots are to solve a multitude of tasks in the real world. Current approaches to imitation learning suffer from one of two drawbacks. On the one hand, they rely solely on off-policy human demonstration, which in some cases leads to a mismatch between the training and test distributions. On the other, they burden the human with labeling every state the learner visits, rendering them impractical in many applications. We argue that learning interactively from expert interventions enjoys the best of both worlds. Our key insight is that any amount of expert feedback, whether by intervention or non-intervention, provides information about the quality of the current state, the optimality of the action, or both. We formalize this as a constraint on the learner's value function, which we can efficiently learn using no-regret online learning techniques. We call our approach Expert Intervention Learning (EIL) and evaluate it on a real and simulated driving task with a human expert, where it learns collision avoidance from scratch with just a few hundred samples (about one minute) of expert control.
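The idea of turning intervention and non-intervention into constraints on a value function can be illustrated with a small sketch. This is not the authors' implementation: the three constraint types below, the linear feature map `phi`, the cost threshold `THRESHOLD`, the margin `MARGIN`, and the toy driving data are all assumptions chosen for illustration. Each feedback record contributes a hinge loss, and the weights are updated by online (sub)gradient descent, a standard no-regret method.

```python
import numpy as np

N_ACTIONS = 2     # hypothetical actions: 0 = keep heading, 1 = turn away
THRESHOLD = 0.0   # hypothetical "good enough" cost level
MARGIN = 0.1      # hypothetical slack required by every constraint


def phi(state, action):
    """Hypothetical feature map: copy the state into the chosen action's block."""
    out = np.zeros(N_ACTIONS * state.size)
    out[action * state.size:(action + 1) * state.size] = state
    return out


def q(w, state, action):
    """Linear cost-to-go estimate; lower values mean better state-action pairs."""
    return float(w @ phi(state, action))


def ogd_step(w, record, lr):
    """One online subgradient step on the hinge loss of a single feedback record."""
    kind, state, action = record
    grad = np.zeros_like(w)
    if kind == "no_intervention":
        # Implicit feedback: the expert let (s, a) pass, so push Q(s, a) below the threshold.
        if q(w, state, action) > THRESHOLD - MARGIN:
            grad = phi(state, action)
    elif kind == "intervention_start":
        # Explicit feedback: the expert took over at (s, a), so push Q(s, a) above the threshold.
        if q(w, state, action) < THRESHOLD + MARGIN:
            grad = -phi(state, action)
    elif kind == "expert_action":
        # The expert chose `action`, so it should cost less than every alternative.
        for other in range(N_ACTIONS):
            if other != action and q(w, state, action) + MARGIN > q(w, state, other):
                grad += phi(state, action) - phi(state, other)
    return w - lr * grad


def train(records, epochs=500, lr=0.1):
    """Repeatedly sweep the feedback stream, updating only on violated constraints."""
    w = np.zeros(N_ACTIONS * records[0][1].size)
    for _ in range(epochs):
        for record in records:
            w = ogd_step(w, record, lr)
    return w


def greedy_action(w, state):
    """Pick the action with the lowest estimated cost-to-go."""
    return min(range(N_ACTIONS), key=lambda a: q(w, state, a))


def make_state(distance):
    """Toy state: a bias term plus the distance to the nearest obstacle."""
    return np.array([1.0, distance])


# Toy feedback: the expert stays passive far from the obstacle and takes over near it.
records = [
    ("no_intervention", make_state(0.9), 0),
    ("no_intervention", make_state(0.6), 0),
    ("intervention_start", make_state(0.2), 0),
    ("expert_action", make_state(0.2), 1),
    ("expert_action", make_state(0.1), 1),
]
w = train(records)
```

In this toy setup the learned weights rate "keep heading" as acceptable far from the obstacle and as unacceptable close to it, and `greedy_action` prefers turning away in the states where the expert took control. The separation into state-quality constraints and action-optimality constraints mirrors the abstract's description only loosely; the paper itself should be consulted for the actual formulation.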