Learning from Unreliable Human Action Advice in Interactive Reinforcement Learning

L. Scherf, Cigdem Turan, Dorothea Koert
Published in: 2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids)
Publication date: 2022-11-28
DOI: 10.1109/Humanoids53995.2022.10000078 (https://doi.org/10.1109/Humanoids53995.2022.10000078)
Citations: 1

Abstract

Interactive Reinforcement Learning (IRL) uses human input to improve learning speed and enable learning in more complex environments. Human action advice is one of the input channels that human users prefer. However, many existing IRL approaches do not explicitly consider the possibility of inaccurate human action advice. Moreover, most approaches that do account for inaccurate advice compute trust in human action advice independently of the state. This can cause problems in practice, where human input might be inaccurate in some states yet still useful in others. To this end, we propose a novel algorithm that can handle state-dependent unreliable human action advice in IRL. It combines three potential indicator signals for unreliable advice: consistency of advice, retrospective optimality of advice, and behavioral cues that hint at human uncertainty. We evaluate our method in a simulated gridworld and in robotic sorting tasks with 28 subjects. We show that our method outperforms a state-independent baseline and analyze occurrences of behavioral cues related to unreliable advice.
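
The abstract names three indicator signals but does not say how they are combined, so the following is only a rough sketch of what a state-dependent trust estimate could look like, not the authors' algorithm. The class name `StateTrust`, the equal weights, the neutral prior of 0.5, and the use of a weighted average are all assumptions made for illustration.

```python
from collections import defaultdict


class StateTrust:
    """Illustrative per-state trust in human action advice (hypothetical design)."""

    def __init__(self, w_consistency=1 / 3, w_optimality=1 / 3, w_cues=1 / 3):
        # Assumed weights for the three signals; the source gives no values.
        self.weights = (w_consistency, w_optimality, w_cues)
        # advice_counts[state][action] = how often that action was advised there.
        self.advice_counts = defaultdict(lambda: defaultdict(int))

    def record_advice(self, state, action):
        self.advice_counts[state][action] += 1

    def consistency(self, state):
        """Share of advice in this state that agrees with the most advised action."""
        counts = self.advice_counts.get(state)
        if not counts:
            return 0.5  # assumed neutral value when no advice has been seen
        return max(counts.values()) / sum(counts.values())

    def trust(self, state, retrospective_optimality, uncertainty_cue):
        """Weighted average of the three signals, clipped to [0, 1].

        retrospective_optimality: 1.0 if past advice in this state looked
            optimal in hindsight, 0.0 if it did not.
        uncertainty_cue: 1.0 if behavioral cues suggest the human is unsure,
            0.0 if they appear confident.
        """
        w_con, w_opt, w_cue = self.weights
        score = (
            w_con * self.consistency(state)
            + w_opt * retrospective_optimality
            + w_cue * (1.0 - uncertainty_cue)
        )
        return min(1.0, max(0.0, score))


tracker = StateTrust()
for _ in range(3):
    tracker.record_advice("s0", "left")  # consistent advice in state s0
tracker.record_advice("s1", "left")      # conflicting advice in state s1
tracker.record_advice("s1", "right")

trust_s0 = tracker.trust("s0", retrospective_optimality=1.0, uncertainty_cue=0.0)
trust_s1 = tracker.trust("s1", retrospective_optimality=0.0, uncertainty_cue=1.0)
```

Because trust is keyed by state, the same human can be followed in `s0` and largely ignored in `s1`, which is the contrast with a state-independent baseline that the abstract draws.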