Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations

Letian Chen, Sravan Jayanthi, Rohan R. Paleja, Daniel Martin, Viacheslav Zakharov, M. Gombolay
{"title":"快速终身自适应逆强化学习演示","authors":"Letian Chen, Sravan Jayanthi, Rohan R. Paleja, Daniel Martin, Viacheslav Zakharov, M. Gombolay","doi":"10.48550/arXiv.2209.11908","DOIUrl":null,"url":null,"abstract":"Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors, democratizing access to robotics. However, current LfD frameworks are not capable of fast adaptation to heterogeneous human demonstrations nor the large-scale deployment in ubiquitous robotics applications. In this paper, we propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement learning (FLAIR). Our approach (1) leverages learned strategies to construct policy mixtures for fast adaptation to new demonstrations, allowing for quick end-user personalization, (2) distills common knowledge across demonstrations, achieving accurate task inference; and (3) expands its model only when needed in lifelong deployments, maintaining a concise set of prototypical strategies that can approximate all behaviors via policy mixtures. We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability (i.e., the model grows sublinearly with the number of demonstrations while maintaining high performance). FLAIR surpasses benchmarks across three control tasks with an average 57% improvement in policy returns and an average 78% fewer episodes required for demonstration modeling using policy mixtures. Finally, we demonstrate the success of FLAIR in a table tennis task and find users rate FLAIR as having higher task (p<.05) and personalization (p<.05) performance.","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"109 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations\",\"authors\":\"Letian Chen, Sravan Jayanthi, Rohan R. Paleja, Daniel Martin, Viacheslav Zakharov, M. Gombolay\",\"doi\":\"10.48550/arXiv.2209.11908\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors, democratizing access to robotics. However, current LfD frameworks are not capable of fast adaptation to heterogeneous human demonstrations nor the large-scale deployment in ubiquitous robotics applications. In this paper, we propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement learning (FLAIR). Our approach (1) leverages learned strategies to construct policy mixtures for fast adaptation to new demonstrations, allowing for quick end-user personalization, (2) distills common knowledge across demonstrations, achieving accurate task inference; and (3) expands its model only when needed in lifelong deployments, maintaining a concise set of prototypical strategies that can approximate all behaviors via policy mixtures. We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability (i.e., the model grows sublinearly with the number of demonstrations while maintaining high performance). 
FLAIR surpasses benchmarks across three control tasks with an average 57% improvement in policy returns and an average 78% fewer episodes required for demonstration modeling using policy mixtures. Finally, we demonstrate the success of FLAIR in a table tennis task and find users rate FLAIR as having higher task (p<.05) and personalization (p<.05) performance.\",\"PeriodicalId\":273870,\"journal\":{\"name\":\"Conference on Robot Learning\",\"volume\":\"109 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Conference on Robot Learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2209.11908\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference on Robot Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2209.11908","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4

Abstract

Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors, democratizing access to robotics. However, current LfD frameworks are not capable of fast adaptation to heterogeneous human demonstrations nor of large-scale deployment in ubiquitous robotics applications. In this paper, we propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement learning (FLAIR). Our approach (1) leverages learned strategies to construct policy mixtures for fast adaptation to new demonstrations, allowing for quick end-user personalization; (2) distills common knowledge across demonstrations, achieving accurate task inference; and (3) expands its model only when needed in lifelong deployments, maintaining a concise set of prototypical strategies that can approximate all behaviors via policy mixtures. We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability (i.e., the model grows sublinearly with the number of demonstrations while maintaining high performance). FLAIR surpasses benchmarks across three control tasks, with an average 57% improvement in policy returns and an average 78% fewer episodes required for demonstration modeling using policy mixtures. Finally, we demonstrate the success of FLAIR in a table tennis task and find users rate FLAIR as having higher task (p<.05) and personalization (p<.05) performance.
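The policy-mixture idea in contribution (1) can be made concrete with a small sketch. The snippet below is a hypothetical toy illustration, not the authors' FLAIR implementation: it assumes prototype policies are callables mapping a state to an action, and fits simplex-constrained mixture weights to a new demonstration using a least-squares objective with exponentiated-gradient updates. All names (`fit_mixture_weights`, the toy prototypes) and the choice of objective are illustrative assumptions.

```python
import numpy as np

def fit_mixture_weights(prototype_policies, demo_states, demo_actions,
                        iters=500, lr=0.01):
    """Fit simplex-constrained weights w so that the mixture
    sum_i w_i * pi_i(s) approximates the demonstrated actions."""
    # Actions of each prototype on the demonstrated states:
    # shape (num_prototypes, num_steps, action_dim).
    A = np.stack([np.array([pi(s) for s in demo_states])
                  for pi in prototype_policies])
    y = np.asarray(demo_actions)                 # (num_steps, action_dim)
    w = np.full(len(prototype_policies), 1.0 / len(prototype_policies))
    for _ in range(iters):
        pred = np.tensordot(w, A, axes=1)        # mixture's actions
        # Gradient of 0.5 * ||pred - y||^2 with respect to w.
        grad = np.tensordot(A, pred - y, axes=((1, 2), (0, 1)))
        w = w * np.exp(-lr * grad)               # multiplicative update ...
        w /= w.sum()                             # ... keeps w on the simplex
    return w

if __name__ == "__main__":
    # Two toy prototype "strategies" on a 1-D state: a proportional
    # controller and a constant bias (purely illustrative).
    protos = [lambda s: np.array([2.0 * s]), lambda s: np.array([1.0])]
    states = np.linspace(-1.0, 1.0, 50)
    # Synthetic "demonstration": a 70/30 blend of the two prototypes.
    actions = np.array([0.7 * protos[0](s) + 0.3 * protos[1](s)
                        for s in states])
    print(fit_mixture_weights(protos, states, actions))  # ~[0.7, 0.3]
```

The multiplicative (exponentiated-gradient) update keeps the weights non-negative and summing to one, so the result is always a valid policy mixture; any other simplex-constrained optimizer would serve the same illustrative purpose.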