
Conference on Robot Learning: Latest Publications

HERD: Continuous Human-to-Robot Evolution for Learning from Human Demonstration
Pub Date: 2022-12-08 · DOI: 10.48550/arXiv.2212.04359
Xingyu Liu, Deepak Pathak, Kris M. Kitani
The ability to learn from human demonstration endows robots with the ability to automate various tasks. However, directly learning from human demonstration is challenging since the structure of the human hand can be very different from the desired robot gripper. In this work, we show that manipulation skills can be transferred from a human to a robot through the use of micro-evolutionary reinforcement learning, where a five-finger human dexterous hand robot gradually evolves into a commercial robot, while repeatedly interacting in a physics simulator to continuously update the policy that is first learned from human demonstration. To deal with the high dimensionality of the robot parameter space, we propose an algorithm for multi-dimensional evolution path searching that allows joint optimization of both the robot evolution path and the policy. Through experiments on human object manipulation datasets, we show that our framework can efficiently transfer the expert human agent policy trained from human demonstrations in diverse modalities to target commercial robots.
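The evolution loop the abstract describes (morphing the simulated robot from the human hand toward the target gripper while continually fine-tuning the policy) can be sketched in a few lines. This is a minimal illustration, not the authors' code: `finetune` stands in for an RL update routine, the `theta_*` vectors are morphology parameters, and the fixed linear path replaces the multi-dimensional path search the paper actually proposes.

```python
import numpy as np

def evolve_and_finetune(policy, theta_human, theta_robot, finetune, n_stages=10):
    """Hypothetical sketch: morph the simulated robot from human-hand
    parameters toward the target gripper, fine-tuning the policy at each
    intermediate morphology. HERD searches the path rather than fixing it."""
    for alpha in np.linspace(0.0, 1.0, n_stages):
        theta = (1.0 - alpha) * theta_human + alpha * theta_robot
        policy = finetune(policy, theta)  # RL updates in the physics simulator
    return policy
```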
Citations: 3
Modularity through Attention: Efficient Training and Transfer of Language-Conditioned Policies for Robot Manipulation
Pub Date: 2022-12-08 · DOI: 10.48550/arXiv.2212.04573
Yifan Zhou, Shubham D. Sonawani, Mariano Phielipp, Simon Stepputtis, H. B. Amor
Language-conditioned policies allow robots to interpret and execute human instructions. Learning such policies requires a substantial investment of time and compute resources. Yet the resulting controllers are highly device-specific and cannot easily be transferred to a robot with different morphology, capability, appearance, or dynamics. In this paper, we propose a sample-efficient approach for training language-conditioned manipulation policies that allows for rapid transfer across different types of robots. By introducing a novel method, namely Hierarchical Modularity, and adopting supervised attention across multiple sub-modules, we bridge the divide between modular and end-to-end learning and enable the reuse of functional building blocks. In both simulated and real-world robot manipulation experiments, we demonstrate that our method outperforms the current state-of-the-art methods and can transfer policies across 4 different robots in a sample-efficient manner. Finally, we show that the functionality of learned sub-modules is maintained beyond the training process and can be used to introspect the robot decision-making process. Code is available at https://github.com/ir-lab/ModAttn.
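To make "supervised attention" concrete, here is a hedged sketch of the idea as we read it (module structure and names are ours, not the paper's): a sub-module exposes its attention weights so an auxiliary loss can push them toward annotated targets, keeping each block's role interpretable and reusable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionSubModule(nn.Module):
    """Illustrative sub-module: attends over input tokens and exposes its
    attention weights so they can be supervised directly."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))
        self.out = nn.Linear(dim, dim)

    def forward(self, tokens):                        # tokens: (batch, n, dim)
        attn = F.softmax(tokens @ self.query, dim=-1) # (batch, n)
        pooled = torch.einsum("bn,bnd->bd", attn, tokens)
        return self.out(pooled), attn

def supervised_attention_loss(attn, target):
    # teach the module *where* to look, so it stays a reusable building block
    return F.kl_div((attn + 1e-8).log(), target, reduction="batchmean")
```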
Citations: 8
See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation
Pub Date: 2022-12-07 · DOI: 10.48550/arXiv.2212.03858
Hao Li, Yizhi Zhang, Junzhe Zhu, Shaoxiong Wang, Michelle A. Lee, Huazhe Xu, E. Adelson, Li Fei-Fei, Ruohan Gao, Jiajun Wu
Humans use all of their senses to accomplish different tasks in everyday activities. In contrast, existing work on robotic manipulation mostly relies on one, or occasionally two modalities, such as vision and touch. In this work, we systematically study how visual, auditory, and tactile perception can jointly help robots to solve complex manipulation tasks. We build a robot system that can see with a camera, hear with a contact microphone, and feel with a vision-based tactile sensor, with all three sensory modalities fused with a self-attention model. Results on two challenging tasks, dense packing and pouring, demonstrate the necessity and power of multisensory perception for robotic manipulation: vision displays the global status of the robot but can often suffer from occlusion, audio provides immediate feedback of key moments that are even invisible, and touch offers precise local geometry for decision making. Leveraging all three modalities, our robotic system significantly outperforms prior methods.
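A minimal sketch of the fusion step as described: each modality is embedded into a token and the three tokens exchange information through self-attention. Sizes and pooling below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultisensoryFusion(nn.Module):
    """Sketch: fuse vision, audio, and touch embeddings with self-attention."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                                batch_first=True)

    def forward(self, vision, audio, touch):             # each: (batch, dim)
        tokens = torch.stack([vision, audio, touch], dim=1)  # (batch, 3, dim)
        fused = self.layer(tokens)                       # cross-modal attention
        return fused.mean(dim=1)                         # pooled policy feature
```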
Citations: 12
Few-Shot Preference Learning for Human-in-the-Loop RL
Pub Date: 2022-12-06 · DOI: 10.48550/arXiv.2212.03363
Joey Hejna, Dorsa Sadigh
While reinforcement learning (RL) has become a more popular approach for robotics, designing sufficiently informative reward functions for complex tasks has proven to be extremely difficult due to their inability to capture human intent and policy exploitation. Preference-based RL algorithms seek to overcome these challenges by directly learning reward functions from human feedback. Unfortunately, prior work either requires an unreasonable number of queries implausible for any human to answer or overly restricts the class of reward functions to guarantee the elicitation of the most informative queries, resulting in models that are insufficiently expressive for realistic robotics tasks. Contrary to most works that focus on query selection to minimize the amount of data required for learning reward functions, we take an opposite approach: expanding the pool of available data by viewing human-in-the-loop RL through the more flexible lens of multi-task learning. Motivated by the success of meta-learning, we pre-train preference models on prior task data and quickly adapt them for new tasks using only a handful of queries. Empirically, we reduce the amount of online feedback needed to train manipulation policies in Meta-World by 20×, and demonstrate the effectiveness of our method on a real Franka Panda Robot. Moreover, this reduction in query-complexity allows us to train robot policies from actual human users. Videos of our results and code can be found at https://sites.google.com/view/few-shot-preference-rl/home.
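The preference-learning backbone is standard enough to sketch: a reward model scores trajectory segments and is trained with a Bradley-Terry style loss on pairwise human preferences; under the paper's recipe, such a model would be pre-trained on prior tasks and then adapted to a new task with a few gradient steps on only a handful of queries. The architecture below is our simplification, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentRewardModel(nn.Module):
    """Sketch of a preference-learned reward model (our simplification)."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, segment):                 # segment: (batch, T, obs_dim)
        return self.net(segment).sum(dim=(1, 2))  # scalar return per segment

def preference_loss(model, seg_a, seg_b, prefs):
    """Bradley-Terry loss; prefs[i] = 1.0 when the human preferred segment a."""
    logits = model(seg_a) - model(seg_b)
    return F.binary_cross_entropy_with_logits(logits, prefs)
```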
Citations: 23
Walk These Ways: Tuning Robot Control for Generalization with Multiplicity of Behavior
Pub Date: 2022-12-06 · DOI: 10.48550/arXiv.2212.03238
G. Margolis
Learned locomotion policies can rapidly adapt to diverse environments similar to those experienced during training but lack a mechanism for fast tuning when they fail in an out-of-distribution test environment. This necessitates a slow and iterative cycle of reward and environment redesign to achieve good performance on a new task. As an alternative, we propose learning a single policy that encodes a structured family of locomotion strategies that solve training tasks in different ways, resulting in Multiplicity of Behavior (MoB). Different strategies generalize differently and can be chosen in real-time for new tasks or environments, bypassing the need for time-consuming retraining. We release a fast, robust open-source MoB locomotion controller, Walk These Ways, that can execute diverse gaits with variable footswing, posture, and speed, unlocking diverse downstream tasks: crouching, hopping, high-speed running, stair traversal, bracing against shoves, rhythmic dance, and more. Video and code release: https://gmargo11.github.io/walk-these-ways/
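The key mechanism, a single policy conditioned on an explicit behavior vector, is easy to sketch. The parameter names (gait frequency, footswing height, posture) follow the abstract; the network itself is an illustrative assumption, not the released controller.

```python
import torch
import torch.nn as nn

class BehaviorConditionedPolicy(nn.Module):
    """Sketch: one policy, many behaviors, selected via a conditioning vector."""
    def __init__(self, obs_dim, behavior_dim=3, act_dim=12, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + behavior_dim, hidden),
                                 nn.ELU(), nn.Linear(hidden, act_dim))

    def forward(self, obs, behavior):
        # `behavior` (e.g. gait frequency, footswing height, posture) is
        # chosen at test time; switching strategies needs no retraining
        return self.net(torch.cat([obs, behavior], dim=-1))
```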
Citations: 29
Learning Representations that Enable Generalization in Assistive Tasks
Pub Date: 2022-12-05 · DOI: 10.48550/arXiv.2212.03175
Jerry Zhi-Yang He, Aditi Raghunathan, Daniel S. Brown, Zackory M. Erickson, A. Dragan
Recent work in sim2real has successfully enabled robots to act in physical environments by training in simulation with a diverse "population" of environments (i.e. domain randomization). In this work, we focus on enabling generalization in assistive tasks: tasks in which the robot is acting to assist a user (e.g. helping someone with motor impairments with bathing or with scratching an itch). Such tasks are particularly interesting relative to prior sim2real successes because the environment now contains a human who is also acting. This complicates the problem because the diversity of human users (instead of merely physical environment parameters) is more difficult to capture in a population, thus increasing the likelihood of encountering out-of-distribution (OOD) human policies at test time. We advocate that generalization to such OOD policies benefits from (1) learning a good latent representation for human policies that test-time humans can accurately be mapped to, and (2) making that representation adaptable with test-time interaction data, instead of relying on it to perfectly capture the space of human policies based on the simulated population only. We study how to best learn such a representation by evaluating on purposefully constructed OOD test policies. We find that sim2real methods that encode environment (or population) parameters and work well in tasks that robots do in isolation, do not work well in assistance. In assistance, it seems crucial to train the representation based on the history of interaction directly, because that is what the robot will have access to at test time. Further, training these representations to then predict human actions not only gives them better structure, but also enables them to be fine-tuned at test-time, when the robot observes the partner act. https://adaptive-caregiver.github.io.
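A hedged sketch of the advocated recipe: encode the robot-human interaction history into a latent describing the human, trained by predicting the human's next action; at test time the latent (or encoder) can then be adapted with a few gradient steps on freshly observed interaction data. Names and sizes below are ours, not the paper's.

```python
import torch
import torch.nn as nn

class HumanLatentEncoder(nn.Module):
    """Sketch: summarize interaction history into a latent describing the
    human, trained by predicting the human's next action."""
    def __init__(self, obs_dim, act_dim, latent_dim=32):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, latent_dim, batch_first=True)
        self.action_head = nn.Linear(latent_dim, act_dim)

    def forward(self, history):             # history: (batch, T, obs_dim + act_dim)
        _, z = self.rnn(history)            # latent summarizing the human
        z = z.squeeze(0)
        return self.action_head(z), z       # predicted human action, latent
```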
Citations: 5
Reinforcement learning with Demonstrations from Mismatched Task under Sparse Reward
Pub Date: 2022-12-03 · DOI: 10.48550/arXiv.2212.01509
Yanjiang Guo, Jingyue Gao, Zheng Wu, Chengming Shi, Jianyu Chen
Reinforcement learning often suffers from the sparse reward issue in real-world robotics problems. Learning from demonstration (LfD) is an effective way to eliminate this problem, which leverages collected expert data to aid online learning. Prior works often assume that the learning agent and the expert aim to accomplish the same task, which requires collecting new data for every new task. In this paper, we consider the case where the target task is mismatched from, but similar to, that of the expert. Such a setting can be challenging, and we found that existing LfD methods cannot effectively guide learning in mismatched new tasks with sparse rewards. We propose conservative reward shaping from demonstration (CRSfD), which shapes the sparse rewards using an estimated expert value function. To accelerate learning, CRSfD guides the agent to conservatively explore around demonstrations. Experimental results on robot manipulation tasks show that our approach outperforms baseline LfD methods when transferring demonstrations collected in a single task to other different but similar tasks.
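One standard way to realize "shaping sparse rewards with an estimated expert value function" is potential-based shaping, sketched below. This is our simplification under that assumption, not the paper's exact algorithm, and the conservative-exploration component is not shown.

```python
def shaped_reward(r, s, s_next, expert_value, gamma=0.99):
    """Hypothetical sketch: densify a sparse reward r with a potential term
    from an expert value function fit on demonstration data."""
    return r + gamma * expert_value(s_next) - expert_value(s)
```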
Citations: 0
Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula
Pub Date: 2022-12-02 · DOI: 10.48550/arXiv.2212.01375
Eli Bronstein, S. Srinivasan, Supratik Paul, Aman Sinha, Matthew O'Kelly, Payam Nikdel, Shimon Whiteson
ML-based motion planning is a promising approach to produce agents that exhibit complex behaviors, and automatically adapt to novel environments. In the context of autonomous driving, it is common to treat all available training data equally. However, this approach produces agents that do not perform robustly in safety-critical settings, an issue that cannot be addressed by simply adding more data to the training set - we show that an agent trained using only a 10% subset of the data performs just as well as an agent trained on the entire dataset. We present a method to predict the inherent difficulty of a driving situation given data collected from a fleet of autonomous vehicles deployed on public roads. We then demonstrate that this difficulty score can be used in a zero-shot transfer to generate curricula for an imitation-learning based planning agent. Compared to training on the entire unbiased training dataset, we show that prioritizing difficult driving scenarios both reduces collisions by 15% and increases route adherence by 14% in closed-loop evaluation, all while using only 10% of the training data.
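The curriculum step implied by the abstract, scoring logged scenarios with a learned difficulty model and biasing sampling toward hard cases, can be sketched as follows; the `difficulty` callable and the softmax weighting are our assumptions, not the paper's method.

```python
import numpy as np

def difficulty_curriculum(scenarios, difficulty, k, temperature=1.0):
    """Sketch (hypothetical API): sample a training subset biased toward
    difficult driving scenarios using a learned difficulty score."""
    scores = np.array([difficulty(s) for s in scenarios], dtype=np.float64)
    probs = np.exp((scores - scores.max()) / temperature)  # stable softmax
    probs /= probs.sum()
    idx = np.random.choice(len(scenarios), size=k, replace=False, p=probs)
    return [scenarios[i] for i in idx]
```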
Citations: 3
Proactive Robot Assistance via Spatio-Temporal Object Modeling
Pub Date: 2022-11-28 · DOI: 10.48550/arXiv.2211.15501
Maithili Patel, S. Chernova
Proactive robot assistance enables a robot to anticipate and provide for a user's needs without being explicitly asked. We formulate proactive assistance as the problem of the robot anticipating temporal patterns of object movements associated with everyday user routines, and proactively assisting the user by placing objects to adapt the environment to their needs. We introduce a generative graph neural network to learn a unified spatio-temporal predictive model of object dynamics from temporal sequences of object arrangements. We additionally contribute the Household Object Movements from Everyday Routines (HOMER) dataset, which tracks household objects associated with human activities of daily living across 50+ days for five simulated households. Our model outperforms the leading baseline in predicting object movement, correctly predicting locations for 11.1% more objects and wrongly predicting locations for 11.5% fewer objects used by the human user.
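As a rough illustration of a generative graph network over object arrangements, the toy step below lets each object node aggregate messages from the others plus a time feature, then predict a distribution over candidate locations. The structure and sizes are our guesses, not the HOMER architecture.

```python
import torch
import torch.nn as nn

class ObjectDynamicsStep(nn.Module):
    """Toy message-passing step over an object-location graph (illustrative)."""
    def __init__(self, node_dim, n_locations, time_dim=4):
        super().__init__()
        self.msg = nn.Linear(2 * node_dim, node_dim)
        self.update = nn.GRUCell(node_dim + time_dim, node_dim)
        self.loc_head = nn.Linear(node_dim, n_locations)

    def forward(self, nodes, t_feat):       # nodes: (n, d), t_feat: (time_dim,)
        n = nodes.size(0)
        pair = torch.cat([nodes.unsqueeze(1).expand(n, n, -1),
                          nodes.unsqueeze(0).expand(n, n, -1)], dim=-1)
        messages = torch.relu(self.msg(pair)).mean(dim=1)      # (n, d)
        inp = torch.cat([messages, t_feat.unsqueeze(0).expand(n, -1)], dim=-1)
        new_nodes = self.update(inp, nodes)
        return self.loc_head(new_nodes)      # location logits per object
```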
Citations: 7
Learning Bimanual Scooping Policies for Food Acquisition
Pub Date: 2022-11-26 · DOI: 10.48550/arXiv.2211.14652
J. Grannen, Yilin Wu, Suneel Belkhale, Dorsa Sadigh
A robotic feeding system must be able to acquire a variety of foods. Prior bite acquisition works consider single-arm spoon scooping or fork skewering, which do not generalize to foods with complex geometries and deformabilities. For example, when acquiring a group of peas, skewering could smoosh the peas while scooping without a barrier could result in chasing the peas on the plate. In order to acquire foods with such diverse properties, we propose stabilizing food items during scooping using a second arm, for example, by pushing peas against the spoon with a flat surface to prevent dispersion. The added stabilizing arm can lead to new challenges. Critically, this arm should stabilize the food scene without interfering with the acquisition motion, which is especially difficult for easily breakable high-risk food items like tofu. These high-risk foods can break between the pusher and spoon during scooping, which can lead to food waste falling out of the spoon. We propose a general bimanual scooping primitive and an adaptive stabilization strategy that enables successful acquisition of a diverse set of food geometries and physical properties. Our approach, CARBS: Coordinated Acquisition with Reactive Bimanual Scooping, learns to stabilize without impeding task progress by identifying high-risk foods and robustly scooping them using closed-loop visual feedback. We find that CARBS is able to generalize across food shape, size, and deformability and is additionally able to manipulate multiple food items simultaneously. CARBS achieves 87.0% success on scooping rigid foods, which is 25.8% more successful than a single-arm baseline, and reduces food breakage by 16.2% compared to an analytical baseline. Videos can be found at https://sites.google.com/view/bimanualscoop-corl22/home .
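The coordination logic reads naturally as a closed-loop routine. The sketch below is purely illustrative (every function and attribute name is hypothetical, not the CARBS API): classify the food's risk level, stabilize accordingly, and react to breakage during the scoop.

```python
def bimanual_scoop(perceive, classify_risk, stabilize_arm, scoop_arm):
    """Illustrative control loop for coordinated bimanual scooping."""
    obs = perceive()
    risk = classify_risk(obs)            # e.g. fragile tofu vs. rigid food
    stabilize_arm.push(obs, gentle=(risk == "high"))
    while not scoop_arm.done():
        obs = perceive()                 # closed-loop visual feedback
        if risk == "high" and obs.breakage_detected:
            stabilize_arm.retreat()      # back off before the food breaks further
        scoop_arm.step(obs)
```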
Citations: 7