
Conference on Robot Learning: Latest Publications

Learning Model Predictive Controllers with Real-Time Attention for Real-World Navigation
Pub Date: 2022-09-22 | DOI: 10.48550/arXiv.2209.10780
Xuesu Xiao, Tingnan Zhang, K. Choromanski, Edward Lee, Anthony Francis, Jacob Varley, Stephen Tu, Sumeet Singh, Peng Xu, Fei Xia, S. M. Persson, Dmitry Kalashnikov, L. Takayama, Roy Frostig, Jie Tan, Carolina Parada, Vikas Sindhwani
Despite decades of research, existing navigation systems still face real-world challenges when deployed in the wild, e.g., in cluttered home environments or in human-occupied public spaces. To address this, we present a new class of implicit control policies combining the benefits of imitation learning with the robust handling of system constraints from Model Predictive Control (MPC). Our approach, called Performer-MPC, uses a learned cost function parameterized by vision context embeddings provided by Performers -- a low-rank implicit-attention Transformer. We jointly train the cost function and construct the controller relying on it, effectively solving end-to-end the corresponding bi-level optimization problem. We show that the resulting policy improves standard MPC performance by leveraging a few expert demonstrations of the desired navigation behavior in different challenging real-world scenarios. Compared with a standard MPC policy, Performer-MPC achieves a >40% better goal-reached rate in cluttered environments and >65% better performance on social metrics when navigating around humans.
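The controller structure the abstract sketches, an MPC whose cost is parameterized by a learned context embedding, can be illustrated with a minimal sampling-based receding-horizon loop. Everything below (the quadratic stand-in cost, the single-integrator dynamics, all names) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

def learned_stage_cost(state, action, ctx):
    # Illustrative stand-in for the Performer-provided learned cost:
    # a quadratic whose goal and weight are read off a context embedding.
    goal, w = ctx[:2], np.abs(ctx[2]) + 0.1
    return w * np.sum((state - goal) ** 2) + 0.01 * np.sum(action ** 2)

def rollout(x0, actions):
    # Trivial single-integrator dynamics: x_{t+1} = x_t + a_t.
    xs = [x0]
    for a in actions:
        xs.append(xs[-1] + a)
    return np.array(xs[1:])

def mpc_random_shooting(x0, ctx, horizon=10, samples=256, rng=None):
    # Sample action sequences and keep the one with the lowest learned cost.
    rng = rng or np.random.default_rng(0)
    best_cost, best_plan = np.inf, None
    for _ in range(samples):
        plan = rng.normal(scale=0.3, size=(horizon, 2))
        xs = rollout(x0, plan)
        cost = sum(learned_stage_cost(x, a, ctx) for x, a in zip(xs, plan))
        if cost < best_cost:
            best_cost, best_plan = cost, plan
    return best_plan[0]  # receding horizon: execute only the first action

x0 = np.zeros(2)
ctx = np.array([1.0, 1.0, 0.5])  # would come from the vision encoder
print(mpc_random_shooting(x0, ctx))
```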
Citations: 26
Latent Plans for Task-Agnostic Offline Reinforcement Learning
Pub Date: 2022-09-19 | DOI: 10.48550/arXiv.2209.08959
Erick Rosete-Beas, Oier Mees, Gabriel Kalweit, J. Boedecker, Wolfram Burgard
Everyday long-horizon tasks comprising a sequence of multiple implicit subtasks still pose a major challenge in offline robot control. While a number of prior methods aimed to address this setting with variants of imitation and offline reinforcement learning, the learned behavior is typically narrow and often struggles to reach configurable long-horizon goals. As both paradigms have complementary strengths and weaknesses, we propose a novel hierarchical approach that combines the strengths of both methods to learn task-agnostic long-horizon policies from high-dimensional camera observations. Concretely, we combine a low-level policy that learns latent skills via imitation learning and a high-level policy learned from offline reinforcement learning for skill-chaining the latent behavior priors. Experiments in various simulated and real robot control tasks show that our formulation enables producing previously unseen combinations of skills to reach temporally extended goals by "stitching" together latent skills through goal chaining, with an order-of-magnitude improvement in performance over state-of-the-art baselines. We even learn one multi-task visuomotor policy for 25 distinct manipulation tasks in the real world which outperforms both imitation learning and offline reinforcement learning techniques.
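A minimal sketch of the two-level interface the abstract describes: a low-level imitation policy conditioned on a latent skill, and a high-level offline-RL policy that emits the next latent toward a goal. The linear "networks", dimensions, and re-planning interval are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

class LowLevelSkillPolicy:
    """Imitation-trained policy conditioned on a latent skill z (stub weights)."""
    def __init__(self, obs_dim, z_dim, act_dim):
        self.W = rng.normal(size=(act_dim, obs_dim + z_dim)) * 0.1
    def act(self, obs, z):
        return np.tanh(self.W @ np.concatenate([obs, z]))

class HighLevelPlanner:
    """Offline-RL-trained policy that picks the next latent skill for a goal."""
    def __init__(self, obs_dim, goal_dim, z_dim):
        self.W = rng.normal(size=(z_dim, obs_dim + goal_dim)) * 0.1
    def next_skill(self, obs, goal):
        return np.tanh(self.W @ np.concatenate([obs, goal]))

obs_dim, goal_dim, z_dim, act_dim = 8, 4, 3, 2
low = LowLevelSkillPolicy(obs_dim, z_dim, act_dim)
high = HighLevelPlanner(obs_dim, goal_dim, z_dim)

obs, goal = np.zeros(obs_dim), np.ones(goal_dim)
for step in range(20):
    if step % 5 == 0:                 # re-plan a latent every K steps ("chaining")
        z = high.next_skill(obs, goal)
    action = low.act(obs, z)
    obs = obs + 0.05 * rng.normal(size=obs_dim)  # stand-in environment transition
```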
Citations: 22
One-Shot Transfer of Affordance Regions? AffCorrs!
Pub Date: 2022-09-15 | DOI: 10.48550/arXiv.2209.07147
Denis Hadjivelichkov, Sicelukwanda Zwane, M. Deisenroth, L. Agapito, D. Kanoulas
In this work, we tackle one-shot visual search of object parts. Given a single reference image of an object with annotated affordance regions, we segment semantically corresponding parts within a target scene. We propose AffCorrs, an unsupervised model that combines the properties of pre-trained DINO-ViT's image descriptors and cyclic correspondences. We use AffCorrs to find corresponding affordances both for intra- and inter-class one-shot part segmentation. This task is more difficult than supervised alternatives, but enables future work such as learning affordances via imitation and assisted teleoperation.
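The cyclic-correspondence idea can be sketched as mutual nearest-neighbor matching between two descriptor sets. Below, the random "descriptors" stand in for pre-trained DINO-ViT patch features, and the function name is illustrative, not the paper's code:

```python
import numpy as np

def cycle_consistent_matches(desc_ref, desc_tgt, region_mask):
    # desc_*: (num_patches, D) L2-normalized patch descriptors.
    sim = desc_ref @ desc_tgt.T                  # cosine similarity matrix
    fwd = sim.argmax(axis=1)                     # ref -> tgt nearest neighbor
    bwd = sim.argmax(axis=0)                     # tgt -> ref nearest neighbor
    matches = []
    for i in np.flatnonzero(region_mask):        # only annotated affordance patches
        j = fwd[i]
        if bwd[j] == i:                          # cycle consistency: i -> j -> i
            matches.append((i, j))
    return matches

rng = np.random.default_rng(0)
ref = rng.normal(size=(196, 64)); ref /= np.linalg.norm(ref, axis=1, keepdims=True)
tgt = rng.normal(size=(196, 64)); tgt /= np.linalg.norm(tgt, axis=1, keepdims=True)
mask = np.zeros(196, bool); mask[:40] = True     # annotated affordance region
print(len(cycle_consistent_matches(ref, tgt, mask)))
```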
Citations: 14
Proactive slip control by learned slip model and trajectory adaptation
Pub Date: 2022-09-13 | DOI: 10.48550/arXiv.2209.06019
Kiyanoush Nazari, Willow Mandil, E. AmirGhalamzan
This paper presents a novel control approach to dealing with object slip during robotic manipulative movements. Slip is a major cause of failure in many robotic grasping and manipulation tasks. Existing works increase grip force to avoid/control slip. However, this may not be feasible when (i) the robot cannot increase the gripping force -- the max gripping force is already applied, or (ii) increased force damages the grasped object, such as soft fruit. Moreover, the robot fixes the gripping force when it forms a stable grasp on the surface of an object, and changing the gripping force during real-time manipulation may not be an effective control policy. We propose a novel control approach to slip avoidance including a learned action-conditioned slip predictor and a constrained optimiser avoiding a predicted slip given a desired robot action. We show the effectiveness of the proposed trajectory adaptation method with a receding-horizon controller on a series of real-robot test cases. Our experimental results show our proposed data-driven predictive controller can control slip for objects unseen in training.
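A hedged sketch of the proposed structure: a learned slip predictor used inside a constrained action selection. The logistic "predictor", the sampling scheme, and all thresholds are invented placeholders, not the learned model from the paper:

```python
import numpy as np

def predicted_slip_prob(state, action):
    # Stand-in for the learned action-conditioned slip predictor:
    # here, faster motions are simply assumed more likely to slip.
    return 1.0 / (1.0 + np.exp(-(np.linalg.norm(action) - 1.0) * 4.0))

def adapt_action(state, desired_action, max_slip=0.2, n_candidates=200, rng=None):
    # Constrained selection: stay close to the desired action while keeping
    # predicted slip probability below a threshold (receding-horizon style).
    rng = rng or np.random.default_rng(0)
    candidates = desired_action + rng.normal(
        scale=0.3, size=(n_candidates, desired_action.size))
    feasible = [a for a in candidates if predicted_slip_prob(state, a) < max_slip]
    if not feasible:
        return desired_action * 0.5   # fall back: slow the motion down
    return min(feasible, key=lambda a: np.linalg.norm(a - desired_action))

state = np.zeros(3)
print(adapt_action(state, np.array([1.5, 0.0, 0.0])))
```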
Citations: 6
GenLoco: Generalized Locomotion Controllers for Quadrupedal Robots
Pub Date: 2022-09-12 | DOI: 10.48550/arXiv.2209.05309
Gilbert Feng, Hongbo Zhang, Zhongyu Li, X. B. Peng, Bhuvan Basireddy, Linzhu Yue, Zhitao Song, Lizhi Yang, Yunhui Liu, K. Sreenath, S. Levine
Recent years have seen a surge in commercially-available and affordable quadrupedal robots, with many of these platforms being actively used in research and industry. As the availability of legged robots grows, so does the need for controllers that enable these robots to perform useful skills. However, most learning-based frameworks for controller development focus on training robot-specific controllers, a process that needs to be repeated for every new robot. In this work, we introduce a framework for training generalized locomotion (GenLoco) controllers for quadrupedal robots. Our framework synthesizes general-purpose locomotion controllers that can be deployed on a large variety of quadrupedal robots with similar morphologies. We present a simple but effective morphology randomization method that procedurally generates a diverse set of simulated robots for training. We show that by training a controller on this large set of simulated robots, our models acquire more general control strategies that can be directly transferred to novel simulated and real-world robots with diverse morphologies, which were not observed during training.
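The morphology randomization recipe can be sketched as follows; the parameter names and ranges and the one-line training stub are assumptions for illustration, not the paper's actual ranges or RL algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_morphology():
    # Procedurally sample a simulated quadruped's physical parameters
    # (ranges are illustrative only).
    return {
        "base_mass_kg":  rng.uniform(5.0, 60.0),
        "leg_length_m":  rng.uniform(0.15, 0.45),
        "motor_kp":      rng.uniform(20.0, 100.0),
        "foot_friction": rng.uniform(0.4, 1.2),
    }

def train_step(policy_params, morphology):
    # Stand-in for one RL update in a simulator built from `morphology`.
    return policy_params + 1e-3 * rng.normal(size=policy_params.shape)

policy = np.zeros(16)
for it in range(1000):
    morph = sample_morphology()   # a new random robot each episode
    policy = train_step(policy, morph)
```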
Citations: 21
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
Pub Date: 2022-09-12 | DOI: 10.48550/arXiv.2209.05451
Mohit Shridhar, Lucas Manuelli, D. Fox
Transformers have revolutionized vision and natural language processing with their ability to scale with large datasets. But in robotic manipulation, data is both limited and expensive. Can manipulation still benefit from Transformers with the right problem formulation? We investigate this question with PerAct, a language-conditioned behavior-cloning agent for multi-task 6-DoF manipulation. PerAct encodes language goals and RGB-D voxel observations with a Perceiver Transformer, and outputs discretized actions by "detecting the next best voxel action". Unlike frameworks that operate on 2D images, the voxelized 3D observation and action space provides a strong structural prior for efficiently learning 6-DoF actions. With this formulation, we train a single multi-task Transformer for 18 RLBench tasks (with 249 variations) and 7 real-world tasks (with 18 variations) from just a few demonstrations per task. Our results show that PerAct significantly outperforms unstructured image-to-action agents and 3D ConvNet baselines for a wide range of tabletop tasks.
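How "detecting the next best voxel action" can turn per-voxel scores into a discretized 6-DoF action is sketched below; the grid resolution, rotation binning, and random score tensors are illustrative stand-ins for the Perceiver Transformer's outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the model's heads: a score volume over a 100^3 workspace
# grid plus discretized rotation and gripper heads.
voxel_scores = rng.normal(size=(100, 100, 100))
rot_scores   = rng.normal(size=(3, 72))   # 72 bins of 5 degrees per Euler axis
grip_scores  = rng.normal(size=(2,))      # open / close

def decode_action(voxel_scores, rot_scores, grip_scores, voxel_size=0.01):
    # "Detect the next best voxel": the argmax voxel gives the target position.
    idx = np.unravel_index(voxel_scores.argmax(), voxel_scores.shape)
    position = np.array(idx) * voxel_size       # meters, assuming 1 cm voxels
    euler = rot_scores.argmax(axis=1) * 5.0     # degrees per rotation axis
    gripper_open = bool(grip_scores.argmax() == 0)
    return position, euler, gripper_open

print(decode_action(voxel_scores, rot_scores, grip_scores))
```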
Citations: 163
Learning Dense Visual Descriptors using Image Augmentations for Robot Manipulation Tasks
Pub Date: 2022-09-12 | DOI: 10.48550/arXiv.2209.05213
Christian Graf, David B. Adrian, Joshua Weil, Miroslav Gabriel, Philipp Schillinger, Markus Spies, H. Neumann, A. Kupcsik
We propose a self-supervised training approach for learning view-invariant dense visual descriptors using image augmentations. Unlike existing works, which often require complex datasets, such as registered RGBD sequences, we train on an unordered set of RGB images. This allows for learning from a single camera view, e.g., in an existing robotic cell with a fix-mounted camera. We create synthetic views and dense pixel correspondences using data augmentations. We find our descriptors are competitive to the existing methods, despite the simpler data recording and setup requirements. We show that training on synthetic correspondences provides descriptor consistency across a broad range of camera views. We compare against training with geometric correspondence from multiple views and provide ablation studies. We also show a robotic bin-picking experiment using descriptors learned from a fix-mounted camera for defining grasp preferences.
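The core trick, that a known augmentation yields dense ground-truth pixel correspondences for free, can be shown with a simple integer translation; a real pipeline would use richer augmentations and a contrastive descriptor loss on the matched pixels, so this is only a sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_with_correspondences(image, max_shift=8):
    # Create a synthetic second view by a random integer translation.
    # Because the shift is known, dense correspondences come for free.
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.roll(image, (dy, dx), axis=(0, 1))
    H, W = image.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    corr = np.stack([(ys + dy) % H, (xs + dx) % W], axis=-1)  # pixel -> match
    return shifted, corr

image = rng.random((64, 64, 3))
view2, corr = augment_with_correspondences(image)
y, x = 10, 20
assert np.allclose(image[y, x], view2[tuple(corr[y, x])])  # matched pixels agree
```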
Citations: 4
Instruction-driven history-aware policies for robotic manipulations
Pub Date: 2022-09-11 | DOI: 10.48550/arXiv.2209.04899
Pierre-Louis Guhur, Shizhe Chen, Ricardo Garcia Pinel, Makarand Tapaswi, I. Laptev, C. Schmid
In human environments, robots are expected to accomplish a variety of manipulation tasks given simple natural language instructions. Yet, robotic manipulation is extremely challenging as it requires fine-grained motor control, long-term memory as well as generalization to previously unseen tasks and environments. To address these challenges, we propose a unified transformer-based approach that takes into account multiple inputs. In particular, our transformer architecture integrates (i) natural language instructions and (ii) multi-view scene observations while (iii) keeping track of the full history of observations and actions. Such an approach enables learning dependencies between history and instructions and improves manipulation precision using multiple views. We evaluate our method on the challenging RLBench benchmark and on a real-world robot. Notably, our approach scales to 74 diverse RLBench tasks and outperforms the state of the art. We also address instruction-conditioned tasks and demonstrate excellent generalization to previously unseen variations.
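An illustrative interface for such a policy, fusing an instruction embedding, multi-view observation embeddings, and the running observation/action history into one token sequence, is sketched below; the transformer itself is stubbed and every name and dimension is an assumption:

```python
import numpy as np
from collections import deque

class HistoryAwarePolicy:
    """Illustrative interface only: assembles (i) a language instruction,
    (ii) multi-view observations, and (iii) the full observation/action
    history into one token list; the transformer head is a stub."""
    def __init__(self, max_history=16):
        self.history = deque(maxlen=max_history)

    def tokens(self, instruction_emb, view_embs):
        past = [np.concatenate([o, a]) for o, a in self.history]
        return [instruction_emb, *view_embs, *past]

    def act(self, instruction_emb, view_embs, obs_emb):
        toks = self.tokens(instruction_emb, view_embs)
        # Stub for the transformer action head (7-DoF output).
        action = np.tanh(sum(t.mean() for t in toks) * np.ones(7))
        self.history.append((obs_emb, action))   # keep track of full history
        return action

policy = HistoryAwarePolicy()
instr = np.random.default_rng(0).normal(size=32)
views = [np.zeros(32), np.zeros(32)]
print(policy.act(instr, views, np.zeros(32)))
```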
Citations: 40
Robust Trajectory Prediction against Adversarial Attacks
Pub Date: 2022-07-29 | DOI: 10.48550/arXiv.2208.00094
Yulong Cao, Danfei Xu, Xinshuo Weng, Z. Mao, Anima Anandkumar, Chaowei Xiao, M. Pavone
Trajectory prediction using deep neural networks (DNNs) is an essential component of autonomous driving (AD) systems. However, these methods are vulnerable to adversarial attacks, leading to serious consequences such as collisions. In this work, we identify two key ingredients to defend trajectory prediction models against adversarial attacks including (1) designing effective adversarial training methods and (2) adding domain-specific data augmentation to mitigate the performance degradation on clean data. We demonstrate that our method is able to improve the performance by 46% on adversarial data and at the cost of only 3% performance degradation on clean data, compared to the model trained with clean data. Additionally, compared to existing robust methods, our method can improve performance by 21% on adversarial examples and 9% on clean data. Our robust model is evaluated with a planner to study its downstream impacts. We demonstrate that our model can significantly reduce the severe accident rates (e.g., collisions and off-road driving).
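The adversarial training loop the abstract describes (inner maximization over bounded perturbations of the observed trajectory, outer minimization of the resulting loss) can be sketched with a toy linear predictor; random search and finite differences stand in for PGD and backprop:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(history, params):
    # Toy predictor: extrapolate the last step, scaled by a single parameter.
    return history[-1] + params * (history[-1] - history[-2])

def loss(pred, future):
    return np.sum((pred - future) ** 2)

def worst_case_perturbation(history, future, params, eps=0.1, tries=64):
    # Inner maximization: a bounded perturbation of the observed history
    # that most degrades the prediction (random-search stand-in for PGD).
    best, best_l = np.zeros_like(history), -np.inf
    for _ in range(tries):
        delta = rng.uniform(-eps, eps, size=history.shape)
        l = loss(predict(history + delta, params), future)
        if l > best_l:
            best, best_l = delta, l
    return best

def adversarial_training_step(history, future, params, lr=1e-2):
    delta = worst_case_perturbation(history, future, params)
    # Outer minimization: finite-difference gradient on the adversarial loss.
    g = (loss(predict(history + delta, params + 1e-4), future)
         - loss(predict(history + delta, params - 1e-4), future)) / 2e-4
    return params - lr * g

history = np.cumsum(rng.normal(size=(10, 2)), axis=0)
future = history[-1] + (history[-1] - history[-2])
params = 0.5
for _ in range(100):
    params = adversarial_training_step(history, future, params)
print(params)
```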
Citations: 12
Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer
Pub Date: 2022-07-28 | DOI: 10.48550/arXiv.2207.14024
Hao Shao, Letian Wang, Ruobing Chen, Hongsheng Li, Y. Liu
Large-scale deployment of autonomous vehicles has been continually delayed due to safety concerns. On the one hand, comprehensive scene understanding is indispensable, a lack of which would result in vulnerability to rare but complex traffic situations, such as the sudden emergence of unknown objects. However, reasoning from a global context requires access to sensors of multiple types and adequate fusion of multi-modal sensor signals, which is difficult to achieve. On the other hand, the lack of interpretability in learning models also hampers safety, leaving failure causes unverifiable. In this paper, we propose a safety-enhanced autonomous driving framework, named Interpretable Sensor Fusion Transformer (InterFuser), to fully process and fuse information from multi-modal multi-view sensors for achieving comprehensive scene understanding and adversarial event detection. In addition, intermediate interpretable features are generated from our framework, which provide more semantics and are exploited to better constrain actions to be within the safe sets. We conducted extensive experiments on CARLA benchmarks, where our model outperforms prior methods, ranking first on the public CARLA Leaderboard. Our code will be made available at https://github.com/opendilab/InterFuser
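One way to read "constrain actions to be within the safe sets" using an interpretable intermediate output is to reject controls whose rolled-out path crosses occupied cells of a predicted bird's-eye-view map; the map, the crude straight-line rollout, and the thresholds below are all assumptions, not InterFuser's actual safety controller:

```python
import numpy as np

def safe_action_filter(candidate_controls, object_map, ego_xy=(32, 32)):
    # `object_map` stands in for the interpretable intermediate output
    # (a bird's-eye-view occupancy / object-density grid from the model).
    safe = []
    for steer, throttle in candidate_controls:
        heading = steer * 0.5
        # Crude straight-line rollout of this control over a short horizon.
        path = [(ego_xy[0] + t * np.sin(heading) * throttle,
                 ego_xy[1] + t * np.cos(heading) * throttle)
                for t in range(1, 15)]
        if all(object_map[int(y) % 64, int(x) % 64] < 0.5 for x, y in path):
            safe.append((steer, throttle))
    return safe  # controls constrained to the safe set

rng = np.random.default_rng(0)
bev = (rng.random((64, 64)) > 0.95).astype(float)   # sparse obstacle grid
controls = [(s, 1.0) for s in np.linspace(-1, 1, 9)]
print(safe_action_filter(controls, bev))
```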
Citations: 43