
Conference on Robot Learning: Latest Publications

Particle-Based Score Estimation for State Space Model Learning in Autonomous Driving
Pub Date : 2022-12-14 DOI: 10.48550/arXiv.2212.06968
Angad Singh, Omar Makhlouf, Maximilian Igl, J. Messias, A. Doucet, Shimon Whiteson
Multi-object state estimation is a fundamental problem for robotic applications where a robot must interact with other moving objects. Typically, other objects' relevant state features are not directly observable, and must instead be inferred from observations. Particle filtering can perform such inference given approximate transition and observation models. However, these models are often unknown a priori, yielding a difficult parameter estimation problem since observations jointly carry transition and observation noise. In this work, we consider learning maximum-likelihood parameters using particle methods. Recent methods addressing this problem typically differentiate through time in a particle filter, which requires workarounds for the non-differentiable resampling step that yield biased or high-variance gradient estimates. By contrast, we exploit Fisher's identity to obtain a particle-based approximation of the score function (the gradient of the log likelihood) that yields a low-variance estimate while only requiring stepwise differentiation through the transition and observation models. We apply our method to real data collected from autonomous vehicles (AVs) and show that it learns better models than existing techniques and is more stable in training, yielding an effective smoother for tracking the trajectories of vehicles around an AV.
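For reference, the identity being exploited can be written out explicitly; the notation below is generic state-space-model notation assumed for illustration rather than taken from the paper. With transition density f_θ(x_t | x_{t-1}) and observation density g_θ(y_t | x_t), Fisher's identity expresses the score as a smoothing expectation, which a particle smoother approximates with weighted trajectories:

    \nabla_\theta \log p_\theta(y_{1:T})
      = \mathbb{E}_{p_\theta(x_{0:T} \mid y_{1:T})}\!\left[ \sum_{t=1}^{T} \nabla_\theta \log f_\theta(x_t \mid x_{t-1}) + \nabla_\theta \log g_\theta(y_t \mid x_t) \right]
      \approx \sum_{i=1}^{N} w_T^{(i)} \sum_{t=1}^{T} \left( \nabla_\theta \log f_\theta\big(x_t^{(i)} \mid x_{t-1}^{(i)}\big) + \nabla_\theta \log g_\theta\big(y_t \mid x_t^{(i)}\big) \right)

Here {x_{0:T}^{(i)}, w_T^{(i)}} are weighted trajectories from a particle smoother (an extra \nabla_\theta \log \mu_\theta(x_0^{(i)}) term appears if the initial density is also parameterized), so only the transition and observation models are ever differentiated and the resampling step never enters the gradient.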
Citations: 0
Cross-Domain Transfer via Semantic Skill Imitation
Pub Date : 2022-12-14 DOI: 10.48550/arXiv.2212.07407
Karl Pertsch, Ruta Desai, Vikash Kumar, Franziska Meier, Joseph J. Lim, Dhruv Batra, Akshara Rai
We propose an approach for semantic imitation, which uses demonstrations from a source domain, e.g. human videos, to accelerate reinforcement learning (RL) in a different target domain, e.g. a robotic manipulator in a simulated kitchen. Instead of imitating low-level actions like joint velocities, our approach imitates the sequence of demonstrated semantic skills like "opening the microwave" or "turning on the stove". This allows us to transfer demonstrations across environments (e.g. real-world to simulated kitchen) and agent embodiments (e.g. bimanual human demonstration to robotic arm). We evaluate on three challenging cross-domain learning problems and match the performance of demonstration-accelerated RL approaches that require in-domain demonstrations. In a simulated kitchen environment, our approach learns long-horizon robot manipulation tasks, using less than 3 minutes of human video demonstrations from a real-world kitchen. This enables scaling robot learning via the reuse of demonstrations, e.g. collected as human videos, for learning in any number of target domains.
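As a rough illustration of imitating at the level of semantic skills rather than low-level actions, the sketch below shapes an RL reward with a bonus whenever the agent completes the next skill in a demonstrated sequence. The skill labels, the skill classifier, and the bonus weight are all assumptions made for illustration, not the paper's implementation.

    # Minimal sketch: reward shaping from a demonstrated sequence of semantic skills.
    # `classify_skill` stands in for a learned skill classifier and is a stub here.
    DEMO_SKILLS = ["open_microwave", "turn_on_stove", "move_kettle"]   # assumed skill labels

    def classify_skill(observation):
        # Placeholder: a real system would infer the current semantic skill from observations.
        return observation.get("skill_guess", "none")

    def shaped_reward(observation, env_reward, progress, bonus=1.0):
        """Add a bonus when the agent completes the next demonstrated skill."""
        if progress < len(DEMO_SKILLS) and classify_skill(observation) == DEMO_SKILLS[progress]:
            return env_reward + bonus, progress + 1    # advance along the demonstration
        return env_reward, progress

    # Usage: reward, progress = shaped_reward({"skill_guess": "open_microwave"}, 0.0, 0)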
Citations: 3
DiffStack: A Differentiable and Modular Control Stack for Autonomous Vehicles
Pub Date : 2022-12-13 DOI: 10.48550/arXiv.2212.06437
Peter Karkus, B. Ivanovic, Shie Mannor, M. Pavone
Autonomous vehicle (AV) stacks are typically built in a modular fashion, with explicit components performing detection, tracking, prediction, planning, control, etc. While modularity improves reusability, interpretability, and generalizability, it also suffers from compounding errors, information bottlenecks, and integration challenges. To overcome these challenges, a prominent approach is to convert the AV stack into an end-to-end neural network and train it with data. While such approaches have achieved impressive results, they typically lack interpretability and reusability, and they eschew principled analytical components, such as planning and control, in favor of deep neural networks. To enable the joint optimization of AV stacks while retaining modularity, we present DiffStack, a differentiable and modular stack for prediction, planning, and control. Crucially, our model-based planning and control algorithms leverage recent advancements in differentiable optimization to produce gradients, enabling optimization of upstream components, such as prediction, via backpropagation through planning and control. Our results on the nuScenes dataset indicate that end-to-end training with DiffStack yields substantial improvements in open-loop and closed-loop planning metrics by, e.g., learning to make fewer prediction errors that would affect planning. Beyond these immediate benefits, DiffStack opens up new opportunities for fully data-driven yet modular and interpretable AV architectures. Project website: https://sites.google.com/view/diffstack
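To make "backpropagation through planning and control" concrete, here is a toy sketch in the same spirit: a prediction module feeds a differentiable planner (unrolled gradient descent on a hand-written cost), and a loss on the resulting plan is backpropagated into the predictor. The network, cost terms, and expert plan are illustrative assumptions, not DiffStack's actual stack.

    import torch

    predictor = torch.nn.Linear(4, 2)                   # history features -> predicted agent position
    opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)

    def plan(pred_pos, goal, steps=20, lr=0.1):
        """Unrolled inner optimization; every step stays differentiable w.r.t. pred_pos."""
        traj = torch.zeros(2, requires_grad=True)
        for _ in range(steps):
            # goal-reaching cost plus a soft penalty for getting within 1 m of the predicted agent
            cost = ((traj - goal) ** 2).sum() + torch.relu(1.0 - (traj - pred_pos).norm()) ** 2
            grad, = torch.autograd.grad(cost, traj, create_graph=True)
            traj = traj - lr * grad
        return traj

    history = torch.randn(4)
    goal = torch.tensor([2.0, 0.0])
    expert_plan = torch.tensor([1.8, 0.4])              # stand-in planning supervision

    pred_pos = predictor(history)
    loss = ((plan(pred_pos, goal) - expert_plan) ** 2).sum()   # planning-aware training signal
    opt.zero_grad()
    loss.backward()                                     # gradients flow through the planner into the predictor
    opt.step()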
Citations: 15
MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare
Pub Date : 2022-12-13 DOI: 10.48550/arXiv.2212.06870
Yann Labbé, Lucas Manuelli, Arsalan Mousavian, Stephen Tyree, Stan Birchfield, Jonathan Tremblay, Justin Carpentier, Mathieu Aubry, D. Fox, Josef Sivic
We introduce MegaPose, a method to estimate the 6D pose of novel objects, that is, objects unseen during training. At inference time, the method only assumes knowledge of (i) a region of interest displaying the object in the image and (ii) a CAD model of the observed object. The contributions of this work are threefold. First, we present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects. The shape and coordinate system of the novel object are provided as inputs to the network by rendering multiple synthetic views of the object's CAD model. Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner. Third, we introduce a large-scale synthetic dataset of photorealistic images of thousands of objects with diverse visual and shape properties and show that this diversity is crucial to obtain good generalization performance on novel objects. We train our approach on this large synthetic dataset and apply it without retraining to hundreds of novel objects in real images from several pose estimation benchmarks. Our approach achieves state-of-the-art performance on the ModelNet and YCB-Video datasets. An extensive evaluation on the 7 core datasets of the BOP challenge demonstrates that our approach achieves performance competitive with existing approaches that require access to the target objects during training. Code, dataset and trained models are available on the project page: https://megapose6d.github.io/.
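A structural sketch of the coarse stage described above, with every component stubbed out: render the CAD model under many candidate poses and keep the pose the classifier judges correctable by the refiner. The renderer, the classifier, and the candidate poses are placeholders, not MegaPose's actual interfaces.

    import numpy as np

    rng = np.random.default_rng(0)

    def render_view(cad_model, pose):
        # Placeholder renderer: a real pipeline would rasterize the CAD model at `pose`.
        return rng.random((64, 64))

    def refinable_score(observed_crop, rendered_crop):
        # Placeholder for the network that classifies whether the refiner can correct
        # the pose error between the observation and this rendering.
        return -np.abs(observed_crop - rendered_crop).mean()

    def coarse_pose(observed_crop, cad_model, candidate_poses):
        scores = [refinable_score(observed_crop, render_view(cad_model, p)) for p in candidate_poses]
        return candidate_poses[int(np.argmax(scores))]

    # Usage: best = coarse_pose(rng.random((64, 64)), cad_model=None, candidate_poses=[np.eye(4)] * 8)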
Citations: 31
ROAD: Learning an Implicit Recursive Octree Auto-Decoder to Efficiently Encode 3D Shapes
Pub Date : 2022-12-12 DOI: 10.48550/arXiv.2212.06193
Sergey Zakharov, Rares Ambrus, Katherine Liu, Adrien Gaidon
Compact and accurate representations of 3D shapes are central to many perception and robotics tasks. State-of-the-art learning-based methods can reconstruct single objects but scale poorly to large datasets. We present a novel recursive implicit representation to efficiently and accurately encode large datasets of complex 3D shapes by recursively traversing an implicit octree in latent space. Our implicit Recursive Octree Auto-Decoder (ROAD) learns a hierarchically structured latent space enabling state-of-the-art reconstruction results at a compression ratio above 99%. We also propose an efficient curriculum learning scheme that naturally exploits the coarse-to-fine properties of the underlying octree spatial representation. We explore the scaling law relating latent space dimension, dataset size, and reconstruction accuracy, showing that increasing the latent space dimension is enough to scale to large shape datasets. Finally, we show that our learned latent space encodes a coarse-to-fine hierarchical structure yielding reusable latents across different levels of details, and we provide qualitative evidence of generalization to novel shapes outside the training set.
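A minimal sketch of the recursive decoding idea: each latent is split by a small network into eight child latents plus occupancy logits, and only octants predicted occupied are expanded further. The latent size, network, threshold, and depth are assumptions for illustration, not the paper's architecture.

    import torch

    LATENT = 32
    split = torch.nn.Linear(LATENT, 8 * (LATENT + 1))    # 8 child latents + 8 occupancy logits

    def decode(latent, depth):
        """Recursively expand a latent into the occupied leaves of an implicit octree."""
        if depth == 0:
            return [latent]
        out = split(latent).view(8, LATENT + 1)
        children, occ_logits = out[:, :LATENT], out[:, LATENT]
        leaves = []
        for child, logit in zip(children, occ_logits):
            if torch.sigmoid(logit) > 0.5:                # expand only octants predicted occupied
                leaves.extend(decode(child, depth - 1))
        return leaves

    root = torch.randn(LATENT)                            # one latent per shape in the dataset
    leaf_latents = decode(root, depth=3)                  # leaves encode fine-scale local geometry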
Citations: 3
MIRA: Mental Imagery for Robotic Affordances
Pub Date : 2022-12-12 DOI: 10.48550/arXiv.2212.06088
Yilun Du
Humans form mental images of 3D scenes to support counterfactual imagination, planning, and motor control. Our abilities to predict the appearance and affordance of the scene from previously unobserved viewpoints aid us in performing manipulation tasks (e.g., 6-DoF kitting) with a level of ease that is currently out of reach for existing robot learning frameworks. In this work, we aim to build artificial systems that can analogously plan actions on top of imagined images. To this end, we introduce Mental Imagery for Robotic Affordances (MIRA), an action reasoning framework that optimizes actions with novel-view synthesis and affordance prediction in the loop. Given a set of 2D RGB images, MIRA builds a consistent 3D scene representation, through which we synthesize novel orthographic views amenable to pixel-wise affordances prediction for action optimization. We illustrate how this optimization process enables us to generalize to unseen out-of-plane rotations for 6-DoF robotic manipulation tasks given a limited number of demonstrations, paving the way toward machines that autonomously learn to understand the world around them for planning actions.
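To make the action-optimization loop concrete, here is a stubbed sketch: synthesize an orthographic view for each candidate out-of-plane rotation, predict a per-pixel affordance map, and pick the best pixel across views. The view synthesizer and affordance predictor are placeholders, not MIRA's models.

    import numpy as np

    rng = np.random.default_rng(0)

    def synthesize_view(scene_repr, rotation):
        # Placeholder for novel orthographic view synthesis from the 3D scene representation.
        return rng.random((96, 96, 3))

    def affordance_map(view):
        # Placeholder for the pixel-wise affordance predictor.
        return rng.random(view.shape[:2])

    def best_action(scene_repr, candidate_rotations):
        best = None
        for rot in candidate_rotations:
            amap = affordance_map(synthesize_view(scene_repr, rot))
            u, v = np.unravel_index(np.argmax(amap), amap.shape)
            if best is None or amap[u, v] > best[0]:
                best = (amap[u, v], rot, (u, v))          # rotation + pixel define the pick
        return best

    # Usage: score, rotation, pixel = best_action(scene_repr=None, candidate_rotations=np.linspace(0, np.pi, 8))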
Citations: 14
Where To Start? Transferring Simple Skills to Complex Environments
Pub Date : 2022-12-12 DOI: 10.48550/arXiv.2212.06111
Vitalis Vosylius, Edward Johns
Robot learning provides a number of ways to teach robots simple skills, such as grasping. However, these skills are usually trained in open, clutter-free environments, and therefore would likely cause undesirable collisions in more complex, cluttered environments. In this work, we introduce an affordance model based on a graph representation of an environment, which is optimised during deployment to find suitable robot configurations to start a skill from, such that the skill can be executed without any collisions. We demonstrate that our method can generalise a priori acquired skills to previously unseen cluttered and constrained environments, in simulation and in the real world, for both a grasping and a placing task.
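A stubbed sketch of the deployment-time search implied above: sample candidate start configurations, discard those in collision, and start the skill from the highest-scoring survivor. The scorer, the collision check, and the sampling range are placeholders; the paper's graph-based affordance model is not reproduced here.

    import numpy as np

    rng = np.random.default_rng(0)

    def collision_free(config, scene):
        # Placeholder collision check of the robot at `config` against the cluttered scene.
        return bool(rng.random() > 0.3)

    def affordance_score(config, scene):
        # Placeholder for the learned model scoring how likely the skill succeeds from `config`.
        return float(rng.random())

    def choose_start(scene, n_samples=256):
        candidates = rng.uniform(-1.0, 1.0, size=(n_samples, 7))      # assumed 7-DoF arm configurations
        valid = [c for c in candidates if collision_free(c, scene)]
        return max(valid, key=lambda c: affordance_score(c, scene)) if valid else None

    # Usage: start_config = choose_start(scene=None)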
Citations: 6
Towards Scale Balanced 6-DoF Grasp Detection in Cluttered Scenes
Pub Date : 2022-12-10 DOI: 10.48550/arXiv.2212.05275
Haoxiang Ma, Di Huang
In this paper, we focus on the problem of feature learning in the presence of scale imbalance for 6-DoF grasp detection and propose a novel approach that specifically addresses the difficulty of dealing with small-scale samples. A Multi-scale Cylinder Grouping (MsCG) module is presented to enhance local geometry representation by combining multi-scale cylinder features and global context. Moreover, a Scale Balanced Learning (SBL) loss and an Object Balanced Sampling (OBS) strategy are designed, where SBL enlarges the gradients of samples whose scales occur at low frequency by a priori weights, while OBS captures more points on small-scale objects with the help of an auxiliary segmentation network. They alleviate the influence of the uneven distribution of grasp scales in training and inference respectively. In addition, Noisy-clean Mix (NcM) data augmentation is introduced to facilitate training, aiming to bridge the domain gap between synthetic and raw scenes in an efficient way by generating additional data that mixes them at the instance level. Extensive experiments are conducted on the GraspNet-1Billion benchmark and competitive results are reached with significant gains on small-scale cases. Besides, the performance of real-world grasping highlights its generalization ability. Our code is available at https://github.com/mahaoxiang822/Scale-Balanced-Grasp.
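As a generic illustration of the scale-balancing idea, the sketch below up-weights per-sample losses in inverse proportion to how frequent the sample's grasp-scale bin is in the training set; the binning and weighting scheme are assumptions, not the paper's exact SBL formulation.

    import torch

    def scale_balanced_loss(per_sample_loss, scale_bin, bin_frequency):
        """Up-weight samples whose grasp-scale bin is rare (a priori inverse-frequency weights)."""
        weights = 1.0 / bin_frequency[scale_bin]          # rare scale bins get larger gradients
        weights = weights / weights.mean()                # keep the overall loss magnitude stable
        return (weights * per_sample_loss).mean()

    # Usage with assumed numbers: three scale bins with training frequencies 0.7 / 0.2 / 0.1.
    per_sample_loss = torch.rand(8)
    scale_bin = torch.randint(0, 3, (8,))
    loss = scale_balanced_loss(per_sample_loss, scale_bin, torch.tensor([0.7, 0.2, 0.1]))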
Citations: 2
Visuotactile Affordances for Cloth Manipulation with Local Control
Pub Date : 2022-12-09 DOI: 10.48550/arXiv.2212.05108
N. Sunil, Shaoxiong Wang, Y. She, E. Adelson, Alberto Rodriguez
Cloth in the real world is often crumpled, self-occluded, or folded in on itself such that key regions, such as corners, are not directly graspable, making manipulation difficult. We propose a system that leverages visual and tactile perception to unfold the cloth via grasping and sliding on edges. By doing so, the robot is able to grasp two adjacent corners, enabling subsequent manipulation tasks like folding or hanging. As components of this system, we develop tactile perception networks that classify whether an edge is grasped and estimate the pose of the edge. We use the edge classification network to supervise a visuotactile edge grasp affordance network that can grasp edges with a 90% success rate. Once an edge is grasped, we demonstrate that the robot can slide along the cloth to the adjacent corner using tactile pose estimation/control in real time. See http://nehasunil.com/visuotactile/visuotactile.html for videos.
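A toy version of the tactile sliding behaviour described above: a proportional controller keeps the estimated cloth edge centred and aligned in the sensor frame while the gripper advances along it. The gains, frames, and the pose estimator are assumptions for illustration, not the paper's controller.

    import numpy as np

    def estimate_edge_pose(tactile_image):
        # Placeholder for the tactile pose network: lateral offset (m) and angle (rad)
        # of the grasped edge relative to the sensor centre.
        return 0.002, 0.1

    def slide_step(tactile_image, slide_speed=0.01, k_offset=2.0, k_angle=0.5):
        """One control step: advance along the edge while correcting offset and orientation."""
        offset, angle = estimate_edge_pose(tactile_image)
        velocity = np.array([slide_speed, -k_offset * offset, 0.0])   # advance + re-centre the edge
        angular_rate = -k_angle * angle                                # keep the gripper aligned
        return velocity, angular_rate

    # Usage: v, w = slide_step(tactile_image=None)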
Citations: 10
VideoDex: Learning Dexterity from Internet Videos
Pub Date : 2022-12-08 DOI: 10.48550/arXiv.2212.04498
Kenneth Shaw, Shikhar Bahl, Deepak Pathak
To build general robotic agents that can operate in many environments, it is often imperative for the robot to collect experience in the real world. However, this is often not feasible due to safety, time, and hardware restrictions. We thus propose leveraging the next best thing as real-world experience: internet videos of humans using their hands. Visual priors, such as visual features, are often learned from videos, but we believe that more information from videos can be utilized as a stronger prior. We build a learning algorithm, VideoDex, that leverages visual, action, and physical priors from human video datasets to guide robot behavior. These actions and physical priors in the neural network dictate the typical human behavior for a particular robot task. We test our approach on a robot arm and dexterous hand-based system and show strong results on various manipulation tasks, outperforming various state-of-the-art methods. Videos at https://video-dex.github.io
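One way to read "action priors from human video" as a concrete training signal, sketched below: regularize the policy toward hand actions retargeted from video on top of an ordinary imitation term. The retargeting step, network, and weighting are assumptions, not VideoDex's recipe.

    import torch

    policy = torch.nn.Linear(64, 16)                     # visual features -> robot hand action

    def prior_regularized_loss(features, robot_action, retargeted_human_action, weight=0.1):
        """Imitation loss plus a penalty for straying from the video-derived action prior."""
        pred = policy(features)
        imitation = ((pred - robot_action) ** 2).mean()
        prior = ((pred - retargeted_human_action) ** 2).mean()
        return imitation + weight * prior

    loss = prior_regularized_loss(torch.randn(8, 64), torch.randn(8, 16), torch.randn(8, 16))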
Citations: 18