Learning dexterity from human hand motion in internet videos
Pub Date : 2024-01-22 DOI: 10.1177/02783649241227559
Kenneth Shaw, Shikhar Bahl, Aravind Sivakumar, Aditya Kannan, Deepak Pathak
To build general robotic agents that can operate in many environments, it is often useful for robots to collect experience in the real world. However, unguided experience collection is often infeasible due to safety, time, and hardware restrictions. We thus propose leveraging the next best thing to real-world experience: videos of humans using their hands. To utilize these videos, we develop a method that retargets any first-person or third-person video of human hands and arms into robot hand and arm trajectories. While retargeting is a difficult problem, our key insight is to rely only on internet videos of human hands to train it. We use this method to present results in two areas. First, we build a system that enables any human to control a robot hand and arm simply by demonstrating motions with their own hand. The robot observes the human operator via a single RGB camera and imitates their actions in real time. This enables the robot to collect real-world experience safely under human supervision. See these results at https://robotic-telekinesis.github.io . Second, we retarget in-the-wild internet video of human hands into task-conditioned pseudo-robot trajectories to use as artificial robot experience. This learning algorithm leverages action priors from human hand actions, visual features from the images, and physical priors from dynamical systems to pretrain typical human behavior for a particular robot task. We show that by leveraging internet human hand experience, we need fewer robot demonstrations than many other methods. See these results at https://video-dex.github.io
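Retargeting of this kind is typically posed as an optimization that maps detected human hand keypoints to robot joint angles. The sketch below is a minimal illustration of that idea under our own assumptions (a toy planar two-joint finger and a hypothetical `retarget` helper); it is not the paper's actual pipeline.

```python
# A minimal sketch of kinematic retargeting, assuming hand keypoints have
# already been detected by an off-the-shelf hand-pose estimator. The toy
# forward kinematics, energy function, and all names are illustrative
# assumptions, not the paper's implementation.
import numpy as np
from scipy.optimize import minimize

def robot_fingertip(q, link_lengths=(0.05, 0.04)):
    """Toy planar 2-joint finger forward kinematics (assumed, not a real hand)."""
    l1, l2 = link_lengths
    x = l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1])
    y = l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def retarget(human_tip, scale=1.0, q0=np.zeros(2)):
    """Find joint angles whose fingertip matches the scaled human fingertip."""
    def energy(q):
        return np.sum((robot_fingertip(q) - scale * human_tip) ** 2)
    return minimize(energy, q0, method="L-BFGS-B").x

human_tip = np.array([0.06, 0.03])   # a detected fingertip position (made up)
q = retarget(human_tip)
print("joint angles:", q, "tip:", robot_fingertip(q))
```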
{"title":"Learning dexterity from human hand motion in internet videos","authors":"Kenneth Shaw, Shikhar Bahl, Aravind Sivakumar, Aditya Kannan, Deepak Pathak","doi":"10.1177/02783649241227559","DOIUrl":"https://doi.org/10.1177/02783649241227559","url":null,"abstract":"To build general robotic agents that can operate in many environments, it is often useful for robots to collect experience in the real world. However, unguided experience collection is often not feasible due to safety, time, and hardware restrictions. We thus propose leveraging the next best thing as real world experience: videos of humans using their hands. To utilize these videos, we develop a method that retargets any 1st person or 3rd person video of human hands and arms into the robot hand and arm trajectories. While retargeting is a difficult problem, our key insight is to rely on only internet human hand video to train it. We use this method to present results in two areas: First, we build a system that enables any human to control a robot hand and arm, simply by demonstrating motions with their own hand. The robot observes the human operator via a single RGB camera and imitates their actions in real-time. This enables the robot to collect real-world experience safely using supervision. See these results at https://robotic-telekinesis.github.io . Second, we retarget in-the-wild human internet video into task-conditioned pseudo-robot trajectories to use as artificial robot experience. This learning algorithm leverages action priors from human hand actions, visual features from the images, and physical priors from dynamical systems to pretrain typical human behavior for a particular robot task. We show that by leveraging internet human hand experience, we need fewer robot demonstrations compared to many other methods. See these results at https://video-dex.github.io","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":"19 8","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139608207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kernel-based diffusion approximated Markov decision processes for autonomous navigation and control on unstructured terrains
Pub Date : 2024-01-19 DOI: 10.1177/02783649231225977
Junhong Xu, Kai Yin, Zheng Chen, Jason M Gregory, Ethan A Stump, Lantao Liu
We propose a diffusion approximation method for continuous-state Markov decision processes that can be used to address autonomous navigation and control in unstructured off-road environments. In contrast to most decision-theoretic planning frameworks, which assume a fully known state transition model, we design a method that eliminates this strong assumption, which is often extremely difficult to satisfy in practice. We first take the second-order Taylor expansion of the value function. The Bellman optimality equation is then approximated by a partial differential equation that relies only on the first and second moments of the transition model. Combining this with a kernel representation of the value function, we design an efficient policy iteration algorithm whose policy evaluation step can be represented as a linear system of equations characterized by a finite set of supporting states. We first validate the proposed method through extensive simulations in 2D obstacle avoidance and 2.5D terrain navigation problems. The results show that the proposed approach substantially outperforms several baselines. We then develop a system that integrates our decision-making framework with onboard perception and conduct real-world experiments in both cluttered indoor and unstructured outdoor environments. The results from the physical systems further demonstrate the applicability of our method in challenging real-world environments.
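The core approximation step can be sketched explicitly. If the state increment δ over one step has mean μ(x, a) and covariance Σ(x, a), a second-order Taylor expansion of the value function inside the Bellman expectation yields (our notation, a sketch under these assumptions; higher-order moment terms such as μμᵀ are dropped as negligible over a short time step):

```latex
\begin{aligned}
V(x) &= \min_a \, \mathbb{E}\big[\, c(x,a) + \gamma\, V(x+\delta) \,\big], \\
\mathbb{E}\big[V(x+\delta)\big]
  &\approx V(x) + \nabla V(x)^{\top}\, \mu(x,a)
   + \tfrac{1}{2}\,\mathrm{tr}\!\big( \Sigma(x,a)\, \nabla^{2} V(x) \big).
\end{aligned}
```

Substituting this expansion back into the Bellman equation leaves a partial differential equation in V that depends on the transition model only through its first two moments, which is what allows the method to dispense with a fully known transition model.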
A cross-domain challenge with panoptic segmentation in agriculture
Pub Date : 2024-01-18 DOI: 10.1177/02783649241227448
Michael Halstead, Patrick Zimmer, Chris McCool
Automation in agriculture is a growing area of research with fundamental societal importance, as farmers are expected to produce more and better crops with fewer resources. A key enabling factor is robotic vision, which allows a robot to sense and then interact with its environment. A limiting factor for these robotic vision systems is their cross-domain performance, that is, their ability to operate in a large range of environments. In this paper, we propose the use of auxiliary tasks to enhance cross-domain performance without the need for extra data. We perform experiments using four datasets (two in a glasshouse and two in arable farmland) for four cross-domain evaluations. These experiments demonstrate the effectiveness of our auxiliary tasks in improving network generalisability. In glasshouse experiments, our approach improves the panoptic quality of things from 10.4 to 18.5, and in arable farmland from 16.0 to 27.5, where 100 is the best possible score. To further evaluate the generalisability of our approach, we perform an ablation study using the large Crop and Weed dataset (CAW), improving cross-domain performance (panoptic quality of things) from 12.8 to 30.6 when transferring from CAW to our novel WeedAI dataset, and from 21.2 to 36.0 from CAW to the other arable farmland dataset. Although our proposed approaches considerably improve cross-domain performance, we still do not generally outperform in-domain trained systems. This highlights the potential room for improvement in this area and the importance of cross-domain research for robotic vision systems.
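The abstract does not specify the auxiliary tasks; the sketch below illustrates the general pattern under an assumption consistent with the "no extra data" constraint: a shared encoder trained with the main segmentation loss plus an auxiliary loss on a target derived from the existing labels (here a binary foreground mask). The architecture, heads, and weighting are ours, not the paper's.

```python
# A minimal multi-task sketch: one shared encoder, a main segmentation
# head, and an auxiliary head supervised by a binary foreground mask
# derived from the same labels (so no extra data is needed). All
# architectural choices and the 0.5 weighting are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxTaskNet(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.encoder = nn.Conv2d(3, 16, 3, padding=1)   # shared features
        self.seg_head = nn.Conv2d(16, num_classes, 1)   # main task
        self.aux_head = nn.Conv2d(16, 1, 1)             # auxiliary task

    def forward(self, x):
        f = F.relu(self.encoder(x))
        return self.seg_head(f), self.aux_head(f)

net = AuxTaskNet()
img = torch.randn(2, 3, 64, 64)
seg_target = torch.randint(0, 3, (2, 64, 64))           # 0 = background
fg_target = (seg_target > 0).float().unsqueeze(1)       # derived, not extra data

seg_logits, fg_logits = net(img)
loss = F.cross_entropy(seg_logits, seg_target) \
     + 0.5 * F.binary_cross_entropy_with_logits(fg_logits, fg_target)
loss.backward()
```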
{"title":"A cross-domain challenge with panoptic segmentation in agriculture","authors":"Michael Halstead, Patrick Zimmer, Chris McCool","doi":"10.1177/02783649241227448","DOIUrl":"https://doi.org/10.1177/02783649241227448","url":null,"abstract":"Automation in agriculture is a growing area of research with fundamental societal importance as farmers are expected to produce more and better crop with fewer resources. A key enabling factor is robotic vision techniques allowing us to sense and then interact with the environment. A limiting factor for these robotic vision systems is their cross-domain performance, that is, their ability to operate in a large range of environments. In this paper, we propose the use of auxiliary tasks to enhance cross-domain performance without the need for extra data. We perform experiments using four datasets (two in a glasshouse and two in arable farmland) for four cross-domain evaluations. These experiments demonstrate the effectiveness of our auxiliary tasks to improve network generalisability. In glasshouse experiments, our approach improves the panoptic quality of things from 10.4 to 18.5 and in arable farmland from 16.0 to 27.5; where a score of 100 is the best. To further evaluate the generalisability of our approach, we perform an ablation study using the large Crop and Weed dataset (CAW) where we improve cross-domain performance (panoptic quality of things) from 12.8 to 30.6 for the CAW dataset to our novel WeedAI dataset, and 21.2 to 36.0 from CAW to the other arable farmland dataset. Although our proposed approaches considerably improve cross-domain performance we still do not generally outperform in-domain trained systems. This highlights the potential room for improvement in this area and the importance of cross-domain research for robotic vision systems.","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":"120 17","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139616240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Intelligent robotic sonographer: Mutual information-based disentangled reward learning from few demonstrations
Pub Date : 2024-01-16 DOI: 10.1177/02783649231223547
Zhongliang Jiang, Yuan Bi, Mingchuan Zhou, Ying Hu, Michael Burke, Nassir Navab
Ultrasound (US) imaging is widely used for biometric measurement and diagnosis of internal organs because it is real-time and radiation-free. However, owing to inter-operator variation, the resulting images depend heavily on the experience of the sonographer. This work proposes an intelligent robotic sonographer that autonomously "explores" target anatomies and navigates a US probe to standard planes by learning from experts. The underlying high-level physiological knowledge of experts is inferred by a neural reward function, using a ranked pairwise image comparison approach in a self-supervised fashion. This process can be understood as learning the "language of sonography." To generalize across inter-patient variations, mutual information is estimated by a network to explicitly disentangle task-related and domain features in the latent space. Robotic localization is then carried out in a coarse-to-fine manner based on the predicted reward associated with B-mode images. To validate the effectiveness of the proposed reward inference network, representative experiments were performed on vascular phantoms (a "line" target), two types of ex vivo animal organ phantoms (chicken heart and lamb kidney, representing "point" targets), and in vivo human carotids. To further validate the performance of the autonomous acquisition framework, physical robotic acquisitions were performed on three phantoms (vascular, chicken heart, and lamb kidney). The results demonstrate that the proposed framework works robustly on a variety of seen and unseen phantoms as well as on in vivo human carotid data. Code: https://github.com/yuan-12138/MI-GPSR . Video: https://youtu.be/u4ThAA9onE0 .
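Reward inference from ranked image pairs is commonly implemented with a Bradley-Terry-style preference loss; the sketch below shows that pattern under our own assumptions (toy network and data), not the paper's exact model.

```python
# A minimal sketch of reward learning from ranked image pairs: train a
# reward network so images ranked closer to the standard plane score
# higher, via a Bradley-Terry pairwise log-likelihood. Network shapes
# and data are illustrative assumptions.
import torch
import torch.nn as nn

reward_net = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 64),
                           nn.ReLU(), nn.Linear(64, 1))

better = torch.randn(8, 1, 64, 64)   # images ranked closer to the standard plane
worse = torch.randn(8, 1, 64, 64)    # images ranked farther away

r_better, r_worse = reward_net(better), reward_net(worse)
# Probability that 'better' outranks 'worse'; maximize its log-likelihood.
loss = -torch.log(torch.sigmoid(r_better - r_worse)).mean()
loss.backward()
```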
Lane-level route planning for autonomous vehicles
Pub Date : 2024-01-10 DOI: 10.1177/02783649231225474
Mitchell Jones, Maximilian Haas-Heger, Jur van den Berg
We present an algorithm that, given a representation of a road network in lane-level detail, computes a route that minimizes the expected cost to reach a given destination. In doing so, our algorithm allows us to solve for the complex trade-offs encountered when trying to decide not just which roads to follow, but also when to change between the lanes making up these roads, in order to—for example—reduce the likelihood of missing a left exit while not unnecessarily driving in the leftmost lane. This routing problem can naturally be formulated as a Markov Decision Process (MDP), in which lane change actions have stochastic outcomes. However, MDPs are known to be time-consuming to solve in general. In this paper, we show that—under reasonable assumptions—we can use a Dijkstra-like approach to solve this stochastic problem, and benefit from its efficient O(n log n) running time. This enables an autonomous vehicle to exhibit lane-selection behavior as it efficiently plans an optimal route to its destination.
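The MDP formulation can be made concrete with a toy example. The sketch below builds a tiny lane-graph MDP in which a lane change succeeds only with probability 0.9 and, for clarity, solves it with plain value iteration; the paper's actual contribution, a Dijkstra-like O(n log n) solver, is not reproduced here. All states, costs, and probabilities are invented for illustration.

```python
# Toy lane-level MDP: actions either advance along the current lane
# (deterministic) or attempt a lane change while advancing (stochastic:
# on failure the vehicle ends up farther along in its original lane).
GOAL = ("road_end", "lane_left")

# state -> list of (action_cost, [(next_state, probability), ...])
actions = {
    ("road_start", "lane_left"): [
        (1.0, [(GOAL, 1.0)]),                              # follow lane to goal
    ],
    ("road_start", "lane_right"): [
        (1.0, [(("road_end", "lane_right"), 1.0)]),        # follow lane
        (1.2, [(("road_start", "lane_left"), 0.9),         # change succeeds
               (("road_end", "lane_right"), 0.1)]),        # change fails
    ],
    ("road_end", "lane_right"): [
        (5.0, [(GOAL, 1.0)]),                              # costly detour back
    ],
}

V = {s: 0.0 if s == GOAL else float("inf") for s in set(actions) | {GOAL}}
for _ in range(100):                                       # value iteration
    for s, acts in actions.items():
        V[s] = min(c + sum(p * V[t] for t, p in outs) for c, outs in acts)

print(V[("road_start", "lane_right")])   # expected cost accounting for risk: 2.6
```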
{"title":"Lane-level route planning for autonomous vehicles","authors":"Mitchell Jones, Maximilian Haas-Heger, Jur van den Berg","doi":"10.1177/02783649231225474","DOIUrl":"https://doi.org/10.1177/02783649231225474","url":null,"abstract":"We present an algorithm that, given a representation of a road network in lane-level detail, computes a route that minimizes the expected cost to reach a given destination. In doing so, our algorithm allows us to solve for the complex trade-offs encountered when trying to decide not just which roads to follow, but also when to change between the lanes making up these roads, in order to—for example—reduce the likelihood of missing a left exit while not unnecessarily driving in the leftmost lane. This routing problem can naturally be formulated as a Markov Decision Process (MDP), in which lane change actions have stochastic outcomes. However, MDPs are known to be time-consuming to solve in general. In this paper, we show that—under reasonable assumptions—we can use a Dijkstra-like approach to solve this stochastic problem, and benefit from its efficient O( n log n) running time. This enables an autonomous vehicle to exhibit lane-selection behavior as it efficiently plans an optimal route to its destination.","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139945625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging symmetries in pick and place
Pub Date : 2024-01-06 DOI: 10.1177/02783649231225775
Haojie Huang, Dian Wang, Arsh Tangri, Robin Walters, Robert Platt
Robotic pick and place tasks are symmetric under translations and rotations of both the object to be picked and the desired place pose. For example, if the pick object is rotated or translated, then the optimal pick action should also rotate or translate. The same is true for the place pose; if the desired place pose changes, then the place action should also transform accordingly. A recently proposed pick and place framework known as Transporter Net (Zeng, Florence, Tompson, Welker, Chien, Attarian, Armstrong, Krasin, Duong, Sindhwani et al., 2021) captures some of these symmetries, but not all. This paper analytically studies the symmetries present in planar robotic pick and place and proposes a method of incorporating equivariant neural models into Transporter Net in a way that captures all symmetries. The new model, which we call Equivariant Transporter Net, is equivariant to both pick and place symmetries and can immediately generalize pick and place knowledge to different pick and place poses. We evaluate the new model empirically and show that it is much more sample-efficient than the non-symmetric version, resulting in a system that can imitate demonstrated pick and place behavior using very few human demonstrations on a variety of imitation learning tasks.
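The symmetry claim is easy to verify empirically in the translation case. The sketch below, a toy construction of ours rather than the paper's model, shows that a fully convolutional "pick" network with circular padding is exactly translation-equivariant: shifting the input scene shifts the predicted pick heatmap identically. The paper's rotation equivariance additionally requires group-equivariant (steerable) convolutions, which are not reproduced here.

```python
# Demonstrating translation equivariance of a fully convolutional pick
# network: roll the input, and the output heatmap rolls by the same
# amount. Circular padding makes the equality exact. Toy network, assumed
# shapes; illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1, padding_mode="circular"), nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1, padding_mode="circular"))

img = torch.randn(1, 1, 32, 32)
shifted = torch.roll(img, shifts=(5, 7), dims=(2, 3))   # translate the scene

q1 = net(img)
q2 = net(shifted)
# Heatmap of the shifted image equals the shifted heatmap.
print(torch.allclose(torch.roll(q1, (5, 7), dims=(2, 3)), q2, atol=1e-5))
```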
Self-reflective terrain-aware robot adaptation for consistent off-road ground navigation
Pub Date : 2024-01-06 DOI: 10.1177/02783649231225243
Sriram Siva, Maggie Wigness, John G. Rogers, Long Quang, Hao Zhang
Ground robots must be able to traverse unstructured and unprepared terrains and avoid obstacles to complete tasks in real-world robotics applications such as disaster response. When a robot operates in off-road field environments such as forests, its actual behaviors often do not match its expected or planned behaviors, due to changes in the characteristics of the terrain and of the robot itself. The capability to adapt and generate consistent behaviors is therefore essential for maneuverability on unstructured off-road terrains. To address this challenge, we propose a novel method of self-reflective terrain-aware adaptation that generates consistent controls for navigating unstructured off-road terrains, enabling robots to more accurately execute expected behaviors through self-reflection while adapting to varying terrain. To evaluate our method's performance, we conduct extensive experiments using real ground robots with various functionality changes over diverse unstructured off-road terrains. The comprehensive experimental results show that our self-reflective terrain-aware adaptation method enables ground robots to generate consistent navigational behaviors and outperforms previous and baseline techniques.
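The abstract does not detail the adaptation mechanism, so the following is only a generic illustration of the consistent-behavior idea: fit a model of how the current terrain distorts commanded behavior, then pre-compensate commands so the executed behavior matches the expectation. The linear model and all numbers are assumptions, not the paper's method.

```python
# Generic illustration (an assumption, not the paper's method): fit a
# linear residual model mapping commanded velocity to actual velocity on
# the current terrain, then pre-compensate commands so the executed
# behavior matches the expected one.
import numpy as np

commanded = np.array([0.5, 0.8, 1.0, 1.2])       # m/s, logged commands
actual = np.array([0.41, 0.66, 0.85, 1.01])      # m/s, measured on this terrain

# Least-squares fit: actual ~ gain * commanded
gain = (commanded @ actual) / (commanded @ commanded)

desired = 1.0                                    # m/s expected behavior
corrected = desired / gain                       # pre-compensated command
print(f"gain={gain:.2f}, send {corrected:.2f} m/s to achieve {desired} m/s")
```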
{"title":"Self-reflective terrain-aware robot adaptation for consistent off-road ground navigation","authors":"Sriram Siva, Maggie Wigness, John G. Rogers, Long Quang, Hao Zhang","doi":"10.1177/02783649231225243","DOIUrl":"https://doi.org/10.1177/02783649231225243","url":null,"abstract":"Ground robots require the crucial capability of traversing unstructured and unprepared terrains and avoiding obstacles to complete tasks in real-world robotics applications such as disaster response. When a robot operates in off-road field environments such as forests, the robot’s actual behaviors often do not match its expected or planned behaviors, due to changes in the characteristics of terrains and the robot itself. Therefore, the capability of robot adaptation for consistent behavior generation is essential for maneuverability on unstructured off-road terrains. In order to address the challenge, we propose a novel method of self-reflective terrain-aware adaptation for ground robots to generate consistent controls to navigate over unstructured off-road terrains, which enables robots to more accurately execute the expected behaviors through robot self-reflection while adapting to varying unstructured terrains. To evaluate our method’s performance, we conduct extensive experiments using real ground robots with various functionality changes over diverse unstructured off-road terrains. The comprehensive experimental results have shown that our self-reflective terrain-aware adaptation method enables ground robots to generate consistent navigational behaviors and outperforms the compared previous and baseline techniques.","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":"68 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139945626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rapid locomotion via reinforcement learning
Pub Date : 2024-01-02 DOI: 10.1177/02783649231224053
Gabriel B. Margolis, Ge Yang, Kartik Paigwar, Tao Chen, Pulkit Agrawal
Agile maneuvers such as sprinting and high-speed turning in the wild are challenging for legged robots. We present an end-to-end learned controller that achieves record agility for the MIT Mini Cheetah, sustaining speeds up to 3.9 m/s. This system runs and turns fast on natural terrains like grass, ice, and gravel and responds robustly to disturbances. Our controller is a neural network trained in simulation via reinforcement learning and transferred to the real world. The two key components are (i) an adaptive curriculum on velocity commands and (ii) an online system identification strategy for sim-to-real transfer. Videos of the robot’s behaviors are available at https://agility.csail.mit.edu/ .
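The abstract names an adaptive curriculum on velocity commands as one of the two key components. The following is a minimal sketch of one common form of such a curriculum; the thresholds, step size, and success-rate rule are illustrative assumptions, not the paper's published values.

```python
# A minimal sketch of an adaptive curriculum on velocity commands: widen
# the range of sampled command speeds whenever the policy tracks the
# current range well. All constants and the rollout stub are assumptions.
import random

v_max = 1.0                       # current upper bound on sampled commands (m/s)
CAP, STEP, THRESH = 4.0, 0.25, 0.8

def tracking_success(cmd):
    # Placeholder for a simulated rollout returning whether the policy
    # tracked `cmd` within tolerance; stubbed here for illustration.
    return random.random() < 0.9

for epoch in range(50):
    cmds = [random.uniform(0.0, v_max) for _ in range(100)]
    success_rate = sum(tracking_success(c) for c in cmds) / len(cmds)
    if success_rate > THRESH and v_max < CAP:
        v_max = min(v_max + STEP, CAP)    # expand the curriculum
print(f"final command range: [0, {v_max:.2f}] m/s")
```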
{"title":"Rapid locomotion via reinforcement learning","authors":"Gabriel B. Margolis, Ge Yang, Kartik Paigwar, Tao Chen, Pulkit Agrawal","doi":"10.1177/02783649231224053","DOIUrl":"https://doi.org/10.1177/02783649231224053","url":null,"abstract":"Agile maneuvers such as sprinting and high-speed turning in the wild are challenging for legged robots. We present an end-to-end learned controller that achieves record agility for the MIT Mini Cheetah, sustaining speeds up to 3.9 m/s. This system runs and turns fast on natural terrains like grass, ice, and gravel and responds robustly to disturbances. Our controller is a neural network trained in simulation via reinforcement learning and transferred to the real world. The two key components are (i) an adaptive curriculum on velocity commands and (ii) an online system identification strategy for sim-to-real transfer. Videos of the robot’s behaviors are available at https://agility.csail.mit.edu/ .","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":"134 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139945624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Energy-optimal trajectories for skid-steer rovers
Pub Date : 2023-12-30 DOI: 10.1177/02783649231216499
M. Effati, Krzysztof Skonieczny, Devin J. Balkcom
This paper presents energy-optimal trajectories for skid-steer rovers on hard ground, without obstacles. We obtain 29 trajectory structures that are sufficient to describe minimum-energy motion, enumerated and described geometrically; 28 of these structures are composed of sequences of circular arcs and straight lines, and there is also a special structure called whirls, consisting of different circular arcs. Our analysis identifies that the turns in the trajectory structures (aside from whirls) are all circular arcs of a particular turning radius R′, the radius at which the inner wheels of a skid-steer rover are not commanded to turn, and demonstrates the central role of R′ in energy-optimal path planning. Analytical energy-optimal trajectory generation for skid-steer rovers has been lacking, and we address this problem with a novel approach. The equivalency theorem presented in this work shows that all minimum-energy solutions follow the same path irrespective of any velocity constraints that may be imposed. This non-intuitive result stems from the fact that, under this model of the system, the total energy is fully parameterized by the geometry of the path alone. With this equivalency in mind, one can choose velocity constraints that enforce constant power consumption, transforming the energy-optimal problem into an equivalent time-optimal problem, which can then be solved with Pontryagin's Minimum Principle. Accordingly, the extremal paths are obtained and enumerated to find the minimum-energy path. Furthermore, our experiments with a Husky UGV provide experimental support for the equivalency theorem.
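For intuition only, under an idealized differential-drive kinematic relation (our assumption; the paper derives R′ for skid-steer vehicles, where slip shifts this value), the inner wheels stop being commanded to turn exactly when the body turning radius shrinks to half the track width W:

```latex
% Idealized differential-drive kinematics (illustrative assumption only;
% the paper's R' accounts for skid-steer dynamics with slip).
v_{\mathrm{in}} \;=\; v \;-\; \frac{W}{2}\,\omega \;=\; 0
\quad\Longrightarrow\quad
R \;=\; \frac{v}{\omega} \;=\; \frac{W}{2}.
```

Turning any tighter requires commanding the inner wheels in reverse, which is the kinematic intuition for why a distinguished turning radius can appear in an energy analysis.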
{"title":"Energy-optimal trajectories for skid-steer rovers","authors":"M. Effati, Krzysztof Skonieczny, Devin J. Balkcom","doi":"10.1177/02783649231216499","DOIUrl":"https://doi.org/10.1177/02783649231216499","url":null,"abstract":"This paper presents the energy-optimal trajectories for skid-steer rovers on hard ground, without obstacles. We obtain 29 trajectory structures that are sufficient to describe minimum-energy motion, which are enumerated and described geometrically; 28 of these structures are composed of sequences of circular arcs and straight lines; there is also a special structure called whirls consisting of different circular arcs. Our analysis identifies that the turns in the trajectory structures (aside from whirls) are all circular arcs of a particular turning radius, R′, the turning radius at which the inner wheels of a skid-steer rover are not commanded to turn. This work demonstrates its paramount importance in energy-optimal path planning. There has been a lack of analytical energy-optimal trajectory generation for skid-steer rovers, and we address this problem by a novel approach. The equivalency theorem presented in this work shows that all minimum-energy solutions follow the same path irrespective of velocity constraints that may or may not be imposed. This non-intuitive result stems from the fact that with this model of the system the total energy is fully parameterized by the geometry of the path alone. With this equivalency in mind, one can choose velocity constraints to enforce constant power consumption, thus transforming the energy-optimal problem into an equivalent time-optimal problem. Pontryagin’s Minimum Principle can then be used to solve the problem. Accordingly, the extremal paths are obtained and enumerated to find the minimum-energy path. Furthermore, our experimental results by using Husky UGV provide the experimental support for the equivalency theorem.","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":" 33","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139141432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CID-SIMS: Complex indoor dataset with semantic information and multi-sensor data from a ground wheeled robot viewpoint
Pub Date : 2023-12-21 DOI: 10.1177/02783649231222507
Yidi Zhang, Ning An, Chenhui Shi, Shuo Wang, Hao Wei, Pengju Zhang, Xinrui Meng, Zengpeng Sun, Jinke Wang, Wenliang Liang, Fulin Tang, Yihong Wu
Simultaneous localization and mapping (SLAM) and 3D reconstruction have numerous applications for indoor ground wheeled robots, such as floor sweeping and food delivery. To advance research on leveraging semantic information and multi-sensor data to enhance SLAM and 3D reconstruction in complex indoor scenes, we propose a novel and complex indoor dataset named CID-SIMS, which provides semantically annotated RGB-D images, inertial measurement unit (IMU) measurements, and wheel odometer data from the viewpoint of a ground wheeled robot. The dataset consists of 22 challenging sequences captured in nine different scenes, including office building and apartment environments. Notably, our dataset achieves two significant breakthroughs. First, it is the first to provide semantic information and multi-sensor data together. Second, GeoSLAM is used for the first time to generate ground-truth trajectories and 3D point clouds with two-centimeter accuracy. With spatially and temporally synchronized ground-truth trajectories and 3D point clouds, our dataset can evaluate SLAM and 3D reconstruction algorithms in a unified global coordinate system. We evaluate state-of-the-art SLAM and 3D reconstruction approaches on our dataset, demonstrating that our benchmark is applicable. The dataset is publicly available at https://cid-sims.github.io .
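Evaluating a SLAM estimate against ground truth in a unified coordinate system typically means computing the absolute trajectory error (ATE) after a rigid alignment. The sketch below is a standard Kabsch/Umeyama-style alignment on synthetic data, purely illustrative; the dataset's own evaluation tooling and conventions may differ.

```python
# A minimal sketch of absolute trajectory error (ATE) after optimal rigid
# alignment of estimated positions to ground truth (no scale correction).
# Synthetic data; illustrative only.
import numpy as np

def ate_rmse(est, gt):
    """RMSE of position error after least-squares rigid alignment."""
    mu_e, mu_g = est.mean(0), gt.mean(0)
    H = (est - mu_e).T @ (gt - mu_g)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                        # rotation aligning est to gt
    t = mu_g - R @ mu_e
    aligned = est @ R.T + t
    return np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1)))

gt = np.cumsum(np.random.randn(100, 3) * 0.1, axis=0)   # synthetic ground truth
est = gt + np.random.randn(100, 3) * 0.02               # noisy estimate
print(f"ATE RMSE: {ate_rmse(est, gt):.3f} m")
```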