Learning dexterity from human hand motion in internet videos
Pub Date : 2024-01-22 DOI: 10.1177/02783649241227559
Kenneth Shaw, Shikhar Bahl, Aravind Sivakumar, Aditya Kannan, Deepak Pathak
To build general robotic agents that can operate in many environments, it is often useful for robots to collect experience in the real world. However, unguided experience collection is often infeasible due to safety, time, and hardware restrictions. We thus propose leveraging the next best thing to real-world experience: videos of humans using their hands. To utilize these videos, we develop a method that retargets any first-person or third-person video of human hands and arms into robot hand and arm trajectories. While retargeting is a difficult problem, our key insight is to rely only on internet videos of human hands to train it. We use this method to present results in two areas. First, we build a system that enables any human to control a robot hand and arm simply by demonstrating motions with their own hand. The robot observes the human operator via a single RGB camera and imitates their actions in real time. This enables the robot to collect real-world experience safely under human supervision. See these results at https://robotic-telekinesis.github.io . Second, we retarget in-the-wild internet video of human hands into task-conditioned pseudo-robot trajectories to use as artificial robot experience. This learning algorithm leverages action priors from human hand actions, visual features from the images, and physical priors from dynamical systems to pretrain typical human behavior for a particular robot task. We show that by leveraging internet human hand experience, we need fewer robot demonstrations than many other methods. See these results at https://video-dex.github.io
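Retargeting of this kind is typically posed as an optimization that maps detected human hand keypoints to robot joint angles. The sketch below is a minimal illustration of that idea under our own assumptions (a toy planar two-joint finger and a hypothetical `retarget` helper); it is not the paper's actual pipeline.

```python
# A minimal sketch of kinematic retargeting, assuming hand keypoints have
# already been detected by an off-the-shelf hand-pose estimator. The toy
# forward kinematics, energy function, and all names are illustrative
# assumptions, not the paper's implementation.
import numpy as np
from scipy.optimize import minimize

def robot_fingertip(q, link_lengths=(0.05, 0.04)):
    """Toy planar 2-joint finger forward kinematics (assumed, not a real hand)."""
    l1, l2 = link_lengths
    x = l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1])
    y = l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def retarget(human_tip, scale=1.0, q0=np.zeros(2)):
    """Find joint angles whose fingertip matches the scaled human fingertip."""
    def energy(q):
        return np.sum((robot_fingertip(q) - scale * human_tip) ** 2)
    return minimize(energy, q0, method="L-BFGS-B").x

human_tip = np.array([0.06, 0.03])   # a detected fingertip position (made up)
q = retarget(human_tip)
print("joint angles:", q, "tip:", robot_fingertip(q))
```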
{"title":"Learning dexterity from human hand motion in internet videos","authors":"Kenneth Shaw, Shikhar Bahl, Aravind Sivakumar, Aditya Kannan, Deepak Pathak","doi":"10.1177/02783649241227559","DOIUrl":"https://doi.org/10.1177/02783649241227559","url":null,"abstract":"To build general robotic agents that can operate in many environments, it is often useful for robots to collect experience in the real world. However, unguided experience collection is often not feasible due to safety, time, and hardware restrictions. We thus propose leveraging the next best thing as real world experience: videos of humans using their hands. To utilize these videos, we develop a method that retargets any 1st person or 3rd person video of human hands and arms into the robot hand and arm trajectories. While retargeting is a difficult problem, our key insight is to rely on only internet human hand video to train it. We use this method to present results in two areas: First, we build a system that enables any human to control a robot hand and arm, simply by demonstrating motions with their own hand. The robot observes the human operator via a single RGB camera and imitates their actions in real-time. This enables the robot to collect real-world experience safely using supervision. See these results at https://robotic-telekinesis.github.io . Second, we retarget in-the-wild human internet video into task-conditioned pseudo-robot trajectories to use as artificial robot experience. This learning algorithm leverages action priors from human hand actions, visual features from the images, and physical priors from dynamical systems to pretrain typical human behavior for a particular robot task. We show that by leveraging internet human hand experience, we need fewer robot demonstrations compared to many other methods. See these results at https://video-dex.github.io","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":"19 8","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139608207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kernel-based diffusion approximated Markov decision processes for autonomous navigation and control on unstructured terrains
Pub Date : 2024-01-19 DOI: 10.1177/02783649231225977
Junhong Xu, Kai Yin, Zheng Chen, Jason M Gregory, Ethan A Stump, Lantao Liu
We propose a diffusion approximation method for continuous-state Markov decision processes that can be used to address autonomous navigation and control in unstructured off-road environments. In contrast to most decision-theoretic planning frameworks, which assume a fully known state transition model, we design a method that eliminates this strong assumption, which is often extremely difficult to satisfy in practice. We first take the second-order Taylor expansion of the value function. The Bellman optimality equation is then approximated by a partial differential equation that relies only on the first and second moments of the transition model. Combining this with a kernel representation of the value function, we design an efficient policy iteration algorithm whose policy evaluation step can be represented as a linear system of equations characterized by a finite set of supporting states. We first validate the proposed method through extensive simulations in 2D obstacle avoidance and 2.5D terrain navigation problems. The results show that the proposed approach substantially outperforms several baselines. We then develop a system that integrates our decision-making framework with onboard perception and conduct real-world experiments in both cluttered indoor and unstructured outdoor environments. The results from the physical systems further demonstrate the applicability of our method in challenging real-world environments.
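The core approximation step can be sketched explicitly. If the state increment δ over one step has mean μ(x, a) and covariance Σ(x, a), a second-order Taylor expansion of the value function inside the Bellman expectation yields (our notation, a sketch under these assumptions; higher-order moment terms such as μμᵀ are dropped as negligible over a short time step):

```latex
\begin{aligned}
V(x) &= \min_a \, \mathbb{E}\big[\, c(x,a) + \gamma\, V(x+\delta) \,\big], \\
\mathbb{E}\big[V(x+\delta)\big]
  &\approx V(x) + \nabla V(x)^{\top}\, \mu(x,a)
   + \tfrac{1}{2}\,\mathrm{tr}\!\big( \Sigma(x,a)\, \nabla^{2} V(x) \big).
\end{aligned}
```

Substituting this expansion back into the Bellman equation leaves a partial differential equation in V that depends on the transition model only through its first two moments, which is what allows the method to dispense with a fully known transition model.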
A cross-domain challenge with panoptic segmentation in agriculture
Pub Date : 2024-01-18 DOI: 10.1177/02783649241227448
Michael Halstead, Patrick Zimmer, Chris McCool
Automation in agriculture is a growing area of research with fundamental societal importance, as farmers are expected to produce more and better crops with fewer resources. A key enabling factor is robotic vision, which allows a robot to sense and then interact with its environment. A limiting factor for these robotic vision systems is their cross-domain performance, that is, their ability to operate in a large range of environments. In this paper, we propose the use of auxiliary tasks to enhance cross-domain performance without the need for extra data. We perform experiments using four datasets (two in a glasshouse and two in arable farmland) for four cross-domain evaluations. These experiments demonstrate the effectiveness of our auxiliary tasks in improving network generalisability. In glasshouse experiments, our approach improves the panoptic quality of things from 10.4 to 18.5, and in arable farmland from 16.0 to 27.5, where 100 is the best possible score. To further evaluate the generalisability of our approach, we perform an ablation study using the large Crop and Weed dataset (CAW), improving cross-domain performance (panoptic quality of things) from 12.8 to 30.6 when transferring from CAW to our novel WeedAI dataset, and from 21.2 to 36.0 from CAW to the other arable farmland dataset. Although our proposed approaches considerably improve cross-domain performance, we still do not generally outperform in-domain trained systems. This highlights the potential room for improvement in this area and the importance of cross-domain research for robotic vision systems.
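The abstract does not specify the auxiliary tasks; the sketch below illustrates the general pattern under an assumption consistent with the "no extra data" constraint: a shared encoder trained with the main segmentation loss plus an auxiliary loss on a target derived from the existing labels (here a binary foreground mask). The architecture, heads, and weighting are ours, not the paper's.

```python
# A minimal multi-task sketch: one shared encoder, a main segmentation
# head, and an auxiliary head supervised by a binary foreground mask
# derived from the same labels (so no extra data is needed). All
# architectural choices and the 0.5 weighting are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxTaskNet(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.encoder = nn.Conv2d(3, 16, 3, padding=1)   # shared features
        self.seg_head = nn.Conv2d(16, num_classes, 1)   # main task
        self.aux_head = nn.Conv2d(16, 1, 1)             # auxiliary task

    def forward(self, x):
        f = F.relu(self.encoder(x))
        return self.seg_head(f), self.aux_head(f)

net = AuxTaskNet()
img = torch.randn(2, 3, 64, 64)
seg_target = torch.randint(0, 3, (2, 64, 64))           # 0 = background
fg_target = (seg_target > 0).float().unsqueeze(1)       # derived, not extra data

seg_logits, fg_logits = net(img)
loss = F.cross_entropy(seg_logits, seg_target) \
     + 0.5 * F.binary_cross_entropy_with_logits(fg_logits, fg_target)
loss.backward()
```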
{"title":"A cross-domain challenge with panoptic segmentation in agriculture","authors":"Michael Halstead, Patrick Zimmer, Chris McCool","doi":"10.1177/02783649241227448","DOIUrl":"https://doi.org/10.1177/02783649241227448","url":null,"abstract":"Automation in agriculture is a growing area of research with fundamental societal importance as farmers are expected to produce more and better crop with fewer resources. A key enabling factor is robotic vision techniques allowing us to sense and then interact with the environment. A limiting factor for these robotic vision systems is their cross-domain performance, that is, their ability to operate in a large range of environments. In this paper, we propose the use of auxiliary tasks to enhance cross-domain performance without the need for extra data. We perform experiments using four datasets (two in a glasshouse and two in arable farmland) for four cross-domain evaluations. These experiments demonstrate the effectiveness of our auxiliary tasks to improve network generalisability. In glasshouse experiments, our approach improves the panoptic quality of things from 10.4 to 18.5 and in arable farmland from 16.0 to 27.5; where a score of 100 is the best. To further evaluate the generalisability of our approach, we perform an ablation study using the large Crop and Weed dataset (CAW) where we improve cross-domain performance (panoptic quality of things) from 12.8 to 30.6 for the CAW dataset to our novel WeedAI dataset, and 21.2 to 36.0 from CAW to the other arable farmland dataset. Although our proposed approaches considerably improve cross-domain performance we still do not generally outperform in-domain trained systems. This highlights the potential room for improvement in this area and the importance of cross-domain research for robotic vision systems.","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":"120 17","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139616240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Intelligent robotic sonographer: Mutual information-based disentangled reward learning from few demonstrations
Pub Date : 2024-01-16 DOI: 10.1177/02783649231223547
Zhongliang Jiang, Yuan Bi, Mingchuan Zhou, Ying Hu, Michael Burke, Nassir Navab
Ultrasound (US) imaging is widely used for biometric measurement and diagnosis of internal organs because it is real-time and radiation-free. However, owing to inter-operator variation, the resulting images depend heavily on the experience of the sonographer. This work proposes an intelligent robotic sonographer that autonomously "explores" target anatomies and navigates a US probe to standard planes by learning from experts. The underlying high-level physiological knowledge of experts is inferred by a neural reward function, using a ranked pairwise image comparison approach in a self-supervised fashion. This process can be understood as learning the "language of sonography." To generalize across inter-patient variations, mutual information is estimated by a network to explicitly disentangle task-related and domain features in the latent space. Robotic localization is then carried out in a coarse-to-fine manner based on the predicted reward associated with B-mode images. To validate the effectiveness of the proposed reward inference network, representative experiments were performed on vascular phantoms (a "line" target), two types of ex vivo animal organ phantoms (chicken heart and lamb kidney, representing "point" targets), and in vivo human carotids. To further validate the performance of the autonomous acquisition framework, physical robotic acquisitions were performed on three phantoms (vascular, chicken heart, and lamb kidney). The results demonstrate that the proposed framework works robustly on a variety of seen and unseen phantoms as well as on in vivo human carotid data. Code: https://github.com/yuan-12138/MI-GPSR . Video: https://youtu.be/u4ThAA9onE0 .
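Reward inference from ranked image pairs is commonly implemented with a Bradley-Terry-style preference loss; the sketch below shows that pattern under our own assumptions (toy network and data), not the paper's exact model.

```python
# A minimal sketch of reward learning from ranked image pairs: train a
# reward network so images ranked closer to the standard plane score
# higher, via a Bradley-Terry pairwise log-likelihood. Network shapes
# and data are illustrative assumptions.
import torch
import torch.nn as nn

reward_net = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 64),
                           nn.ReLU(), nn.Linear(64, 1))

better = torch.randn(8, 1, 64, 64)   # images ranked closer to the standard plane
worse = torch.randn(8, 1, 64, 64)    # images ranked farther away

r_better, r_worse = reward_net(better), reward_net(worse)
# Probability that 'better' outranks 'worse'; maximize its log-likelihood.
loss = -torch.log(torch.sigmoid(r_better - r_worse)).mean()
loss.backward()
```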
Lane-level route planning for autonomous vehicles
Pub Date : 2024-01-10 DOI: 10.1177/02783649231225474
Mitchell Jones, Maximilian Haas-Heger, Jur van den Berg
We present an algorithm that, given a representation of a road network in lane-level detail, computes a route that minimizes the expected cost to reach a given destination. In doing so, our algorithm allows us to solve for the complex trade-offs encountered when trying to decide not just which roads to follow, but also when to change between the lanes making up these roads, in order to—for example—reduce the likelihood of missing a left exit while not unnecessarily driving in the leftmost lane. This routing problem can naturally be formulated as a Markov Decision Process (MDP), in which lane change actions have stochastic outcomes. However, MDPs are known to be time-consuming to solve in general. In this paper, we show that—under reasonable assumptions—we can use a Dijkstra-like approach to solve this stochastic problem, and benefit from its efficient O(n log n) running time. This enables an autonomous vehicle to exhibit lane-selection behavior as it efficiently plans an optimal route to its destination.
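The MDP formulation can be made concrete with a toy example. The sketch below builds a tiny lane-graph MDP in which a lane change succeeds only with probability 0.9 and, for clarity, solves it with plain value iteration; the paper's actual contribution, a Dijkstra-like O(n log n) solver, is not reproduced here. All states, costs, and probabilities are invented for illustration.

```python
# Toy lane-level MDP: actions either advance along the current lane
# (deterministic) or attempt a lane change while advancing (stochastic:
# on failure the vehicle ends up farther along in its original lane).
GOAL = ("road_end", "lane_left")

# state -> list of (action_cost, [(next_state, probability), ...])
actions = {
    ("road_start", "lane_left"): [
        (1.0, [(GOAL, 1.0)]),                              # follow lane to goal
    ],
    ("road_start", "lane_right"): [
        (1.0, [(("road_end", "lane_right"), 1.0)]),        # follow lane
        (1.2, [(("road_start", "lane_left"), 0.9),         # change succeeds
               (("road_end", "lane_right"), 0.1)]),        # change fails
    ],
    ("road_end", "lane_right"): [
        (5.0, [(GOAL, 1.0)]),                              # costly detour back
    ],
}

V = {s: 0.0 if s == GOAL else float("inf") for s in set(actions) | {GOAL}}
for _ in range(100):                                       # value iteration
    for s, acts in actions.items():
        V[s] = min(c + sum(p * V[t] for t, p in outs) for c, outs in acts)

print(V[("road_start", "lane_right")])   # expected cost accounting for risk: 2.6
```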
{"title":"Lane-level route planning for autonomous vehicles","authors":"Mitchell Jones, Maximilian Haas-Heger, Jur van den Berg","doi":"10.1177/02783649231225474","DOIUrl":"https://doi.org/10.1177/02783649231225474","url":null,"abstract":"We present an algorithm that, given a representation of a road network in lane-level detail, computes a route that minimizes the expected cost to reach a given destination. In doing so, our algorithm allows us to solve for the complex trade-offs encountered when trying to decide not just which roads to follow, but also when to change between the lanes making up these roads, in order to—for example—reduce the likelihood of missing a left exit while not unnecessarily driving in the leftmost lane. This routing problem can naturally be formulated as a Markov Decision Process (MDP), in which lane change actions have stochastic outcomes. However, MDPs are known to be time-consuming to solve in general. In this paper, we show that—under reasonable assumptions—we can use a Dijkstra-like approach to solve this stochastic problem, and benefit from its efficient O( n log n) running time. This enables an autonomous vehicle to exhibit lane-selection behavior as it efficiently plans an optimal route to its destination.","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139945625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging symmetries in pick and place
Pub Date : 2024-01-06 DOI: 10.1177/02783649231225775
Haojie Huang, Dian Wang, Arsh Tangri, Robin Walters, Robert Platt
Robotic pick and place tasks are symmetric under translations and rotations of both the object to be picked and the desired place pose. For example, if the pick object is rotated or translated, then the optimal pick action should also rotate or translate. The same is true for the place pose; if the desired place pose changes, then the place action should also transform accordingly. A recently proposed pick and place framework known as Transporter Net (Zeng, Florence, Tompson, Welker, Chien, Attarian, Armstrong, Krasin, Duong, Sindhwani et al., 2021) captures some of these symmetries, but not all. This paper analytically studies the symmetries present in planar robotic pick and place and proposes a method of incorporating equivariant neural models into Transporter Net in a way that captures all symmetries. The new model, which we call Equivariant Transporter Net, is equivariant to both pick and place symmetries and can immediately generalize pick and place knowledge to different pick and place poses. We evaluate the new model empirically and show that it is much more sample-efficient than the non-symmetric version, resulting in a system that can imitate demonstrated pick and place behavior using very few human demonstrations on a variety of imitation learning tasks.
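The symmetry claim is easy to verify empirically in the translation case. The sketch below, a toy construction of ours rather than the paper's model, shows that a fully convolutional "pick" network with circular padding is exactly translation-equivariant: shifting the input scene shifts the predicted pick heatmap identically. The paper's rotation equivariance additionally requires group-equivariant (steerable) convolutions, which are not reproduced here.

```python
# Demonstrating translation equivariance of a fully convolutional pick
# network: roll the input, and the output heatmap rolls by the same
# amount. Circular padding makes the equality exact. Toy network, assumed
# shapes; illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1, padding_mode="circular"), nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1, padding_mode="circular"))

img = torch.randn(1, 1, 32, 32)
shifted = torch.roll(img, shifts=(5, 7), dims=(2, 3))   # translate the scene

q1 = net(img)
q2 = net(shifted)
# Heatmap of the shifted image equals the shifted heatmap.
print(torch.allclose(torch.roll(q1, (5, 7), dims=(2, 3)), q2, atol=1e-5))
```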
Self-reflective terrain-aware robot adaptation for consistent off-road ground navigation
Pub Date : 2024-01-06 DOI: 10.1177/02783649231225243
Sriram Siva, Maggie Wigness, John G. Rogers, Long Quang, Hao Zhang
Ground robots must be able to traverse unstructured and unprepared terrains and avoid obstacles to complete tasks in real-world robotics applications such as disaster response. When a robot operates in off-road field environments such as forests, its actual behaviors often do not match its expected or planned behaviors, due to changes in the characteristics of the terrain and of the robot itself. The capability to adapt and generate consistent behaviors is therefore essential for maneuverability on unstructured off-road terrains. To address this challenge, we propose a novel method of self-reflective terrain-aware adaptation that generates consistent controls for navigating unstructured off-road terrains, enabling robots to more accurately execute expected behaviors through self-reflection while adapting to varying terrain. To evaluate our method's performance, we conduct extensive experiments using real ground robots with various functionality changes over diverse unstructured off-road terrains. The comprehensive experimental results show that our self-reflective terrain-aware adaptation method enables ground robots to generate consistent navigational behaviors and outperforms previous and baseline techniques.
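The abstract does not detail the adaptation mechanism, so the following is only a generic illustration of the consistent-behavior idea: fit a model of how the current terrain distorts commanded behavior, then pre-compensate commands so the executed behavior matches the expectation. The linear model and all numbers are assumptions, not the paper's method.

```python
# Generic illustration (an assumption, not the paper's method): fit a
# linear residual model mapping commanded velocity to actual velocity on
# the current terrain, then pre-compensate commands so the executed
# behavior matches the expected one.
import numpy as np

commanded = np.array([0.5, 0.8, 1.0, 1.2])       # m/s, logged commands
actual = np.array([0.41, 0.66, 0.85, 1.01])      # m/s, measured on this terrain

# Least-squares fit: actual ~ gain * commanded
gain = (commanded @ actual) / (commanded @ commanded)

desired = 1.0                                    # m/s expected behavior
corrected = desired / gain                       # pre-compensated command
print(f"gain={gain:.2f}, send {corrected:.2f} m/s to achieve {desired} m/s")
```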
{"title":"Self-reflective terrain-aware robot adaptation for consistent off-road ground navigation","authors":"Sriram Siva, Maggie Wigness, John G. Rogers, Long Quang, Hao Zhang","doi":"10.1177/02783649231225243","DOIUrl":"https://doi.org/10.1177/02783649231225243","url":null,"abstract":"Ground robots require the crucial capability of traversing unstructured and unprepared terrains and avoiding obstacles to complete tasks in real-world robotics applications such as disaster response. When a robot operates in off-road field environments such as forests, the robot’s actual behaviors often do not match its expected or planned behaviors, due to changes in the characteristics of terrains and the robot itself. Therefore, the capability of robot adaptation for consistent behavior generation is essential for maneuverability on unstructured off-road terrains. In order to address the challenge, we propose a novel method of self-reflective terrain-aware adaptation for ground robots to generate consistent controls to navigate over unstructured off-road terrains, which enables robots to more accurately execute the expected behaviors through robot self-reflection while adapting to varying unstructured terrains. To evaluate our method’s performance, we conduct extensive experiments using real ground robots with various functionality changes over diverse unstructured off-road terrains. The comprehensive experimental results have shown that our self-reflective terrain-aware adaptation method enables ground robots to generate consistent navigational behaviors and outperforms the compared previous and baseline techniques.","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":"68 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139945626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rapid locomotion via reinforcement learning
Pub Date : 2024-01-02 DOI: 10.1177/02783649231224053
Gabriel B. Margolis, Ge Yang, Kartik Paigwar, Tao Chen, Pulkit Agrawal
Agile maneuvers such as sprinting and high-speed turning in the wild are challenging for legged robots. We present an end-to-end learned controller that achieves record agility for the MIT Mini Cheetah, sustaining speeds up to 3.9 m/s. This system runs and turns fast on natural terrains like grass, ice, and gravel and responds robustly to disturbances. Our controller is a neural network trained in simulation via reinforcement learning and transferred to the real world. The two key components are (i) an adaptive curriculum on velocity commands and (ii) an online system identification strategy for sim-to-real transfer. Videos of the robot’s behaviors are available at https://agility.csail.mit.edu/ .
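The abstract names an adaptive curriculum on velocity commands as one of the two key components. The following is a minimal sketch of one common form of such a curriculum; the thresholds, step size, and success-rate rule are illustrative assumptions, not the paper's published values.

```python
# A minimal sketch of an adaptive curriculum on velocity commands: widen
# the range of sampled command speeds whenever the policy tracks the
# current range well. All constants and the rollout stub are assumptions.
import random

v_max = 1.0                       # current upper bound on sampled commands (m/s)
CAP, STEP, THRESH = 4.0, 0.25, 0.8

def tracking_success(cmd):
    # Placeholder for a simulated rollout returning whether the policy
    # tracked `cmd` within tolerance; stubbed here for illustration.
    return random.random() < 0.9

for epoch in range(50):
    cmds = [random.uniform(0.0, v_max) for _ in range(100)]
    success_rate = sum(tracking_success(c) for c in cmds) / len(cmds)
    if success_rate > THRESH and v_max < CAP:
        v_max = min(v_max + STEP, CAP)    # expand the curriculum
print(f"final command range: [0, {v_max:.2f}] m/s")
```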
{"title":"Rapid locomotion via reinforcement learning","authors":"Gabriel B. Margolis, Ge Yang, Kartik Paigwar, Tao Chen, Pulkit Agrawal","doi":"10.1177/02783649231224053","DOIUrl":"https://doi.org/10.1177/02783649231224053","url":null,"abstract":"Agile maneuvers such as sprinting and high-speed turning in the wild are challenging for legged robots. We present an end-to-end learned controller that achieves record agility for the MIT Mini Cheetah, sustaining speeds up to 3.9 m/s. This system runs and turns fast on natural terrains like grass, ice, and gravel and responds robustly to disturbances. Our controller is a neural network trained in simulation via reinforcement learning and transferred to the real world. The two key components are (i) an adaptive curriculum on velocity commands and (ii) an online system identification strategy for sim-to-real transfer. Videos of the robot’s behaviors are available at https://agility.csail.mit.edu/ .","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":"134 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139945624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Energy-optimal trajectories for skid-steer rovers
Pub Date : 2023-12-30 DOI: 10.1177/02783649231216499
M. Effati, Krzysztof Skonieczny, Devin J. Balkcom
This paper presents energy-optimal trajectories for skid-steer rovers on hard ground, without obstacles. We obtain 29 trajectory structures that are sufficient to describe minimum-energy motion, enumerated and described geometrically; 28 of these structures are composed of sequences of circular arcs and straight lines, and there is also a special structure called whirls, consisting of different circular arcs. Our analysis identifies that the turns in the trajectory structures (aside from whirls) are all circular arcs of a particular turning radius R′, the radius at which the inner wheels of a skid-steer rover are not commanded to turn, and demonstrates the central role of R′ in energy-optimal path planning. Analytical energy-optimal trajectory generation for skid-steer rovers has been lacking, and we address this problem with a novel approach. The equivalency theorem presented in this work shows that all minimum-energy solutions follow the same path irrespective of any velocity constraints that may be imposed. This non-intuitive result stems from the fact that, under this model of the system, the total energy is fully parameterized by the geometry of the path alone. With this equivalency in mind, one can choose velocity constraints that enforce constant power consumption, transforming the energy-optimal problem into an equivalent time-optimal problem, which can then be solved with Pontryagin's Minimum Principle. Accordingly, the extremal paths are obtained and enumerated to find the minimum-energy path. Furthermore, our experiments with a Husky UGV provide experimental support for the equivalency theorem.
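For intuition only, under an idealized differential-drive kinematic relation (our assumption; the paper derives R′ for skid-steer vehicles, where slip shifts this value), the inner wheels stop being commanded to turn exactly when the body turning radius shrinks to half the track width W:

```latex
% Idealized differential-drive kinematics (illustrative assumption only;
% the paper's R' accounts for skid-steer dynamics with slip).
v_{\mathrm{in}} \;=\; v \;-\; \frac{W}{2}\,\omega \;=\; 0
\quad\Longrightarrow\quad
R \;=\; \frac{v}{\omega} \;=\; \frac{W}{2}.
```

Turning any tighter requires commanding the inner wheels in reverse, which is the kinematic intuition for why a distinguished turning radius can appear in an energy analysis.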
{"title":"Energy-optimal trajectories for skid-steer rovers","authors":"M. Effati, Krzysztof Skonieczny, Devin J. Balkcom","doi":"10.1177/02783649231216499","DOIUrl":"https://doi.org/10.1177/02783649231216499","url":null,"abstract":"This paper presents the energy-optimal trajectories for skid-steer rovers on hard ground, without obstacles. We obtain 29 trajectory structures that are sufficient to describe minimum-energy motion, which are enumerated and described geometrically; 28 of these structures are composed of sequences of circular arcs and straight lines; there is also a special structure called whirls consisting of different circular arcs. Our analysis identifies that the turns in the trajectory structures (aside from whirls) are all circular arcs of a particular turning radius, R′, the turning radius at which the inner wheels of a skid-steer rover are not commanded to turn. This work demonstrates its paramount importance in energy-optimal path planning. There has been a lack of analytical energy-optimal trajectory generation for skid-steer rovers, and we address this problem by a novel approach. The equivalency theorem presented in this work shows that all minimum-energy solutions follow the same path irrespective of velocity constraints that may or may not be imposed. This non-intuitive result stems from the fact that with this model of the system the total energy is fully parameterized by the geometry of the path alone. With this equivalency in mind, one can choose velocity constraints to enforce constant power consumption, thus transforming the energy-optimal problem into an equivalent time-optimal problem. Pontryagin’s Minimum Principle can then be used to solve the problem. Accordingly, the extremal paths are obtained and enumerated to find the minimum-energy path. Furthermore, our experimental results by using Husky UGV provide the experimental support for the equivalency theorem.","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":" 33","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139141432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CID-SIMS: Complex indoor dataset with semantic information and multi-sensor data from a ground wheeled robot viewpoint
Pub Date : 2023-12-21 DOI: 10.1177/02783649231222507
Yidi Zhang, Ning An, Chenhui Shi, Shuo Wang, Hao Wei, Pengju Zhang, Xinrui Meng, Zengpeng Sun, Jinke Wang, Wenliang Liang, Fulin Tang, Yihong Wu
Simultaneous localization and mapping (SLAM) and 3D reconstruction have numerous applications for indoor ground wheeled robots, such as floor sweeping and food delivery. To advance research on leveraging semantic information and multi-sensor data to enhance SLAM and 3D reconstruction in complex indoor scenes, we propose a novel and complex indoor dataset named CID-SIMS, which provides semantically annotated RGB-D images, inertial measurement unit (IMU) measurements, and wheel odometer data from the viewpoint of a ground wheeled robot. The dataset consists of 22 challenging sequences captured in nine different scenes, including office building and apartment environments. Notably, our dataset achieves two significant breakthroughs. First, it is the first to provide semantic information and multi-sensor data together. Second, GeoSLAM is used for the first time to generate ground-truth trajectories and 3D point clouds with two-centimeter accuracy. With spatially and temporally synchronized ground-truth trajectories and 3D point clouds, our dataset can evaluate SLAM and 3D reconstruction algorithms in a unified global coordinate system. We evaluate state-of-the-art SLAM and 3D reconstruction approaches on our dataset, demonstrating that our benchmark is applicable. The dataset is publicly available at https://cid-sims.github.io .
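Evaluating a SLAM estimate against ground truth in a unified coordinate system typically means computing the absolute trajectory error (ATE) after a rigid alignment. The sketch below is a standard Kabsch/Umeyama-style alignment on synthetic data, purely illustrative; the dataset's own evaluation tooling and conventions may differ.

```python
# A minimal sketch of absolute trajectory error (ATE) after optimal rigid
# alignment of estimated positions to ground truth (no scale correction).
# Synthetic data; illustrative only.
import numpy as np

def ate_rmse(est, gt):
    """RMSE of position error after least-squares rigid alignment."""
    mu_e, mu_g = est.mean(0), gt.mean(0)
    H = (est - mu_e).T @ (gt - mu_g)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                        # rotation aligning est to gt
    t = mu_g - R @ mu_e
    aligned = est @ R.T + t
    return np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1)))

gt = np.cumsum(np.random.randn(100, 3) * 0.1, axis=0)   # synthetic ground truth
est = gt + np.random.randn(100, 3) * 0.02               # noisy estimate
print(f"ATE RMSE: {ate_rmse(est, gt):.3f} m")
```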