Semantic anomaly detection with large language models
Pub Date : 2023-10-23, DOI: 10.1007/s10514-023-10132-6, Autonomous Robots 47(8): 1035-1055
Amine Elhafsi, Rohan Sinha, Christopher Agia, Edward Schmerling, Issa A. D. Nesnas, Marco Pavone
As robots acquire increasingly sophisticated skills and see increasingly complex and varied environments, the threat of an edge case or anomalous failure is ever present. For example, Tesla cars have seen interesting failure modes ranging from autopilot disengagements due to inactive traffic lights carried by trucks to phantom braking caused by images of stop signs on roadside billboards. These system-level failures are not due to failures of any individual component of the autonomy stack but rather system-level deficiencies in semantic reasoning. Such edge cases, which we call semantic anomalies, are simple for a human to disentangle yet require insightful reasoning. To this end, we study the application of large language models (LLMs), endowed with broad contextual understanding and reasoning capabilities, to recognize such edge cases and introduce a monitoring framework for semantic anomaly detection in vision-based policies. Our experiments apply this framework to a finite state machine policy for autonomous driving and a learned policy for object manipulation. These experiments demonstrate that the LLM-based monitor can effectively identify semantic anomalies in a manner that shows agreement with human reasoning. Finally, we provide an extended discussion on the strengths and weaknesses of this approach and motivate a research outlook on how we can further use foundation models for semantic anomaly detection. Our project webpage can be found at https://sites.google.com/view/llm-anomaly-detection.
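To make the monitoring idea concrete, here is a minimal Python sketch of an LLM-based semantic anomaly monitor in the spirit of the paper; the prompt wording, the `query_llm` callable, and the YES/NO parsing are illustrative assumptions, not the authors' actual pipeline.

```python
from typing import Callable, List

# Hypothetical prompt template; the paper's actual prompts differ.
PROMPT_TEMPLATE = (
    "You monitor an autonomous {domain} policy. The vision system reports "
    "the following objects in the scene:\n{observations}\n"
    "Could any of these observations cause the policy to misbehave in a way "
    "a human would consider a semantic anomaly? Answer YES or NO, then "
    "explain briefly."
)

def detect_semantic_anomaly(observations: List[str],
                            domain: str,
                            query_llm: Callable[[str], str]) -> bool:
    """Return True if the LLM flags the scene as semantically anomalous."""
    prompt = PROMPT_TEMPLATE.format(
        domain=domain,
        observations="\n".join(f"- {o}" for o in observations),
    )
    reply = query_llm(prompt)
    return reply.strip().upper().startswith("YES")

# Example, echoing the billboard failure mode mentioned above:
# flagged = detect_semantic_anomaly(
#     ["stop sign printed on a roadside billboard", "clear road ahead"],
#     domain="driving",
#     query_llm=my_llm,  # hypothetical wrapper around your chosen LLM API
# )
```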
{"title":"Semantic anomaly detection with large language models","authors":"Amine Elhafsi, Rohan Sinha, Christopher Agia, Edward Schmerling, Issa A. D. Nesnas, Marco Pavone","doi":"10.1007/s10514-023-10132-6","DOIUrl":"10.1007/s10514-023-10132-6","url":null,"abstract":"<div><p>As robots acquire increasingly sophisticated skills and see increasingly complex and varied environments, the threat of an edge case or anomalous failure is ever present. For example, Tesla cars have seen interesting failure modes ranging from autopilot disengagements due to inactive traffic lights carried by trucks to phantom braking caused by images of stop signs on roadside billboards. These system-level failures are not due to failures of any individual component of the autonomy stack but rather system-level deficiencies in semantic reasoning. Such edge cases, which we call <i>semantic anomalies</i>, are simple for a human to disentangle yet require insightful reasoning. To this end, we study the application of large language models (LLMs), endowed with broad contextual understanding and reasoning capabilities, to recognize such edge cases and introduce a monitoring framework for semantic anomaly detection in vision-based policies. Our experiments apply this framework to a finite state machine policy for autonomous driving and a learned policy for object manipulation. These experiments demonstrate that the LLM-based monitor can effectively identify semantic anomalies in a manner that shows agreement with human reasoning. Finally, we provide an extended discussion on the strengths and weaknesses of this approach and motivate a research outlook on how we can further use foundation models for semantic anomaly detection. Our project webpage can be found at https://sites.google.com/view/llm-anomaly-detection. \u0000</p></div>","PeriodicalId":55409,"journal":{"name":"Autonomous Robots","volume":"47 8","pages":"1035 - 1055"},"PeriodicalIF":3.5,"publicationDate":"2023-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135322901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reinforcement learning for shared autonomy drone landings
Pub Date : 2023-10-21, DOI: 10.1007/s10514-023-10143-3, Autonomous Robots 47(8): 1419-1438
Kal Backman, Dana Kulić, Hoam Chung
Novice pilots find it difficult to operate and land unmanned aerial vehicles (UAVs) due to the complex UAV dynamics, challenges in depth perception, lack of expertise with the control interface, and additional disturbances from the ground effect. Therefore, we propose a shared autonomy approach to assist pilots in safely landing a UAV under conditions where depth perception is difficult and safe landing zones are limited. Our approach comprises two modules: a perception module that encodes information onto a compressed latent representation using two RGB-D cameras, and a policy module that is trained with the reinforcement learning algorithm TD3 to discern the pilot's intent and to provide control inputs that augment the user's input to safely land the UAV. The policy module is trained in simulation using a population of simulated users. Simulated users are sampled from a parametric model with four parameters, which model a pilot's tendency to conform to the assistant, proficiency, aggressiveness and speed. We conduct a user study (n = 28) where human participants were tasked with landing a physical UAV on one of several platforms under challenging viewing conditions. The assistant, trained with only simulated user data, improved the task success rate from 51.4% to 98.2% despite being unaware of the human participants' goal or the structure of the environment a priori. With the proposed assistant, regardless of prior piloting experience, participants performed with a proficiency greater than the most experienced unassisted participants.
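As a concrete illustration of the shared-autonomy idea, the following Python sketch blends the pilot's command with the assistant's output; the blending weight and saturation limits are assumptions for illustration (the paper's assistant is a TD3-trained policy acting on a learned latent state, not a fixed blend).

```python
import numpy as np

def blend_control(u_pilot: np.ndarray,
                  u_policy: np.ndarray,
                  alpha: float = 0.5,
                  u_max: float = 1.0) -> np.ndarray:
    """Convex combination of pilot and assistant commands, then saturation.

    `alpha` and `u_max` are illustrative assumptions, not paper values.
    """
    u = (1.0 - alpha) * u_pilot + alpha * u_policy
    return np.clip(u, -u_max, u_max)

# e.g. a 4-axis command (roll, pitch, yaw rate, thrust):
u = blend_control(np.array([0.2, -0.1, 0.0, 0.6]),
                  np.array([0.1, -0.3, 0.0, 0.5]))
```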
{"title":"Reinforcement learning for shared autonomy drone landings","authors":"Kal Backman, Dana Kulić, Hoam Chung","doi":"10.1007/s10514-023-10143-3","DOIUrl":"10.1007/s10514-023-10143-3","url":null,"abstract":"<div><p>Novice pilots find it difficult to operate and land unmanned aerial vehicles (UAVs), due to the complex UAV dynamics, challenges in depth perception, lack of expertise with the control interface and additional disturbances from the ground effect. Therefore we propose a shared autonomy approach to assist pilots in safely landing a UAV under conditions where depth perception is difficult and safe landing zones are limited. Our approach is comprised of two modules: a perception module that encodes information onto a compressed latent representation using two RGB-D cameras and a policy module that is trained with the reinforcement learning algorithm TD3 to discern the pilot’s intent and to provide control inputs that augment the user’s input to safely land the UAV. The policy module is trained in simulation using a population of simulated users. Simulated users are sampled from a parametric model with four parameters, which model a pilot’s tendency to conform to the assistant, proficiency, aggressiveness and speed. We conduct a user study (<span>(n=28)</span>) where human participants were tasked with landing a physical UAV on one of several platforms under challenging viewing conditions. The assistant, trained with only simulated user data, improved task success rate from 51.4 to 98.2% despite being unaware of the human participants’ goal or the structure of the environment a priori. With the proposed assistant, regardless of prior piloting experience, participants performed with a proficiency greater than the most experienced unassisted participants.\u0000</p></div>","PeriodicalId":55409,"journal":{"name":"Autonomous Robots","volume":"47 8","pages":"1419 - 1438"},"PeriodicalIF":3.5,"publicationDate":"2023-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10514-023-10143-3.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135510764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Why ORB-SLAM is missing commonly occurring loop closures?
Pub Date : 2023-10-20, DOI: 10.1007/s10514-023-10149-x, Autonomous Robots 47(8): 1519-1535
Saran Khaliq, Muhammad Latif Anjum, Wajahat Hussain, Muhammad Uzair Khattak, Momen Rasool
We analyse, for the first time, the popular loop closing module of the well-known and widely used open-source visual SLAM pipeline ORB-SLAM. Investigating failures in the loop closure module of visual SLAM is challenging since it consists of multiple building blocks. Our meticulous investigation has revealed a few interesting findings. Contrary to reported results, ORB-SLAM frequently misses a large fraction of loop closures on public (KITTI, TUM RGB-D) datasets. A common assumption is that, in such scenarios, the visual place recognition (vPR) block of the loop closure module is unable to find a suitable match due to extreme conditions (dynamic scenes, viewpoint/scale changes). We report that the native vPR of ORB-SLAM is not the sole reason for these failures. Although recent deep vPR alternatives achieve impressive matching performance, replacing the native vPR with these deep alternatives only partially improves the loop closure performance of visual SLAM. Our findings suggest that the problem lies with the subsequent relative pose estimation module between the matched pair. ORB-SLAM3 has improved the recall of the original loop closing module; however, even in ORB-SLAM3, the loop closing module remains the major reason behind loop closing failures. Surprisingly, using off-the-shelf ORB and SIFT based relative pose estimators (non-real-time) manages to close most of the loops missed by ORB-SLAM. This significant performance gap between the two available methods suggests that ORB-SLAM's pipeline can be further matured by focusing on the relative pose estimators to improve loop closure performance, rather than investing more resources in improving vPR. We also evaluate deep alternatives for relative pose estimation in the context of loop closures. Interestingly, the performance of deep relocalization methods (e.g. MapNet) is worse than that of classic methods even in loop closure scenarios. This finding further supports the recently diagnosed fundamental limitation of deep relocalization methods. Finally, we expose a bias in the well-known public KITTI dataset due to which these commonly occurring failures have eluded the community. We augment the KITTI dataset with detailed loop closing labels. To compensate for the bias in the public datasets, we provide a challenging loop closure dataset which contains challenging yet commonly occurring indoor navigation scenarios with loop closures. We hope our findings and the accompanying dataset will help the community in further improving the popular ORB-SLAM pipeline.
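To illustrate the kind of off-the-shelf relative pose estimation the paper finds so effective, here is a minimal OpenCV sketch that matches ORB features between a loop-candidate image pair and recovers the up-to-scale relative pose from the essential matrix; the intrinsic matrix `K` and parameter values are assumed placeholders, not the paper's configuration.

```python
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    """Estimate the relative pose between a loop-candidate image pair."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    # Brute-force Hamming matching with cross-checking for reliability.
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    # RANSAC essential-matrix estimation, then pose recovery.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t  # rotation and unit-norm translation between the pair
```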
{"title":"Why ORB-SLAM is missing commonly occurring loop closures?","authors":"Saran Khaliq, Muhammad Latif Anjum, Wajahat Hussain, Muhammad Uzair Khattak, Momen Rasool","doi":"10.1007/s10514-023-10149-x","DOIUrl":"10.1007/s10514-023-10149-x","url":null,"abstract":"<div><p>We analyse, for the first time, the popular loop closing module of a well known and widely used open-source visual SLAM (ORB-SLAM) pipeline. Investigating failures in the loop closure module of visual SLAM is challenging since it consists of multiple building blocks. Our meticulous investigations have revealed a few interesting findings. Contrary to reported results, ORB-SLAM frequently misses large fraction of loop closures on public (KITTI, TUM RGB-D) datasets. One common assumption is, in such scenarios, the visual place recognition (vPR) block of the loop closure module is unable to find a suitable match due to extreme conditions (dynamic scene, viewpoint/scale changes). We report that native vPR of ORB-SLAM is not the sole reason for these failures. Although recent deep vPR alternatives achieve impressive matching performance, replacing native vPR with these deep alternatives will only partially improve loop closure performance of visual SLAM. Our findings suggest that the problem lies with the subsequent relative pose estimation module between the matching pair. ORB-SLAM3 has improved the recall of the original loop closing module. However, even in ORB-SLAM3, the loop closing module is the major reason behind loop closing failures. Surprisingly, using <i>off-the-shelf</i> ORB and SIFT based relative pose estimators (non real-time) manages to close most of the loops missed by ORB-SLAM. This significant performance gap between the two available methods suggests that ORB-SLAM’s pipeline can be further matured by focusing on the relative pose estimators, to improve loop closure performance, rather than investing more resources on improving vPR. We also evaluate deep alternatives for relative pose estimation in the context of loop closures. Interestingly, the performance of deep relocalization methods (e.g. MapNet) is worse than classic methods even in loop closures scenarios. This finding further supports the fundamental limitation of deep relocalization methods recently diagnosed. Finally, we expose bias in well-known public dataset (KITTI) due to which these commonly occurring failures have eluded the community. We augment the KITTI dataset with detailed loop closing labels. In order to compensate for the bias in the public datasets, we provide a challenging loop closure dataset which contains challenging yet commonly occurring indoor navigation scenarios with loop closures. We hope our findings and the accompanying dataset will help the community in further improving the popular ORB-SLAM’s pipeline.</p></div>","PeriodicalId":55409,"journal":{"name":"Autonomous Robots","volume":"47 8","pages":"1519 - 1535"},"PeriodicalIF":3.5,"publicationDate":"2023-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135569276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reinforcement learning with model-based feedforward inputs for robotic table tennis
Pub Date : 2023-10-17, DOI: 10.1007/s10514-023-10140-6, Autonomous Robots 47(8): 1387-1403
Hao Ma, Dieter Büchler, Bernhard Schölkopf, Michael Muehlebach
We rethink the traditional reinforcement learning approach, which is based on optimizing over feedback policies, and propose a new framework that optimizes over feedforward inputs instead. This not only mitigates the risk of destabilizing the system during training but also reduces the bulk of the learning to a supervised learning task. As a result, efficient and well-understood supervised learning techniques can be applied and are tuned using a validation data set. The labels are generated with a variant of iterative learning control, which also incorporates prior knowledge about the underlying dynamics. Our framework is applied to intercepting and returning ping-pong balls that are played to a four-degrees-of-freedom robotic arm in real-world experiments. The robot arm is driven by pneumatic artificial muscles, which makes the control and learning tasks challenging. We highlight the potential of our framework by comparing it to a reinforcement learning approach that optimizes over feedback policies. We find that our framework achieves a higher success rate for the returns (100% vs. 96% on 107 consecutive trials; see https://youtu.be/kR9jowEH7PY) while requiring only about one tenth of the samples during training. We also find that our approach is able to deal with a variety of different incoming trajectories.
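A minimal sketch of the label-generation step, assuming the simplest scalar-gain form of iterative learning control; the gain `L` and the `simulate` rollout are illustrative assumptions, and the paper's variant additionally exploits prior knowledge of the dynamics.

```python
import numpy as np

def ilc_update(u: np.ndarray, y: np.ndarray, y_ref: np.ndarray,
               L: float = 0.5) -> np.ndarray:
    """One ILC iteration: u_{k+1} = u_k + L * (y_ref - y_k).

    u: feedforward input trajectory; y: measured output of the last trial;
    y_ref: reference trajectory. L is an assumed scalar learning gain.
    """
    return u + L * (y_ref - y)

# Repeated over trials, the refined feedforward inputs (paired with the
# observed incoming-ball states) become labels for supervised learning:
# for k in range(num_trials):
#     y = simulate(u)              # hypothetical plant or model rollout
#     u = ilc_update(u, y, y_ref)
```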
{"title":"Reinforcement learning with model-based feedforward inputs for robotic table tennis","authors":"Hao Ma, Dieter Büchler, Bernhard Schölkopf, Michael Muehlebach","doi":"10.1007/s10514-023-10140-6","DOIUrl":"10.1007/s10514-023-10140-6","url":null,"abstract":"<div><p>We rethink the traditional reinforcement learning approach, which is based on optimizing over feedback policies, and propose a new framework that optimizes over feedforward inputs instead. This not only mitigates the risk of destabilizing the system during training but also reduces the bulk of the learning to a supervised learning task. As a result, efficient and well-understood supervised learning techniques can be applied and are tuned using a validation data set. The labels are generated with a variant of iterative learning control, which also includes prior knowledge about the underlying dynamics. Our framework is applied for intercepting and returning ping-pong balls that are played to a four-degrees-of-freedom robotic arm in real-world experiments. The robot arm is driven by pneumatic artificial muscles, which makes the control and learning tasks challenging. We highlight the potential of our framework by comparing it to a reinforcement learning approach that optimizes over feedback policies. We find that our framework achieves a higher success rate for the returns (<span>(100%)</span> vs. <span>(96%)</span>, on 107 consecutive trials, see https://youtu.be/kR9jowEH7PY) while requiring only about one tenth of the samples during training. We also find that our approach is able to deal with a variant of different incoming trajectories.\u0000</p></div>","PeriodicalId":55409,"journal":{"name":"Autonomous Robots","volume":"47 8","pages":"1387 - 1403"},"PeriodicalIF":3.5,"publicationDate":"2023-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10514-023-10140-6.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135995053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RoLoMa: robust loco-manipulation for quadruped robots with arms
Pub Date : 2023-10-15, DOI: 10.1007/s10514-023-10146-0, Autonomous Robots 47(8): 1463-1481
Henrique Ferrolho, Vladimir Ivan, Wolfgang Merkt, Ioannis Havoutis, Sethu Vijayakumar
Deployment of robotic systems in the real world requires a certain level of robustness in order to deal with uncertainty factors, such as mismatches in the dynamics model, noise in sensor readings, and communication delays. Some approaches tackle these issues reactively at the control stage. However, regardless of the controller, online motion execution can only be as robust as the system capabilities allow at any given state. This is why it is important to have good motion plans to begin with, where robustness is considered proactively. To this end, we propose a metric (derived from first principles) for representing robustness against external disturbances. We then use this metric within our trajectory optimization framework for solving complex loco-manipulation tasks. Through our experiments, we show that trajectories generated using our approach can resist a greater range of forces originating from any possible direction. By using our method, we can compute trajectories that solve tasks as effectively as before, with the added benefit of being able to counteract stronger disturbances in worst-case scenarios.
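As a hedged illustration of how robustness to disturbances can be scored (one plausible formulation, not necessarily the paper's first-principles metric), the following sketch computes, for a given disturbance direction, the largest disturbance magnitude the actuators can still reject under torque limits, posed as a linear program.

```python
import numpy as np
from scipy.optimize import linprog

def rejection_margin(G: np.ndarray, d: np.ndarray,
                     tau_min: np.ndarray, tau_max: np.ndarray) -> float:
    """Largest disturbance magnitude s along direction d that torques can
    reject, given a linearized torque-to-wrench map G (all assumed inputs)."""
    m, n = G.shape
    # Decision variables x = [tau (n), s (1)]; maximize s <=> minimize -s.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    A_eq = np.hstack([G, -d.reshape(-1, 1)])  # enforce G tau = s d
    b_eq = np.zeros(m)
    bounds = [(lo, hi) for lo, hi in zip(tau_min, tau_max)] + [(0, None)]
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1] if res.success else 0.0

# Taking the minimum over many sampled directions gives a conservative
# worst-case margin that a trajectory optimizer could maximize:
# margin = min(rejection_margin(G, d, tau_min, tau_max) for d in directions)
```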
{"title":"RoLoMa: robust loco-manipulation for quadruped robots with arms","authors":"Henrique Ferrolho, Vladimir Ivan, Wolfgang Merkt, Ioannis Havoutis, Sethu Vijayakumar","doi":"10.1007/s10514-023-10146-0","DOIUrl":"10.1007/s10514-023-10146-0","url":null,"abstract":"<div><p>Deployment of robotic systems in the real world requires a certain level of robustness in order to deal with uncertainty factors, such as mismatches in the dynamics model, noise in sensor readings, and communication delays. Some approaches tackle these issues <i>reactively</i> at the control stage. However, regardless of the controller, online motion execution can only be as robust as the system capabilities allow at any given state. This is why it is important to have good motion plans to begin with, where robustness is considered <i>proactively</i>. To this end, we propose a metric (derived from first principles) for representing robustness against external disturbances. We then use this metric within our trajectory optimization framework for solving complex loco-manipulation tasks. Through our experiments, we show that trajectories generated using our approach can resist a greater range of forces originating from any possible direction. By using our method, we can compute trajectories that solve tasks as effectively as before, with the added benefit of being able to counteract stronger disturbances in worst-case scenarios.\u0000</p></div>","PeriodicalId":55409,"journal":{"name":"Autonomous Robots","volume":"47 8","pages":"1463 - 1481"},"PeriodicalIF":3.5,"publicationDate":"2023-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10514-023-10146-0.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136185248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FuseBot: mechanical search of rigid and deformable objects via multi-modal perception
Pub Date : 2023-09-23, DOI: 10.1007/s10514-023-10137-1, Autonomous Robots 47(8): 1137-1154
Tara Boroushaki, Laura Dodds, Nazish Naeem, Fadel Adib
Mechanical search is a robotic problem where a robot needs to retrieve a target item that is partially or fully occluded from its camera. State-of-the-art approaches for mechanical search either require an expensive search process to find the target item, or they require the item to be tagged with a radio frequency identification (RFID) tag, making their approach beneficial only to tagged items in the environment. We present FuseBot, the first robotic system for RF-Visual mechanical search that enables efficient retrieval of both RF-tagged and untagged items in a pile. Rather than requiring all target items in a pile to be RF-tagged, FuseBot leverages the mere existence of an RF-tagged item in the pile to benefit both tagged and untagged items. Our design introduces two key innovations. The first is RF-Visual Mapping, a technique that identifies and locates RF-tagged items in a pile and uses this information to construct an RF-Visual occupancy distribution map. The second is RF-Visual Extraction, a policy formulated as an optimization problem that minimizes the number of actions required to extract the target object by accounting for the probabilistic occupancy distribution, the expected grasp quality, and the expected information gain from future actions. We built a real-time end-to-end prototype of our system on a UR5e robotic arm with in-hand vision and RF perception modules. We conducted over 200 real-world experimental trials to evaluate FuseBot and compare its performance to a state-of-the-art vision-based system named X-Ray (Danielczuk et al., in: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2020). Our experimental results demonstrate that FuseBot outperforms X-Ray by more than 40% in efficiency, measured as the number of actions required for successful mechanical search. Furthermore, in comparison to X-Ray's success rate of 84%, FuseBot achieves a success rate of 95% in retrieving untagged items, demonstrating for the first time that the benefits of RF perception extend beyond tagged objects in the mechanical search problem.
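The following Python fragment sketches the flavor of the RF-Visual Extraction objective: candidate grasps are scored by target occupancy probability, expected grasp quality, and expected information gain. The weights and the fields of `action` are illustrative assumptions rather than the paper's exact optimization.

```python
import numpy as np

def score_action(action: dict, occupancy_map: np.ndarray,
                 w_occ: float = 1.0, w_grasp: float = 0.5,
                 w_info: float = 0.5) -> float:
    """Score one candidate extraction action (all weights are assumed)."""
    r, c = action["pixel"]                # grasp location in map coordinates
    p_target = occupancy_map[r, c]        # P(target under this grasp)
    return (w_occ * p_target
            + w_grasp * action["grasp_quality"]
            + w_info * action["expected_info_gain"])

# Greedy selection over a candidate set (the paper solves a fuller
# optimization over action sequences):
# best = max(candidate_actions, key=lambda a: score_action(a, occupancy_map))
```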
{"title":"FuseBot: mechanical search of rigid and deformable objects via multi-modal perception","authors":"Tara Boroushaki, Laura Dodds, Nazish Naeem, Fadel Adib","doi":"10.1007/s10514-023-10137-1","DOIUrl":"10.1007/s10514-023-10137-1","url":null,"abstract":"<div><p>Mechanical search is a robotic problem where a robot needs to retrieve a target item that is partially or fully-occluded from its camera. State-of-the-art approaches for mechanical search either require an expensive search process to find the target item, or they require the item to be tagged with a radio frequency identification tag (e.g., RFID), making their approach beneficial only to tagged items in the environment. We present FuseBot, the first robotic system for RF-Visual mechanical search that enables efficient retrieval of both RF-tagged and untagged items in a pile. Rather than requiring all target items in a pile to be RF-tagged, FuseBot leverages the mere existence of an RF-tagged item in the pile to benefit both tagged and untagged items. Our design introduces two key innovations. The first is <i>RF-Visual Mapping</i>, a technique that identifies and locates RF-tagged items in a pile and uses this information to construct an RF-Visual occupancy distribution map. The second is <i>RF-Visual Extraction</i>, a policy formulated as an optimization problem that minimizes the number of actions required to extract the target object by accounting for the probabilistic occupancy distribution, the expected grasp quality, and the expected information gain from future actions. We built a real-time end-to-end prototype of our system on a UR5e robotic arm with in-hand vision and RF perception modules. We conducted over 200 real-world experimental trials to evaluate FuseBot and compare its performance to a state-of-the-art vision-based system named X-Ray (Danielczuk et al., in: 2020 IEEE/RSJ international conference on intelligent robots and systems (IROS), IEEE, 2020). Our experimental results demonstrate that FuseBot outperforms X-Ray’s efficiency by more than 40% in terms of the number of actions required for successful mechanical search. Furthermore, in comparison to X-Ray’s success rate of 84%, FuseBot achieves a success rate of 95% in retrieving untagged items, demonstrating for the first time that the benefits of RF perception extend beyond tagged objects in the mechanical search problem.</p></div>","PeriodicalId":55409,"journal":{"name":"Autonomous Robots","volume":"47 8","pages":"1137 - 1154"},"PeriodicalIF":3.5,"publicationDate":"2023-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10514-023-10137-1.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135958951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
UVS: underwater visual SLAM—a robust monocular visual SLAM system for lifelong underwater operations
Pub Date : 2023-09-22, DOI: 10.1007/s10514-023-10138-0, Autonomous Robots 47(8): 1367-1385
Marco Leonardi, Annette Stahl, Edmund Førland Brekke, Martin Ludvigsen
In this paper, a visual simultaneous localization and mapping (visual SLAM) system called underwater visual SLAM (UVS) is presented, specifically tailored for camera-only navigation in natural underwater environments. The UVS system is particularly optimized towards precision and robustness, as well as lifelong operation. We build upon ORB-SLAM (oriented FAST and rotated BRIEF simultaneous localization and mapping, where FAST denotes features from accelerated segment test and BRIEF denotes binary robust independent elementary features) and improve its accuracy by performing an exact search in the descriptor space during triangulation, and its robustness by utilizing a unified initialization method and a motion model. In addition, we present a scale-agnostic station-keeping detection, which aims to optimize the map and poses during station-keeping, and a pruning strategy, which takes into account a point's age and its distance to the active keyframe. An exhaustive evaluation is presented to the reader, using a total of 38 in-air and underwater sequences.
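A minimal sketch of what an exact search in binary descriptor space can look like, assuming 256-bit ORB descriptors stored as packed uint8 arrays; this brute-force Hamming matcher is illustrative, not UVS's implementation.

```python
import numpy as np

def exact_hamming_match(query: np.ndarray, database: np.ndarray):
    """Return (index, distance) of the database descriptor nearest to query.

    query: shape (32,) uint8; database: shape (N, 32) uint8 (assumed layout).
    """
    # XOR, then count set bits per row to get exact Hamming distances.
    dists = np.unpackbits(np.bitwise_xor(database, query), axis=1).sum(axis=1)
    best = int(np.argmin(dists))
    return best, int(dists[best])

# db = np.random.randint(0, 256, (5000, 32), dtype=np.uint8)
# q = np.random.randint(0, 256, (32,), dtype=np.uint8)
# idx, d = exact_hamming_match(q, db)
```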
{"title":"UVS: underwater visual SLAM—a robust monocular visual SLAM system for lifelong underwater operations","authors":"Marco Leonardi, Annette Stahl, Edmund Førland Brekke, Martin Ludvigsen","doi":"10.1007/s10514-023-10138-0","DOIUrl":"10.1007/s10514-023-10138-0","url":null,"abstract":"<div><p>In this paper, a visual simultaneous localization and mapping (VSLAM/visual SLAM) system called underwater visual SLAM (UVS) system is presented, specifically tailored for camera-only navigation in natural underwater environments. The UVS system is particularly optimized towards precision and robustness, as well as lifelong operations. We build upon Oriented features from accelerated segment test and Rotated Binary robust independent elementary features simultaneous localization and mapping (ORB-SLAM) and improve the accuracy by performing an exact search in the descriptor space during triangulation and the robustness by utilizing a unified initialization method and a motion model. In addition, we present a scale-agnostic station-keeping detection, which aims to optimize the map and poses during station-keeping, and a pruning strategy, which takes into account the point’s age and distance to the active keyframe. An exhaustive evaluation is presented to the reader, using a total of 38 in-air and underwater sequences.\u0000</p></div>","PeriodicalId":55409,"journal":{"name":"Autonomous Robots","volume":"47 8","pages":"1367 - 1385"},"PeriodicalIF":3.5,"publicationDate":"2023-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10514-023-10138-0.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136015937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Formation control for autonomous fixed-wing air vehicles with strict speed constraints
Pub Date : 2023-09-13, DOI: 10.1007/s10514-023-10126-4, Autonomous Robots 47(8): 1299-1323
Christopher Heintz, Sean C. C. Bailey, Jesse B. Hoagg
We present a formation-control algorithm for autonomous fixed-wing air vehicles. The desired inter-vehicle positions are time-varying, and we assume that at least one vehicle has access to a measurement of its position relative to the leader, which can be a physical or virtual member of the formation. Each vehicle is modeled with extended unicycle dynamics that include orientation kinematics on SO(3), speed dynamics, and strict constraints on speed (i.e., ground speed). The analytic result shows that the vehicles converge exponentially to the desired relative positions with respect to each other and the leader. We also show that each vehicle's speed satisfies the speed constraints. The formation algorithm is demonstrated in software-in-the-loop (SITL) simulations and experiments with fixed-wing air vehicles. To implement the formation-control algorithm, each vehicle has middle-loop controllers that determine roll, pitch, and throttle commands from the outer-loop formation control. We present SITL simulations with four fixed-wing air vehicles that demonstrate formation control with different communication structures. Finally, we present formation-control experiments with up to three fixed-wing air vehicles.
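For intuition, here is a minimal planar sketch of formation tracking with a strict speed envelope: a follower regulates a time-varying offset from the leader with a proportional law and clamps its commanded ground speed. Gains and limits are illustrative assumptions; the paper's model further includes orientation kinematics on SO(3) and speed dynamics.

```python
import numpy as np

def follower_velocity(p: np.ndarray, p_leader: np.ndarray,
                      offset: np.ndarray, v_leader: np.ndarray,
                      k: float = 0.8, v_min: float = 12.0,
                      v_max: float = 25.0) -> np.ndarray:
    """Velocity command tracking p_leader + offset under speed constraints.

    k, v_min, v_max are assumed values, not the paper's parameters.
    """
    v_cmd = v_leader + k * (p_leader + offset - p)   # proportional tracking
    speed = np.linalg.norm(v_cmd)
    clamped = np.clip(speed, v_min, v_max)           # strict speed envelope
    return v_cmd / max(speed, 1e-9) * clamped        # rescale, keep heading
```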
{"title":"Formation control for autonomous fixed-wing air vehicles with strict speed constraints","authors":"Christopher Heintz, Sean C. C. Bailey, Jesse B. Hoagg","doi":"10.1007/s10514-023-10126-4","DOIUrl":"10.1007/s10514-023-10126-4","url":null,"abstract":"<div><p>We present a formation-control algorithm for autonomous fixed-wing air vehicles. The desired inter-vehicle positions are time-varying, and we assume that at least one vehicle has access to a measurement its position relative to the leader, which can be a physical or virtual member of the formation. Each vehicle is modeled with extended unicycle dynamics that include orientation kinematics on SO(3), speed dynamics, and strict constraints on speed (i.e., ground speed). The analytic result shows that the vehicles converge exponentially to the desired relative positions with each other and the leader. We also show that each vehicle’s speed satisfies the speed constraints. The formation algorithm is demonstrated in software-in-the-loop (SITL) simulations and experiments with fixed-wing air vehicles. To implement the formation-control algorithm, each vehicle has middle-loop controllers to determine roll, pitch, and throttle commands from the outer-loop formation control. We present SITL simulations with 4 fixed-wing air vehicles that demonstrate formation control with different communication structures. Finally, we present formation-control experiments with up to 3 fixed-wing air vehicles.</p></div>","PeriodicalId":55409,"journal":{"name":"Autonomous Robots","volume":"47 8","pages":"1299 - 1323"},"PeriodicalIF":3.5,"publicationDate":"2023-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135742202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sim-to-real transfer of co-optimized soft robot crawlers
Pub Date : 2023-09-08, DOI: 10.1007/s10514-023-10130-8, Autonomous Robots 47(8): 1195-1211
Charles Schaff, Audrey Sedal, Shiyao Ni, Matthew R. Walter
This work provides a complete framework for the simulation, co-optimization, and sim-to-real transfer of the design and control of soft legged robots. Soft robots have “mechanical intelligence”: the ability to passively exhibit behaviors that would otherwise be difficult to program. Exploiting this capacity requires consideration of the coupling between design and control. Co-optimization provides a way to reason over this coupling. Yet, it is difficult to achieve simulations that are both sufficiently accurate to allow for sim-to-real transfer and fast enough for contemporary co-optimization algorithms. We describe a modularized model order reduction algorithm that improves simulation efficiency, while preserving the accuracy required to learn effective soft robot design and control. We propose a reinforcement learning-based co-optimization framework that identifies several soft crawling robots that outperform an expert baseline with zero-shot sim-to-real transfer. We study generalization of the framework to new terrains, and the efficacy of domain randomization as a means to improve sim-to-real transfer.
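A minimal sketch of the domain-randomization step, assuming a hypothetical simulator interface: physical parameters are resampled per episode so the co-optimized design and controller do not overfit a single simulator instance. Parameter names and ranges are illustrative, not the paper's configuration.

```python
import random

def sample_sim_params() -> dict:
    """Resample simulator parameters for one training episode.

    All names and ranges below are assumed for illustration.
    """
    return {
        "elastic_modulus_scale": random.uniform(0.8, 1.2),
        "ground_friction": random.uniform(0.5, 1.5),
        "actuation_delay_s": random.uniform(0.0, 0.05),
        "terrain_slope_deg": random.uniform(-5.0, 5.0),
    }

# for episode in range(num_episodes):
#     env.reset(**sample_sim_params())  # hypothetical simulator interface
#     ...
```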
{"title":"Sim-to-real transfer of co-optimized soft robot crawlers","authors":"Charles Schaff, Audrey Sedal, Shiyao Ni, Matthew R. Walter","doi":"10.1007/s10514-023-10130-8","DOIUrl":"10.1007/s10514-023-10130-8","url":null,"abstract":"<div><p>This work provides a complete framework for the simulation, co-optimization, and sim-to-real transfer of the design and control of soft legged robots. Soft robots have “mechanical intelligence”: the ability to passively exhibit behaviors that would otherwise be difficult to program. Exploiting this capacity requires consideration of the coupling between design and control. Co-optimization provides a way to reason over this coupling. Yet, it is difficult to achieve simulations that are both sufficiently accurate to allow for sim-to-real transfer and fast enough for contemporary co-optimization algorithms. We describe a modularized model order reduction algorithm that improves simulation efficiency, while preserving the accuracy required to learn effective soft robot design and control. We propose a reinforcement learning-based co-optimization framework that identifies several soft crawling robots that outperform an expert baseline with zero-shot sim-to-real transfer. We study generalization of the framework to new terrains, and the efficacy of domain randomization as a means to improve sim-to-real transfer.\u0000</p></div>","PeriodicalId":55409,"journal":{"name":"Autonomous Robots","volume":"47 8","pages":"1195 - 1211"},"PeriodicalIF":3.5,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46827720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-directional Interaction Force Control with an Aerial Manipulator Under External Disturbances
Pub Date : 2023-09-01, DOI: 10.1007/s10514-023-10128-2, Autonomous Robots 47(8): 1325-1343
Grzegorz Malczyk, Maximilian Brunner, Eugenio Cuniato, Marco Tognon, Roland Siegwart
To improve the accuracy and robustness of interactive aerial robots, knowledge of the forces acting on the platform is of utmost importance. The robot should distinguish interaction forces from external disturbances in order to comply with the former and reject the latter. This represents a challenge since disturbances might be of different natures (physical contact, aerodynamics, modeling errors) and be applied at different points of the robot. This work presents a new extended Kalman filter (EKF) based estimator for both external disturbances and interaction forces. The estimator fuses information coming from the system's dynamic model and its state with wrench measurements coming from a force-torque sensor. This allows for robust interaction control at the tool's tip even in the presence of external disturbance wrenches acting on the platform. We employ the filter estimates in a novel hybrid force/motion controller to perform force tracking not only along the tool direction, but from any platform orientation, without losing the stability of the pose controller. The proposed framework is extensively tested on an omnidirectional aerial manipulator (AM) performing push-and-slide operations and transitioning between different interaction surfaces while subject to external disturbances. The experiments are conducted with the AM equipped with two different tools: a rigid interaction stick and an actuated delta manipulator, showing the generality of the approach. Moreover, the estimation results are compared to a state-of-the-art momentum-based estimator, clearly showing the superiority of the EKF approach.
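To give a flavor of the estimator, here is a minimal linear Kalman-filter sketch in which the unknown disturbance force is appended to the state as a random walk and the F/T-sensed interaction force enters the prediction as a known input; mass, time step, and covariances are assumptions, and the paper's EKF operates on the full nonlinear aerial-manipulator dynamics.

```python
import numpy as np

m, dt = 2.5, 0.01                      # assumed mass [kg] and time step [s]
# State x = [velocity (3); disturbance force (3)]; gravity compensation
# is omitted for brevity in this sketch.
F = np.block([[np.eye(3), (dt / m) * np.eye(3)],
              [np.zeros((3, 3)), np.eye(3)]])     # disturbance: random walk
H = np.hstack([np.eye(3), np.zeros((3, 3))])      # we measure velocity only
Q = np.diag([1e-4] * 3 + [1e-2] * 3)              # assumed process noise
R = 1e-3 * np.eye(3)                              # assumed measurement noise

def kf_step(x, P, u_thrust, f_interaction, v_meas):
    """One predict/update step; x[3:] is the disturbance-force estimate."""
    # Predict: commanded thrust and the F/T-sensed interaction force are
    # known inputs; the unknown disturbance lives in the state.
    B = np.vstack([(dt / m) * np.eye(3), np.zeros((3, 3))])
    x = F @ x + B @ (u_thrust + f_interaction)
    P = F @ P @ F.T + Q
    # Update with the velocity measurement.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (v_meas - H @ x)
    P = (np.eye(6) - K @ H) @ P
    return x, P
```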
{"title":"Multi-directional Interaction Force Control with an Aerial Manipulator Under External Disturbances","authors":"Grzegorz Malczyk, Maximilian Brunner, Eugenio Cuniato, Marco Tognon, Roland Siegwart","doi":"10.1007/s10514-023-10128-2","DOIUrl":"10.1007/s10514-023-10128-2","url":null,"abstract":"<div><p>To improve accuracy and robustness of interactive aerial robots, the knowledge of the forces acting on the platform is of uttermost importance. The robot should distinguish interaction forces from external disturbances in order to be compliant with the firsts and reject the seconds. This represents a challenge since disturbances might be of different nature (physical contact, aerodynamic, modeling errors) and be applied to different points of the robot. This work presents a new <span>(hbox {extended Kalman filter (EKF)})</span> based estimator for both external disturbance and interaction forces. The estimator fuses information coming from the system’s dynamic model and it’s state with wrench measurements coming from a Force-Torque sensor. This allows for robust interaction control at the tool’s tip even in presence of external disturbance wrenches acting on the platform. We employ the filter estimates in a novel hybrid force/motion controller to perform force tracking not only along the tool direction, but from any platform’s orientation, without losing the stability of the pose controller. The proposed framework is extensively tested on an omnidirectional aerial manipulator (AM) performing push and slide operations and transitioning between different interaction surfaces, while subject to external disturbances. The experiments are done equipping the AM with two different tools: a rigid interaction stick and an actuated delta manipulator, showing the generality of the approach. Moreover, the estimation results are compared to a state-of-the-art momentum-based estimator, clearly showing the superiority of the EKF approach.\u0000</p></div>","PeriodicalId":55409,"journal":{"name":"Autonomous Robots","volume":"47 8","pages":"1325 - 1343"},"PeriodicalIF":3.5,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10514-023-10128-2.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43984917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}