InterACT: Inter-dependency Aware Action Chunking with Hierarchical Attention Transformers for Bimanual Manipulation (arXiv:2409.07914, 12 Sep 2024)
Andrew Lee, Ian Chuang, Ling-Yuan Chen, Iman Soltani
We present InterACT: Inter-dependency aware Action Chunking with Hierarchical Attention Transformers, a novel imitation learning framework for bimanual manipulation that integrates hierarchical attention to capture inter-dependencies between dual-arm joint states and visual inputs. InterACT consists of a Hierarchical Attention Encoder and a Multi-arm Decoder, both designed to enhance information aggregation and coordination. The encoder processes multi-modal inputs through segment-wise and cross-segment attention mechanisms, while the decoder leverages synchronization blocks to refine individual action predictions, providing the counterpart's prediction as context. Our experiments on a variety of simulated and real-world bimanual manipulation tasks demonstrate that InterACT significantly outperforms existing methods. Detailed ablation studies validate the contributions of key components of our work, including the impact of CLS tokens, cross-segment encoders, and synchronization blocks.
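For a concrete picture of the encoder described above, the following is a minimal PyTorch sketch of the segment-wise/cross-segment attention idea: each input segment (e.g., left-arm state, right-arm state, image tokens) receives its own CLS token, attention first runs within each segment, and a second attention stage runs only over the per-segment CLS summaries. The module names, dimensions, and layer counts are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of segment-wise + cross-segment attention (assumed structure,
# not the InterACT implementation). Each segment gets its own CLS token;
# attention runs first within each segment, then across the CLS summaries.
import torch
import torch.nn as nn

class HierarchicalAttentionEncoder(nn.Module):
    def __init__(self, dim=256, num_segments=3, heads=4):
        super().__init__()
        self.cls = nn.Parameter(torch.zeros(num_segments, 1, dim))
        self.segment_attn = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.cross_attn = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, segments):
        # segments: list of (batch, tokens_i, dim) tensors, one per arm/modality
        cls_out = []
        for i, seg in enumerate(segments):
            cls_tok = self.cls[i].expand(seg.shape[0], -1, -1)   # (B, 1, D)
            seg = torch.cat([cls_tok, seg], dim=1)               # prepend CLS token
            seg = self.segment_attn(seg)                         # segment-wise attention
            cls_out.append(seg[:, :1])                           # keep the CLS summary
        cls_stack = torch.cat(cls_out, dim=1)                    # (B, num_segments, D)
        return self.cross_attn(cls_stack)                        # cross-segment attention

# Usage with dummy inputs: left-arm joints, right-arm joints, image features.
enc = HierarchicalAttentionEncoder()
left = torch.randn(2, 14, 256); right = torch.randn(2, 14, 256); img = torch.randn(2, 300, 256)
fused = enc([left, right, img])    # (2, 3, 256) fused per-segment summaries
```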
{"title":"InterACT: Inter-dependency Aware Action Chunking with Hierarchical Attention Transformers for Bimanual Manipulation","authors":"Andrew Lee, Ian Chuang, Ling-Yuan Chen, Iman Soltani","doi":"arxiv-2409.07914","DOIUrl":"https://doi.org/arxiv-2409.07914","url":null,"abstract":"We present InterACT: Inter-dependency aware Action Chunking with Hierarchical\u0000Attention Transformers, a novel imitation learning framework for bimanual\u0000manipulation that integrates hierarchical attention to capture\u0000inter-dependencies between dual-arm joint states and visual inputs. InterACT\u0000consists of a Hierarchical Attention Encoder and a Multi-arm Decoder, both\u0000designed to enhance information aggregation and coordination. The encoder\u0000processes multi-modal inputs through segment-wise and cross-segment attention\u0000mechanisms, while the decoder leverages synchronization blocks to refine\u0000individual action predictions, providing the counterpart's prediction as\u0000context. Our experiments on a variety of simulated and real-world bimanual\u0000manipulation tasks demonstrate that InterACT significantly outperforms existing\u0000methods. Detailed ablation studies validate the contributions of key components\u0000of our work, including the impact of CLS tokens, cross-segment encoders, and\u0000synchronization blocks.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Universal Trajectory Optimization Framework for Differential-Driven Robot Class (arXiv:2409.07924, 12 Sep 2024)
Mengke Zhang, Zhichao Han, Chao Xu, Fei Gao, Yanjun Cao
Differential-driven robots are widely used in various scenarios thanks to their straightforward principle, from household service robots to disaster-response field robots. Several types of driving mechanisms are used in real-world applications, including two-wheeled, four-wheeled skid-steering, and tracked robots. The differences in driving mechanism usually require specific kinematic modeling when precise control is desired. Furthermore, the nonholonomic dynamics and possible lateral slip lead to different degrees of difficulty in obtaining feasible, high-quality trajectories. Therefore, a comprehensive trajectory optimization framework that computes trajectories efficiently for various kinds of differential-driven robots is highly desirable. In this paper, we propose a universal trajectory optimization framework that can be applied to the differential-driven robot class, enabling the generation of high-quality trajectories within a restricted computational timeframe. We introduce a novel trajectory representation based on polynomial parameterization of motion states or their integrals, such as angular and linear velocities, which inherently matches the robots' motion to the control principles of the differential-driven robot class. The trajectory optimization problem is formulated to minimize complexity while prioritizing safety and operational efficiency. We then build a full-stack autonomous planning and control system to show the feasibility and robustness of the approach. We conduct extensive simulations and real-world testing in crowded environments with three kinds of differential-driven robots to validate the effectiveness of our approach. We will release our method as an open-source package.
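To make the trajectory representation concrete, here is a small sketch of the underlying idea: parameterize the linear and angular velocities as polynomials in time and integrate the differential-drive (unicycle) kinematics to recover the pose trajectory. The coefficients, horizon, and Euler integration step are illustrative assumptions; the paper's optimizer additionally enforces safety and efficiency objectives.

```python
# Sketch of a polynomial velocity parameterization for a differential-drive robot
# (illustrative; coefficients and discretization are assumptions, not the paper's solver).
import numpy as np

def rollout(v_coeffs, w_coeffs, T=3.0, dt=0.01, start=(0.0, 0.0, 0.0)):
    """Integrate unicycle kinematics x' = v cos(th), y' = v sin(th), th' = w,
    where v(t) and w(t) are polynomials given by their coefficients."""
    x, y, th = start
    path = [(x, y, th)]
    for t in np.arange(0.0, T, dt):
        v = np.polyval(v_coeffs, t)      # linear velocity at time t
        w = np.polyval(w_coeffs, t)      # angular velocity at time t
        x += v * np.cos(th) * dt
        y += v * np.sin(th) * dt
        th += w * dt
        path.append((x, y, th))
    return np.array(path)

# Example: constant forward speed with a gently increasing turn rate.
traj = rollout(v_coeffs=[0.0, 0.5], w_coeffs=[0.1, 0.0])
print(traj[-1])    # final (x, y, heading) after 3 s
```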
{"title":"Universal Trajectory Optimization Framework for Differential-Driven Robot Class","authors":"Mengke Zhang, Zhichao Han, Chao Xu, Fei Gao, Yanjun Cao","doi":"arxiv-2409.07924","DOIUrl":"https://doi.org/arxiv-2409.07924","url":null,"abstract":"Differential-driven robots are widely used in various scenarios thanks to\u0000their straightforward principle, from household service robots to disaster\u0000response field robots. There are several different types of deriving mechanisms\u0000considering the real-world applications, including two-wheeled, four-wheeled\u0000skid-steering, tracked robots, etc. The differences in the driving mechanism\u0000usually require specific kinematic modeling when precise controlling is\u0000desired. Furthermore, the nonholonomic dynamics and possible lateral slip lead\u0000to different degrees of difficulty in getting feasible and high-quality\u0000trajectories. Therefore, a comprehensive trajectory optimization framework to\u0000compute trajectories efficiently for various kinds of differential-driven\u0000robots is highly desirable. In this paper, we propose a universal trajectory\u0000optimization framework that can be applied to differential-driven robot class,\u0000enabling the generation of high-quality trajectories within a restricted\u0000computational timeframe. We introduce a novel trajectory representation based\u0000on polynomial parameterization of motion states or their integrals, such as\u0000angular and linear velocities, that inherently matching robots' motion to the\u0000control principle for differential-driven robot class. The trajectory\u0000optimization problem is formulated to minimize complexity while prioritizing\u0000safety and operational efficiency. We then build a full-stack autonomous\u0000planning and control system to show the feasibility and robustness. We conduct\u0000extensive simulations and real-world testing in crowded environments with three\u0000kinds of differential-driven robots to validate the effectiveness of our\u0000approach. We will release our method as an open-source package.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph Inspection for Robotic Motion Planning: Do Arithmetic Circuits Help? (arXiv:2409.08219, 12 Sep 2024)
Matthias Bentert, Daniel Coimbra Salomao, Alex Crane, Yosuke Mizutani, Felix Reidl, Blair D. Sullivan
We investigate whether algorithms based on arithmetic circuits are a viable alternative to existing solvers for Graph Inspection, a problem with direct application in robotic motion planning. Specifically, we seek to address the high memory usage of existing solvers. Aided by novel theoretical results enabling fast solution recovery, we implement a circuit-based solver for Graph Inspection which uses only polynomial space and test it on several realistic robotic motion planning datasets. In particular, we provide a comprehensive experimental evaluation of a suite of engineered algorithms for three key subroutines. While this evaluation demonstrates that circuit-based methods are not yet practically competitive for our robotics application, it also provides insights which may guide future efforts to bring circuit-based algorithms from theory to practice.
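As a reminder of the object the paper builds on, the toy example below evaluates an arithmetic circuit, i.e., a DAG of input, addition, and multiplication gates, bottom-up with memoization. It is only meant to unpack the term; the circuits constructed for Graph Inspection and the polynomial-space solution-recovery machinery are substantially more involved.

```python
# Toy arithmetic circuit evaluation: a DAG of input, '+', and '*' gates.
# This illustrates the general concept only, not the paper's solver.
def eval_circuit(gates, assignment, output):
    """gates: dict name -> ('in', var) | ('+', [children]) | ('*', [children])."""
    memo = {}
    def value(g):
        if g not in memo:
            kind, arg = gates[g]
            if kind == 'in':
                memo[g] = assignment[arg]
            elif kind == '+':
                memo[g] = sum(value(c) for c in arg)
            else:  # '*'
                out = 1
                for c in arg:
                    out *= value(c)
                memo[g] = out
        return memo[g]
    return value(output)

# (x + y) * (y + 2) evaluated at x = 3, y = 4  ->  7 * 6 = 42
gates = {
    'x': ('in', 'x'), 'y': ('in', 'y'), 'c2': ('in', 'c2'),
    's1': ('+', ['x', 'y']), 's2': ('+', ['y', 'c2']),
    'root': ('*', ['s1', 's2']),
}
print(eval_circuit(gates, {'x': 3, 'y': 4, 'c2': 2}, 'root'))   # 42
```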
{"title":"Graph Inspection for Robotic Motion Planning: Do Arithmetic Circuits Help?","authors":"Matthias Bentert, Daniel Coimbra Salomao, Alex Crane, Yosuke Mizutani, Felix Reidl, Blair D. Sullivan","doi":"arxiv-2409.08219","DOIUrl":"https://doi.org/arxiv-2409.08219","url":null,"abstract":"We investigate whether algorithms based on arithmetic circuits are a viable\u0000alternative to existing solvers for Graph Inspection, a problem with direct\u0000application in robotic motion planning. Specifically, we seek to address the\u0000high memory usage of existing solvers. Aided by novel theoretical results\u0000enabling fast solution recovery, we implement a circuit-based solver for Graph\u0000Inspection which uses only polynomial space and test it on several realistic\u0000robotic motion planning datasets. In particular, we provide a comprehensive\u0000experimental evaluation of a suite of engineered algorithms for three key\u0000subroutines. While this evaluation demonstrates that circuit-based methods are\u0000not yet practically competitive for our robotics application, it also provides\u0000insights which may guide future efforts to bring circuit-based algorithms from\u0000theory to practice.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards Online Safety Corrections for Robotic Manipulation Policies (arXiv:2409.08233, 12 Sep 2024)
Ariana Spalter, Mark Roberts, Laura M. Hiatt
Recent successes in applying reinforcement learning (RL) to robotics have shown that it is a viable approach for constructing robotic controllers. However, RL controllers can produce many collisions in environments where new obstacles appear during execution, which poses a problem in safety-critical settings. We present a hybrid approach, called iKinQP-RL, that uses an Inverse Kinematics Quadratic Programming (iKinQP) controller to correct actions proposed by an RL policy at runtime. This ensures safe execution in the presence of new obstacles not present during training. Preliminary experiments illustrate that our iKinQP-RL framework completely eliminates collisions with new obstacles while maintaining a high task success rate.
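The sketch below illustrates the general runtime-correction pattern the abstract describes: project the RL-proposed joint step onto a safe set with a small quadratic program, staying as close as possible to the policy while respecting joint limits and a minimum obstacle clearance. The cvxpy formulation, the linearized clearance constraint, and all numbers are placeholders, not the iKinQP controller itself.

```python
# Schematic runtime safety filter in the spirit of "project the RL action onto a
# safe set with a QP". The linearized clearance constraint and joint limits are
# illustrative placeholders, not the paper's iKinQP formulation.
import numpy as np
import cvxpy as cp

def correct_action(q, dq_rl, q_min, q_max, clearance, grad_clearance, margin=0.05):
    """q: current joints, dq_rl: RL-proposed joint step.
    clearance: current distance to the nearest obstacle; grad_clearance: its
    gradient w.r.t. q, so clearance + grad^T dq >= margin keeps us clear."""
    dq = cp.Variable(len(q))
    objective = cp.Minimize(cp.sum_squares(dq - dq_rl))        # stay close to the policy
    constraints = [
        q + dq >= q_min, q + dq <= q_max,                      # joint limits
        clearance + grad_clearance @ dq >= margin,             # linearized obstacle clearance
    ]
    cp.Problem(objective, constraints).solve()
    return dq.value

q = np.zeros(6)
dq_safe = correct_action(q, dq_rl=np.full(6, 0.1),
                         q_min=-np.ones(6), q_max=np.ones(6),
                         clearance=0.02,
                         grad_clearance=np.array([-1.0, 0, 0, 0, 0, 0]))
print(dq_safe)   # first joint's step is reversed to regain clearance; the rest follow the policy
```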
{"title":"Towards Online Safety Corrections for Robotic Manipulation Policies","authors":"Ariana Spalter, Mark Roberts, Laura M. Hiatt","doi":"arxiv-2409.08233","DOIUrl":"https://doi.org/arxiv-2409.08233","url":null,"abstract":"Recent successes in applying reinforcement learning (RL) for robotics has\u0000shown it is a viable approach for constructing robotic controllers. However, RL\u0000controllers can produce many collisions in environments where new obstacles\u0000appear during execution. This poses a problem in safety-critical settings. We\u0000present a hybrid approach, called iKinQP-RL, that uses an Inverse Kinematics\u0000Quadratic Programming (iKinQP) controller to correct actions proposed by an RL\u0000policy at runtime. This ensures safe execution in the presence of new obstacles\u0000not present during training. Preliminary experiments illustrate our iKinQP-RL\u0000framework completely eliminates collisions with new obstacles while maintaining\u0000a high task success rate.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive Language-Guided Abstraction from Contrastive Explanations (arXiv:2409.08212, 12 Sep 2024)
Andi Peng, Belinda Z. Li, Ilia Sucholutsky, Nishanth Kumar, Julie A. Shah, Jacob Andreas, Andreea Bobu
Many approaches to robot learning begin by inferring a reward function from a set of human demonstrations. To learn a good reward, it is necessary to determine which features of the environment are relevant before determining how these features should be used to compute reward. End-to-end methods for joint feature and reward learning (e.g., using deep networks or program synthesis techniques) often yield brittle reward functions that are sensitive to spurious state features. By contrast, humans can often learn generalizably from a small number of demonstrations by incorporating strong priors about which features of a demonstration are likely meaningful for a task of interest. How do we build robots that leverage this kind of background knowledge when learning from new demonstrations? This paper describes ALGAE (Adaptive Language-Guided Abstraction from [Contrastive] Explanations), a method that alternates between using language models to iteratively identify human-meaningful features needed to explain demonstrated behavior and using standard inverse reinforcement learning techniques to assign weights to these features. Experiments across a variety of simulated and real-world robot environments show that ALGAE learns generalizable reward functions defined on interpretable features from only small numbers of demonstrations. Importantly, ALGAE can recognize when features are missing, then extract and define those features without any human input, making it possible to quickly and efficiently acquire rich representations of user behavior.
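A skeleton of the alternation described above might look like the following: a language model proposes candidate feature functions, and a generic soft-max inverse-RL step fits linear weights so that demonstrated states score higher than alternatives. The propose_features_with_lm stub, the toy features, and the weight-fitting rule are hypothetical stand-ins, not the ALGAE pipeline.

```python
# Skeleton of the "LLM proposes features, IRL fits weights" alternation.
# propose_features_with_lm is a hypothetical stand-in for a language-model call,
# and the weight fit is a generic soft-max IRL step, not the authors' code.
import numpy as np

def propose_features_with_lm(failed_explanations):
    # Placeholder: a real system would query an LLM for human-meaningful features.
    return [lambda s: s["dist_to_goal"], lambda s: float(s["holding_cup"])]

def fit_weights(features, demo_states, alt_states, lr=0.1, steps=200):
    phi = lambda s: np.array([f(s) for f in features])
    w = np.zeros(len(features))
    for _ in range(steps):
        demo_phi = np.mean([phi(s) for s in demo_states], axis=0)
        alt_phis = np.array([phi(s) for s in alt_states])
        p = np.exp(alt_phis @ w); p /= p.sum()       # soft-max over alternative states
        w += lr * (demo_phi - p @ alt_phis)          # push demos above alternatives
    return w

demos = [{"dist_to_goal": 0.1, "holding_cup": 1}, {"dist_to_goal": 0.2, "holding_cup": 1}]
alts  = [{"dist_to_goal": 0.9, "holding_cup": 0}, {"dist_to_goal": 0.5, "holding_cup": 1}]
feats = propose_features_with_lm(failed_explanations=[])
print(fit_weights(feats, demos, alts))   # learned weights penalize distance, reward holding the cup
```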
{"title":"Adaptive Language-Guided Abstraction from Contrastive Explanations","authors":"Andi Peng, Belinda Z. Li, Ilia Sucholutsky, Nishanth Kumar, Julie A. Shah, Jacob Andreas, Andreea Bobu","doi":"arxiv-2409.08212","DOIUrl":"https://doi.org/arxiv-2409.08212","url":null,"abstract":"Many approaches to robot learning begin by inferring a reward function from a\u0000set of human demonstrations. To learn a good reward, it is necessary to\u0000determine which features of the environment are relevant before determining how\u0000these features should be used to compute reward. End-to-end methods for joint\u0000feature and reward learning (e.g., using deep networks or program synthesis\u0000techniques) often yield brittle reward functions that are sensitive to spurious\u0000state features. By contrast, humans can often generalizably learn from a small\u0000number of demonstrations by incorporating strong priors about what features of\u0000a demonstration are likely meaningful for a task of interest. How do we build\u0000robots that leverage this kind of background knowledge when learning from new\u0000demonstrations? This paper describes a method named ALGAE (Adaptive\u0000Language-Guided Abstraction from [Contrastive] Explanations) which alternates\u0000between using language models to iteratively identify human-meaningful features\u0000needed to explain demonstrated behavior, then standard inverse reinforcement\u0000learning techniques to assign weights to these features. Experiments across a\u0000variety of both simulated and real-world robot environments show that ALGAE\u0000learns generalizable reward functions defined on interpretable features using\u0000only small numbers of demonstrations. Importantly, ALGAE can recognize when\u0000features are missing, then extract and define those features without any human\u0000input -- making it possible to quickly and efficiently acquire rich\u0000representations of user behavior.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ReGentS: Real-World Safety-Critical Driving Scenario Generation Made Stable (arXiv:2409.07830, 12 Sep 2024)
Yuan Yin, Pegah Khayatan, Éloi Zablocki, Alexandre Boulch, Matthieu Cord
Machine-learning-based autonomous driving systems often face challenges with safety-critical scenarios that are rare in real-world data, hindering their large-scale deployment. While increasing real-world training data coverage could address this issue, it is costly and dangerous. This work explores generating safety-critical driving scenarios by modifying complex real-world regular scenarios through trajectory optimization. We propose ReGentS, which stabilizes generated trajectories and introduces heuristics to avoid obvious collisions and optimization problems. Our approach addresses unrealistic diverging trajectories and unavoidable collision scenarios that are not useful for training a robust planner. We also extend the scenario generation framework to handle real-world data with up to 32 agents. Additionally, by using a differentiable simulator, our approach simplifies gradient-descent-based optimization involving a simulator, paving the way for future advancements. The code is available at https://github.com/valeoai/ReGentS.
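The optimization pattern behind this kind of scenario generation can be sketched in a few lines: perturb an adversary's accelerations by gradient descent through a differentiable rollout so that its closest approach to the ego trajectory shrinks, while a penalty keeps the perturbed trajectory near the original. The 1-D kinematics, losses, and weights below are invented for illustration; ReGentS operates on full driving scenes and adds the stabilization heuristics mentioned above.

```python
# Toy gradient-based scenario perturbation through a differentiable rollout
# (1-D point kinematics, made-up loss weights). Shows only the optimization pattern.
import torch

dt, T = 0.1, 50
ego = torch.linspace(0.0, 25.0, T)                    # ego positions along a lane (fixed)
acc = torch.zeros(T, requires_grad=True)              # adversary acceleration perturbation
x0, v0 = torch.tensor(-10.0), torch.tensor(4.0)       # adversary starts behind, slower

def rollout(acc):
    v = v0 + torch.cumsum(acc, 0) * dt
    return x0 + torch.cumsum(v, 0) * dt

ref = rollout(torch.zeros(T)).detach()                # original (regular) trajectory
opt = torch.optim.Adam([acc], lr=0.05)
for _ in range(300):
    x = rollout(acc)
    closeness = torch.min((x - ego) ** 2)             # push toward a near-miss with the ego
    realism = ((x - ref) ** 2).mean() + (acc ** 2).mean()
    loss = closeness + 0.1 * realism                  # trade off criticality vs. plausibility
    opt.zero_grad(); loss.backward(); opt.step()

print(float(torch.min((rollout(acc) - ego).abs())))   # gap shrinks versus the original scenario
```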
{"title":"ReGentS: Real-World Safety-Critical Driving Scenario Generation Made Stable","authors":"Yuan Yin, Pegah Khayatan, Éloi Zablocki, Alexandre Boulch, Matthieu Cord","doi":"arxiv-2409.07830","DOIUrl":"https://doi.org/arxiv-2409.07830","url":null,"abstract":"Machine learning based autonomous driving systems often face challenges with\u0000safety-critical scenarios that are rare in real-world data, hindering their\u0000large-scale deployment. While increasing real-world training data coverage\u0000could address this issue, it is costly and dangerous. This work explores\u0000generating safety-critical driving scenarios by modifying complex real-world\u0000regular scenarios through trajectory optimization. We propose ReGentS, which\u0000stabilizes generated trajectories and introduces heuristics to avoid obvious\u0000collisions and optimization problems. Our approach addresses unrealistic\u0000diverging trajectories and unavoidable collision scenarios that are not useful\u0000for training robust planner. We also extend the scenario generation framework\u0000to handle real-world data with up to 32 agents. Additionally, by using a\u0000differentiable simulator, our approach simplifies gradient descent-based\u0000optimization involving a simulator, paving the way for future advancements. The\u0000code is available at https://github.com/valeoai/ReGentS.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hand-Object Interaction Pretraining from Videos (arXiv:2409.08273, 12 Sep 2024)
Himanshu Gaurav Singh, Antonio Loquercio, Carmelo Sferrazza, Jane Wu, Haozhi Qi, Pieter Abbeel, Jitendra Malik
We present an approach for learning general robot manipulation priors from 3D hand-object interaction trajectories. We build a framework that uses in-the-wild videos to generate sensorimotor robot trajectories. We do so by lifting both the human hand and the manipulated object into a shared 3D space and retargeting human motions to robot actions. Generative modeling on this data gives us a task-agnostic base policy. This policy captures a general yet flexible manipulation prior. We empirically demonstrate that finetuning this policy, with both reinforcement learning (RL) and behavior cloning (BC), enables sample-efficient adaptation to downstream tasks and simultaneously improves robustness and generalizability compared to prior approaches. Qualitative experiments are available at https://hgaurav2k.github.io/hop/.
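To illustrate the retargeting step, the sketch below maps a sequence of estimated human wrist positions and a thumb-index pinch distance to robot end-effector waypoints and gripper commands through a fixed camera-to-robot transform. The transform, scaling, and gripper mapping are placeholder assumptions; the actual framework lifts the hand and object jointly in 3D before retargeting.

```python
# Minimal retargeting sketch: human wrist positions + pinch distance -> robot
# end-effector waypoints + gripper width. Transforms and scales are placeholders.
import numpy as np

T_cam_to_robot = np.eye(4)                 # assumed extrinsics (camera frame -> robot base)
T_cam_to_robot[:3, 3] = [0.4, 0.0, 0.2]    # made-up camera offset in the robot frame

def retarget(wrist_positions_cam, pinch_dist, max_width=0.08):
    """wrist_positions_cam: (T, 3) wrist points in the camera frame.
    pinch_dist: (T,) thumb-index distance used as a gripper-opening proxy."""
    homog = np.hstack([wrist_positions_cam, np.ones((len(wrist_positions_cam), 1))])
    ee_waypoints = (T_cam_to_robot @ homog.T).T[:, :3]          # express in the robot frame
    gripper = np.clip(pinch_dist / 0.1, 0.0, 1.0) * max_width   # map pinch to gripper width
    return ee_waypoints, gripper

wrist = np.array([[0.0, 0.0, 0.5], [0.02, 0.0, 0.48], [0.04, 0.01, 0.45]])
pinch = np.array([0.09, 0.05, 0.01])                            # hand closing onto the object
waypoints, width = retarget(wrist, pinch)
print(waypoints[-1], width[-1])                                 # final EE target, near-closed gripper
```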
{"title":"Hand-Object Interaction Pretraining from Videos","authors":"Himanshu Gaurav Singh, Antonio Loquercio, Carmelo Sferrazza, Jane Wu, Haozhi Qi, Pieter Abbeel, Jitendra Malik","doi":"arxiv-2409.08273","DOIUrl":"https://doi.org/arxiv-2409.08273","url":null,"abstract":"We present an approach to learn general robot manipulation priors from 3D\u0000hand-object interaction trajectories. We build a framework to use in-the-wild\u0000videos to generate sensorimotor robot trajectories. We do so by lifting both\u0000the human hand and the manipulated object in a shared 3D space and retargeting\u0000human motions to robot actions. Generative modeling on this data gives us a\u0000task-agnostic base policy. This policy captures a general yet flexible\u0000manipulation prior. We empirically demonstrate that finetuning this policy,\u0000with both reinforcement learning (RL) and behavior cloning (BC), enables\u0000sample-efficient adaptation to downstream tasks and simultaneously improves\u0000robustness and generalizability compared to prior approaches. Qualitative\u0000experiments are available at: url{https://hgaurav2k.github.io/hop/}.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-time Multi-view Omnidirectional Depth Estimation System for Robots and Autonomous Driving on Real Scenes (arXiv:2409.07843, 12 Sep 2024)
Ming Li, Xiong Yang, Chaofan Wu, Jiaheng Li, Pinzhi Wang, Xuejiao Hu, Sidan Du, Yang Li
Omnidirectional depth estimation has broad application prospects in fields such as robotic navigation and autonomous driving. In this paper, we propose a robotic prototype system and a corresponding algorithm designed to validate omnidirectional depth estimation for navigation and obstacle avoidance in real-world scenarios for both robots and vehicles. The proposed HexaMODE system captures 360° depth maps using six fisheye cameras arranged around the platform. We introduce a combined spherical sweeping method and optimize the model architecture of the proposed RtHexa-OmniMVS algorithm to achieve real-time omnidirectional depth estimation. To ensure high accuracy, robustness, and generalization in real-world environments, we employ a teacher-student self-training strategy, utilizing large-scale unlabeled real-world data for model training. The proposed algorithm demonstrates high accuracy in various complex real-world scenarios, both indoors and outdoors, achieving an inference speed of 15 fps on edge computing platforms.
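The core of a spherical sweep can be sketched as follows: discretize viewing directions on a sphere and candidate depths, back-project each (direction, depth) hypothesis into every camera, and keep the depth whose samples agree best across views. The projection model, two-camera example, and variance cost below are simplified placeholders, not the RtHexa-OmniMVS network.

```python
# Simplified spherical sweep: for every (direction, depth) hypothesis, project the
# 3D point into each camera and accumulate a matching cost. The projection model
# and photometric cost are placeholders, not the paper's learned pipeline.
import numpy as np

def spherical_sweep(images, extrinsics, project, dirs, depths):
    """images: list of (H, W) arrays; extrinsics: list of 4x4 world-to-camera matrices;
    project: fn mapping camera-frame points (N, 3) -> pixel coords (N, 2);
    dirs: (N, 3) unit viewing directions; depths: (D,) candidate depths."""
    cost = np.zeros((len(depths), len(dirs)))
    for d_idx, depth in enumerate(depths):
        pts_world = dirs * depth                                   # hypothesized 3D points
        samples = []
        for img, T in zip(images, extrinsics):
            pts_cam = (T[:3, :3] @ pts_world.T).T + T[:3, 3]
            uv = np.round(project(pts_cam)).astype(int)
            valid = (uv[:, 0] >= 0) & (uv[:, 0] < img.shape[1]) & \
                    (uv[:, 1] >= 0) & (uv[:, 1] < img.shape[0]) & (pts_cam[:, 2] > 0)
            vals = np.full(len(dirs), np.nan)
            vals[valid] = img[uv[valid, 1], uv[valid, 0]]
            samples.append(vals)
        cost[d_idx] = np.nanvar(np.stack(samples), axis=0)         # low variance = consistent depth
    return depths[np.argmin(cost, axis=0)]                         # winner-take-all depth per direction

# Tiny usage with two random "cameras" and a pinhole-style projection stand-in.
H, W, f = 64, 64, 40.0
project = lambda p: np.stack([f * p[:, 0] / p[:, 2] + W / 2, f * p[:, 1] / p[:, 2] + H / 2], axis=1)
imgs = [np.random.rand(H, W), np.random.rand(H, W)]
exts = [np.eye(4), np.eye(4)]
dirs = np.array([[0.0, 0.0, 1.0], [0.1, 0.0, 1.0]]); dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
print(spherical_sweep(imgs, exts, project, dirs, depths=np.array([1.0, 2.0, 4.0])))
```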
{"title":"Real-time Multi-view Omnidirectional Depth Estimation System for Robots and Autonomous Driving on Real Scenes","authors":"Ming Li, Xiong Yang, Chaofan Wu, Jiaheng Li, Pinzhi Wang, Xuejiao Hu, Sidan Du, Yang Li","doi":"arxiv-2409.07843","DOIUrl":"https://doi.org/arxiv-2409.07843","url":null,"abstract":"Omnidirectional Depth Estimation has broad application prospects in fields\u0000such as robotic navigation and autonomous driving. In this paper, we propose a\u0000robotic prototype system and corresponding algorithm designed to validate\u0000omnidirectional depth estimation for navigation and obstacle avoidance in\u0000real-world scenarios for both robots and vehicles. The proposed HexaMODE system\u0000captures 360$^circ$ depth maps using six surrounding arranged fisheye cameras.\u0000We introduce a combined spherical sweeping method and optimize the model\u0000architecture for proposed RtHexa-OmniMVS algorithm to achieve real-time\u0000omnidirectional depth estimation. To ensure high accuracy, robustness, and\u0000generalization in real-world environments, we employ a teacher-student\u0000self-training strategy, utilizing large-scale unlabeled real-world data for\u0000model training. The proposed algorithm demonstrates high accuracy in various\u0000complex real-world scenarios, both indoors and outdoors, achieving an inference\u0000speed of 15 fps on edge computing platforms.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FIReStereo: Forest InfraRed Stereo Dataset for UAS Depth Perception in Visually Degraded Environments (arXiv:2409.07715, 12 Sep 2024)
Devansh Dhrafani, Yifei Liu, Andrew Jong, Ukcheol Shin, Yao He, Tyler Harp, Yaoyu Hu, Jean Oh, Sebastian Scherer
Robust depth perception in visually degraded environments is crucial for autonomous aerial systems. Thermal imaging cameras, which capture infrared radiation, are robust to visual degradation. However, due to the lack of a large-scale dataset, the use of thermal cameras for unmanned aerial system (UAS) depth perception has remained largely unexplored. This paper presents a stereo thermal depth perception dataset for autonomous aerial perception applications. The dataset consists of stereo thermal images, LiDAR, IMU data, and ground-truth depth maps captured in urban and forest settings under diverse conditions such as day, night, rain, and smoke. We benchmark representative stereo depth estimation algorithms, offering insights into their performance in degraded conditions. Models trained on our dataset generalize well to unseen smoky conditions, highlighting the robustness of stereo thermal imaging for depth perception. We aim for this work to enhance robotic perception in disaster scenarios, allowing for exploration and operations in previously unreachable areas. The dataset and source code are available at https://firestereo.github.io.
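For context, benchmarks of this kind typically report standard depth metrics such as absolute relative error, RMSE, and the delta < 1.25 threshold accuracy; a minimal implementation is sketched below. The exact metric set, depth ranges, and masking rules used in the paper's evaluation may differ.

```python
# Standard depth-estimation metrics often used in stereo/monocular benchmarks
# (abs-rel, RMSE, delta < 1.25). The paper's exact metrics and masking may differ.
import numpy as np

def depth_metrics(pred, gt, min_depth=0.5, max_depth=80.0):
    mask = (gt > min_depth) & (gt < max_depth) & np.isfinite(pred)
    pred, gt = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)
    return {"abs_rel": abs_rel, "rmse": rmse, "delta<1.25": delta1}

gt = np.random.uniform(1.0, 40.0, size=(100, 100))
pred = gt * np.random.uniform(0.9, 1.1, size=gt.shape)    # a fictitious estimator's output
print(depth_metrics(pred, gt))
```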
{"title":"FIReStereo: Forest InfraRed Stereo Dataset for UAS Depth Perception in Visually Degraded Environments","authors":"Devansh Dhrafani, Yifei Liu, Andrew Jong, Ukcheol Shin, Yao He, Tyler Harp, Yaoyu Hu, Jean Oh, Sebastian Scherer","doi":"arxiv-2409.07715","DOIUrl":"https://doi.org/arxiv-2409.07715","url":null,"abstract":"Robust depth perception in visually-degraded environments is crucial for\u0000autonomous aerial systems. Thermal imaging cameras, which capture infrared\u0000radiation, are robust to visual degradation. However, due to lack of a\u0000large-scale dataset, the use of thermal cameras for unmanned aerial system\u0000(UAS) depth perception has remained largely unexplored. This paper presents a\u0000stereo thermal depth perception dataset for autonomous aerial perception\u0000applications. The dataset consists of stereo thermal images, LiDAR, IMU and\u0000ground truth depth maps captured in urban and forest settings under diverse\u0000conditions like day, night, rain, and smoke. We benchmark representative stereo\u0000depth estimation algorithms, offering insights into their performance in\u0000degraded conditions. Models trained on our dataset generalize well to unseen\u0000smoky conditions, highlighting the robustness of stereo thermal imaging for\u0000depth perception. We aim for this work to enhance robotic perception in\u0000disaster scenarios, allowing for exploration and operations in previously\u0000unreachable areas. The dataset and source code are available at\u0000https://firestereo.github.io.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Relevance for Human Robot Collaboration (arXiv:2409.07753, 12 Sep 2024)
Xiaotong Zhang, Dingcheng Huang, Kamal Youcef-Toumi
Effective human-robot collaboration (HRC) requires robots to possess human-like intelligence. Inspired by the human cognitive ability to selectively process and filter elements in complex environments, this paper introduces a novel concept and scene-understanding approach termed 'relevance,' which identifies the components of a scene that matter for the task at hand. To quantify relevance accurately and efficiently, we developed an event-based framework that selectively triggers relevance determination, along with a probabilistic methodology built on a structured scene representation. Simulation results demonstrate that the relevance framework and methodology accurately predict the relevance of a general HRC setup, achieving a precision of 0.99 and a recall of 0.94. Relevance can be broadly applied to several areas in HRC: it reduces task planning time by 79.56% compared with pure planning for a cereal task, reduces perception latency by up to 26.53% for an object detector, improves HRC safety by up to 13.50%, and reduces the number of inquiries in HRC by 75.36%. A real-world demonstration showcases the relevance framework's ability to intelligently assist humans in everyday tasks.
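A schematic of the event-triggered pattern described above: a scene-change event triggers a relevance score for each object given the current task, and only objects above a threshold are passed to downstream planning and perception. The scoring rule, event test, and threshold are invented for illustration; the paper's method is probabilistic and operates on a structured scene representation.

```python
# Schematic event-triggered relevance filter. Scoring rule, event test, and
# threshold are invented for illustration, not the paper's methodology.
def scene_changed(prev_scene, scene):
    return prev_scene is None or set(scene) != set(prev_scene)

def relevance_scores(scene, task_objects, near_threshold=0.6):
    scores = {}
    for name, info in scene.items():
        task_match = 1.0 if name in task_objects else 0.0
        proximity = 1.0 if info["dist_to_workspace"] < near_threshold else 0.2
        scores[name] = 0.7 * task_match + 0.3 * proximity
    return scores

prev, scene = None, {
    "cereal_box": {"dist_to_workspace": 0.3},
    "bowl": {"dist_to_workspace": 0.4},
    "stapler": {"dist_to_workspace": 1.5},
}
if scene_changed(prev, scene):                       # relevance is only re-evaluated on events
    scores = relevance_scores(scene, task_objects={"cereal_box", "bowl", "milk"})
    relevant = [o for o, s in scores.items() if s >= 0.5]
    print(relevant)                                  # ['cereal_box', 'bowl'] feed planning/perception
```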
{"title":"Relevance for Human Robot Collaboration","authors":"Xiaotong Zhang, Dingcheng Huang, Kamal Youcef-Toumi","doi":"arxiv-2409.07753","DOIUrl":"https://doi.org/arxiv-2409.07753","url":null,"abstract":"Effective human-robot collaboration (HRC) requires the robots to possess\u0000human-like intelligence. Inspired by the human's cognitive ability to\u0000selectively process and filter elements in complex environments, this paper\u0000introduces a novel concept and scene-understanding approach termed `relevance.'\u0000It identifies relevant components in a scene. To accurately and efficiently\u0000quantify relevance, we developed an event-based framework that selectively\u0000triggers relevance determination, along with a probabilistic methodology built\u0000on a structured scene representation. Simulation results demonstrate that the\u0000relevance framework and methodology accurately predict the relevance of a\u0000general HRC setup, achieving a precision of 0.99 and a recall of 0.94.\u0000Relevance can be broadly applied to several areas in HRC to improve task\u0000planning time by 79.56% compared with pure planning for a cereal task, reduce\u0000perception latency by up to 26.53% for an object detector, improve HRC safety\u0000by up to 13.50% and reduce the number of inquiries for HRC by 75.36%. A\u0000real-world demonstration showcases the relevance framework's ability to\u0000intelligently assist humans in everyday tasks.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}