DroneDiffusion: Robust Quadrotor Dynamics Learning with Diffusion Models (arXiv:2409.11292, 17 Sep 2024)
Avirup Das, Rishabh Dev Yadav, Sihao Sun, Mingfei Sun, Samuel Kaski, Wei Pan
An inherent fragility of quadrotor systems stems from model inaccuracies and external disturbances. These factors hinder performance and compromise the stability of the system, making precise control challenging. Existing model-based approaches either make deterministic assumptions, utilize Gaussian-based representations of uncertainty, or rely on nominal models, all of which often fall short in capturing the complex, multimodal nature of real-world dynamics. This work introduces DroneDiffusion, a novel framework that leverages conditional diffusion models to learn quadrotor dynamics, formulated as a sequence generation task. DroneDiffusion achieves superior generalization to unseen, complex scenarios by capturing the temporal nature of uncertainties and mitigating error propagation. We integrate the learned dynamics with an adaptive controller for trajectory tracking with stability guarantees. Extensive experiments in both simulation and real-world flights demonstrate the robustness of the framework across a range of scenarios, including unfamiliar flight paths and varying payloads, velocities, and wind disturbances.
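For intuition about what learning dynamics as conditional sequence generation with a diffusion model can look like, here is a minimal DDPM-style training sketch in PyTorch. The network, dimensions, horizon, and noise schedule are illustrative assumptions, not the architecture or hyperparameters from the paper.

```python
# Minimal sketch of conditional-diffusion dynamics learning (DDPM-style).
# All names (DenoiseNet, dims, horizon) are illustrative assumptions.
import torch
import torch.nn as nn

H, STATE, COND, T = 16, 6, 12, 100            # horizon, residual dim, conditioning dim, diffusion steps
betas = torch.linspace(1e-4, 0.02, T)          # standard DDPM noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class DenoiseNet(nn.Module):
    """Predicts the noise added to a residual-dynamics sequence, conditioned on state/action history."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(H * STATE + COND + 1, 256), nn.SiLU(),
            nn.Linear(256, 256), nn.SiLU(),
            nn.Linear(256, H * STATE),
        )
    def forward(self, noisy_seq, cond, t):
        x = torch.cat([noisy_seq.flatten(1), cond, t.float().unsqueeze(1) / T], dim=1)
        return self.net(x).view(-1, H, STATE)

def diffusion_loss(model, seq, cond):
    """One training step: noise the clean residual sequence, predict the noise."""
    t = torch.randint(0, T, (seq.shape[0],))
    ab = alphas_bar[t].view(-1, 1, 1)
    eps = torch.randn_like(seq)
    noisy = ab.sqrt() * seq + (1 - ab).sqrt() * eps
    return nn.functional.mse_loss(model(noisy, cond, t), eps)

model = DenoiseNet()
loss = diffusion_loss(model, torch.randn(32, H, STATE), torch.randn(32, COND))
loss.backward()
```

At flight time, such a model would be sampled by iterating the reverse denoising chain conditioned on the current state-action history, which is what allows the distribution over disturbances to stay multimodal rather than collapsing to a Gaussian.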
{"title":"DroneDiffusion: Robust Quadrotor Dynamics Learning with Diffusion Models","authors":"Avirup Das, Rishabh Dev Yadav, Sihao Sun, Mingfei Sun, Samuel Kaski, Wei Pan","doi":"arxiv-2409.11292","DOIUrl":"https://doi.org/arxiv-2409.11292","url":null,"abstract":"An inherent fragility of quadrotor systems stems from model inaccuracies and\u0000external disturbances. These factors hinder performance and compromise the\u0000stability of the system, making precise control challenging. Existing\u0000model-based approaches either make deterministic assumptions, utilize\u0000Gaussian-based representations of uncertainty, or rely on nominal models, all\u0000of which often fall short in capturing the complex, multimodal nature of\u0000real-world dynamics. This work introduces DroneDiffusion, a novel framework\u0000that leverages conditional diffusion models to learn quadrotor dynamics,\u0000formulated as a sequence generation task. DroneDiffusion achieves superior\u0000generalization to unseen, complex scenarios by capturing the temporal nature of\u0000uncertainties and mitigating error propagation. We integrate the learned\u0000dynamics with an adaptive controller for trajectory tracking with stability\u0000guarantees. Extensive experiments in both simulation and real-world flights\u0000demonstrate the robustness of the framework across a range of scenarios,\u0000including unfamiliar flight paths and varying payloads, velocities, and wind\u0000disturbances.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
VertiEncoder: Self-Supervised Kinodynamic Representation Learning on Vertically Challenging Terrain (arXiv:2409.11570, 17 Sep 2024)
Mohammad Nazeri, Aniket Datar, Anuj Pokhrel, Chenhui Pan, Garrett Warnell, Xuesu Xiao
We present VertiEncoder, a self-supervised representation learning approach for robot mobility on vertically challenging terrain. Using the same pre-training process, VertiEncoder can handle four different downstream tasks with a single representation: forward kinodynamics learning, inverse kinodynamics learning, behavior cloning, and patch reconstruction. VertiEncoder uses a TransformerEncoder to learn the local context of its surroundings through random masking and next-patch reconstruction. We show that VertiEncoder achieves better performance across all four tasks than specialized end-to-end models while using 77% fewer parameters. We also show VertiEncoder's comparable performance against state-of-the-art kinodynamic modeling and planning approaches in real-world robot deployment. These results underscore the efficacy of VertiEncoder in mitigating overfitting and fostering more robust generalization across diverse environmental contexts and downstream vehicle kinodynamic tasks.
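As a rough picture of masked pre-training with a TransformerEncoder, the sketch below embeds a sequence of terrain patches, masks a random subset, and reconstructs the masked patches. All dimensions and the masking ratio are assumptions for illustration, not VertiEncoder's actual configuration.

```python
# Minimal sketch of masked-patch pre-training with a TransformerEncoder.
# Dimensions and masking ratio are illustrative assumptions.
import torch
import torch.nn as nn

B, L, PATCH, D = 8, 20, 64, 128                # batch, sequence length, patch feature dim, model dim

embed = nn.Linear(PATCH, D)
mask_token = nn.Parameter(torch.zeros(1, 1, D))
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True), num_layers=4)
decode = nn.Linear(D, PATCH)                   # reconstruct raw patch features

patches = torch.randn(B, L, PATCH)             # terrain patches along the robot's trajectory
tokens = embed(patches)

mask = torch.rand(B, L) < 0.3                  # randomly mask 30% of the tokens
tokens = torch.where(mask.unsqueeze(-1), mask_token.expand(B, L, D), tokens)

recon = decode(encoder(tokens))
loss = nn.functional.mse_loss(recon[mask], patches[mask])   # loss only on masked positions
loss.backward()
```

After pre-training, the frozen or fine-tuned encoder output would serve as the shared representation for the four downstream heads.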
{"title":"VertiEncoder: Self-Supervised Kinodynamic Representation Learning on Vertically Challenging Terrain","authors":"Mohammad Nazeri, Aniket Datar, Anuj Pokhrel, Chenhui Pan, Garrett Warnell, Xuesu Xiao","doi":"arxiv-2409.11570","DOIUrl":"https://doi.org/arxiv-2409.11570","url":null,"abstract":"We present VertiEncoder, a self-supervised representation learning approach\u0000for robot mobility on vertically challenging terrain. Using the same\u0000pre-training process, VertiEncoder can handle four different downstream tasks,\u0000including forward kinodynamics learning, inverse kinodynamics learning,\u0000behavior cloning, and patch reconstruction with a single representation.\u0000VertiEncoder uses a TransformerEncoder to learn the local context of its\u0000surroundings by random masking and next patch reconstruction. We show that\u0000VertiEncoder achieves better performance across all four different tasks\u0000compared to specialized End-to-End models with 77% fewer parameters. We also\u0000show VertiEncoder's comparable performance against state-of-the-art kinodynamic\u0000modeling and planning approaches in real-world robot deployment. These results\u0000underscore the efficacy of VertiEncoder in mitigating overfitting and fostering\u0000more robust generalization across diverse environmental contexts and downstream\u0000vehicle kinodynamic tasks.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimization of Rulebooks via Asymptotically Representing Lexicographic Hierarchies for Autonomous Vehicles (arXiv:2409.11199, 17 Sep 2024)
Matteo Penlington, Alessandro Zanardi, Emilio Frazzoli
A key challenge in autonomous driving is that Autonomous Vehicles (AVs) must contend with multiple, often conflicting, planning requirements. These requirements naturally form a hierarchy: for example, avoiding a collision is more important than maintaining a lane. While the exact structure of this hierarchy remains unknown, developing approaches that systematically account for it is crucial to ensuring that AVs satisfy pre-determined behavior specifications. Motivated by lexicographic behavior specification in AVs, this work addresses a lexicographic multi-objective motion planning problem in which each objective is incomparably more important than the next: avoiding a collision, for instance, is incomparably more important than a lane-change violation. This work ties together two elements. First, we introduce a multi-objective candidate function that asymptotically represents lexicographic orders. Unlike existing multi-objective cost function formulations, this approach ensures that returned solutions asymptotically align with the lexicographic behavior specification. Second, inspired by continuation methods, we propose two algorithms that asymptotically approach minimum-rank decisions, i.e., decisions that satisfy the highest number of important rules possible. Through a couple of practical examples, we show that the proposed candidate function asymptotically represents the lexicographic hierarchy and that both proposed algorithms return minimum-rank decisions, even when other approaches do not.
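To make "asymptotically representing" a lexicographic order concrete, here is a toy epsilon-weighting construction: as the weight ratio between consecutive ranks goes to zero, the ordering induced by the scalarized cost converges to the lexicographic ordering of the cost tuples. This is a common textbook device shown for intuition only, not the candidate function proposed in the paper.

```python
# Toy illustration of asymptotically representing a lexicographic order
# with a single scalar objective (epsilon-weighting, not the paper's method).

def scalarize(costs, eps):
    """Weight the rank-i cost by eps**i: as eps -> 0 the ordering of the
    scalarized values approaches the lexicographic ordering of the tuples."""
    return sum(c * eps**i for i, c in enumerate(costs))

# (collision cost, lane-violation cost, comfort cost), most important first
a = (0.0, 1.0, 5.0)   # no collision, one lane violation, poor comfort
b = (1.0, 0.0, 0.0)   # one collision, otherwise perfect

for eps in (1.0, 0.5, 0.1, 0.01):
    order = "a < b" if scalarize(a, eps) < scalarize(b, eps) else "a >= b"
    print(f"eps={eps}: {order}")
# At eps=1.0 the comfort term dominates and b looks better; as eps shrinks,
# the scalar order converges to the lexicographic one, where collision-free a wins.
```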
{"title":"Optimization of Rulebooks via Asymptotically Representing Lexicographic Hierarchies for Autonomous Vehicles","authors":"Matteo Penlington, Alessandro Zanardi, Emilio Frazzoli","doi":"arxiv-2409.11199","DOIUrl":"https://doi.org/arxiv-2409.11199","url":null,"abstract":"A key challenge in autonomous driving is that Autonomous Vehicles (AVs) must\u0000contend with multiple, often conflicting, planning requirements. These\u0000requirements naturally form in a hierarchy -- e.g., avoiding a collision is\u0000more important than maintaining lane. While the exact structure of this\u0000hierarchy remains unknown, to progress towards ensuring that AVs satisfy\u0000pre-determined behavior specifications, it is crucial to develop approaches\u0000that systematically account for it. Motivated by lexicographic behavior\u0000specification in AVs, this work addresses a lexicographic multi-objective\u0000motion planning problem, where each objective is incomparably more important\u0000than the next -- consider that avoiding a collision is incomparably more\u0000important than a lane change violation. This work ties together two elements.\u0000Firstly, a multi-objective candidate function that asymptotically represents\u0000lexicographic orders is introduced. Unlike existing multi-objective cost\u0000function formulations, this approach assures that returned solutions\u0000asymptotically align with the lexicographic behavior specification. Secondly,\u0000inspired by continuation methods, we propose two algorithms that asymptotically\u0000approach minimum rank decisions -- i.e., decisions that satisfy the highest\u0000number of important rules possible. Through a couple practical examples, we\u0000showcase that the proposed candidate function asymptotically represents the\u0000lexicographic hierarchy, and that both proposed algorithms return minimum rank\u0000decisions, even when other approaches do not.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Context-Generative Default Policy for Bounded Rational Agent (arXiv:2409.11604, 17 Sep 2024)
Durgakant Pushp, Junhong Xu, Zheng Chen, Lantao Liu
Bounded rational agents often make decisions by evaluating a finite selection of choices, typically derived from a reference point termed the 'default policy,' based on previous experience. However, the inherent rigidity of a static default policy presents significant challenges for agents operating in unknown environments that are not covered by the agent's prior knowledge. In this work, we introduce a context-generative default policy that leverages the region observed by the robot to predict unobserved parts of the environment, thereby enabling the robot to adaptively adjust its default policy based on both the actually observed map and the imagined unobserved map. Furthermore, the adaptive nature of the bounded rationality framework enables the robot to manage unreliable or incorrect imaginations by selectively sampling a few trajectories in the vicinity of the default policy. Our approach uses a diffusion model for map prediction and sampling-based planning with B-spline trajectory optimization to generate the default policy. Extensive evaluations reveal that the context-generative policy outperforms the baseline methods in identifying and avoiding unseen obstacles. Additionally, real-world experiments conducted with Crazyflie drones demonstrate the adaptability of our proposed method, even when acting in environments outside the domain of the training distribution.
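As a concrete picture of the two planning ingredients named above, the following sketch fits a cubic B-spline default trajectory through waypoints and samples a few perturbed candidates in its vicinity. The waypoints, noise scale, and sample count are illustrative assumptions, not values from the paper.

```python
# Sketch of a B-spline default trajectory plus a handful of samples nearby.
import numpy as np
from scipy.interpolate import splprep, splev

waypoints = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 0.0], [3.0, 1.0]]).T
tck, _ = splprep(waypoints, s=0.0, k=3)        # cubic B-spline through the waypoints

u = np.linspace(0.0, 1.0, 50)
default_xy = np.array(splev(u, tck))           # the "default policy" trajectory, shape (2, 50)

rng = np.random.default_rng(0)
ctrl = np.array(tck[1])                        # spline control points, shape (2, n_ctrl)
samples = []
for _ in range(5):                             # a few candidate trajectories near the default
    perturbed = (tck[0], list(ctrl + rng.normal(0.0, 0.05, ctrl.shape)), tck[2])
    samples.append(np.array(splev(u, perturbed)))
```

Perturbing the control points rather than the sampled path keeps every candidate smooth, which matches the role these samples play as fallbacks when the imagined map proves unreliable.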
{"title":"Context-Generative Default Policy for Bounded Rational Agent","authors":"Durgakant Pushp, Junhong Xu, Zheng Chen, Lantao Liu","doi":"arxiv-2409.11604","DOIUrl":"https://doi.org/arxiv-2409.11604","url":null,"abstract":"Bounded rational agents often make decisions by evaluating a finite selection\u0000of choices, typically derived from a reference point termed the $`$default\u0000policy,' based on previous experience. However, the inherent rigidity of the\u0000static default policy presents significant challenges for agents when operating\u0000in unknown environment, that are not included in agent's prior knowledge. In\u0000this work, we introduce a context-generative default policy that leverages the\u0000region observed by the robot to predict unobserved part of the environment,\u0000thereby enabling the robot to adaptively adjust its default policy based on\u0000both the actual observed map and the $textit{imagined}$ unobserved map.\u0000Furthermore, the adaptive nature of the bounded rationality framework enables\u0000the robot to manage unreliable or incorrect imaginations by selectively\u0000sampling a few trajectories in the vicinity of the default policy. Our approach\u0000utilizes a diffusion model for map prediction and a sampling-based planning\u0000with B-spline trajectory optimization to generate the default policy. Extensive\u0000evaluations reveal that the context-generative policy outperforms the baseline\u0000methods in identifying and avoiding unseen obstacles. Additionally, real-world\u0000experiments conducted with the Crazyflie drones demonstrate the adaptability of\u0000our proposed method, even when acting in environments outside the domain of the\u0000training distribution.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PLATO: Planning with LLMs and Affordances for Tool Manipulation (arXiv:2409.11580, 17 Sep 2024)
Arvind Car, Sai Sravan Yarlagadda, Alison Bartsch, Abraham George, Amir Barati Farimani
As robotic systems become increasingly integrated into complex real-world environments, there is a growing need for approaches that enable robots to understand and act upon natural language instructions without relying on extensive pre-programmed knowledge of their surroundings. This paper presents PLATO, an innovative system that addresses this challenge by leveraging specialized large language model agents to process natural language inputs, understand the environment, predict tool affordances, and generate executable actions for robotic systems. Unlike traditional systems that depend on hard-coded environmental information, PLATO employs a modular architecture of specialized agents to operate without any initial knowledge of the environment. These agents identify objects and their locations within the scene, generate a comprehensive high-level plan, translate this plan into a series of low-level actions, and verify the completion of each step. We test the system in particular on challenging tool-use tasks, which involve handling diverse objects and require long-horizon planning. PLATO's design allows it to adapt to dynamic and unstructured settings, significantly enhancing its flexibility and robustness. By evaluating the system across various complex scenarios, we demonstrate its capability to tackle a diverse range of tasks and offer a novel solution for integrating LLMs with robotic platforms, advancing the state of the art in autonomous robotic task execution. For videos and prompt details, please see our project website: https://sites.google.com/andrew.cmu.edu/plato
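A schematic of such a modular agent pipeline might look as follows. The four roles mirror the ones described above, but `call_llm`, the prompts, and the JSON interfaces are hypothetical stand-ins for any chat-completion API, not PLATO's actual implementation.

```python
# Schematic of a modular LLM-agent pipeline (hypothetical interfaces throughout).
import json

def call_llm(system_prompt: str, user_msg: str) -> str:
    """Hypothetical LLM call; wire this to your provider's chat API to run."""
    raise NotImplementedError

def perceive(scene_description: str) -> list[dict]:
    """Agent 1: identify objects and their locations in the scene."""
    return json.loads(call_llm("List objects and 3D locations as JSON.", scene_description))

def plan(task: str, objects: list[dict]) -> list[str]:
    """Agent 2: produce a high-level plan grounded in the detected objects."""
    return json.loads(call_llm("Return a JSON list of high-level steps.",
                               f"Task: {task}\nObjects: {json.dumps(objects)}"))

def ground(step: str, objects: list[dict]) -> list[dict]:
    """Agent 3: translate one high-level step into low-level robot primitives."""
    return json.loads(call_llm("Return JSON robot primitives (pick/place/move).",
                               f"Step: {step}\nObjects: {json.dumps(objects)}"))

def verify(step: str, scene_description: str) -> bool:
    """Agent 4: check that the step's postcondition holds before continuing."""
    answer = call_llm("Answer yes/no: is this step complete?",
                      f"Step: {step}\nScene: {scene_description}")
    return answer.strip().lower().startswith("yes")
```

Separating perception, planning, grounding, and verification into distinct agents is what lets such a system start with no environment model: each stage re-queries the world rather than assuming hard-coded state.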
{"title":"PLATO: Planning with LLMs and Affordances for Tool Manipulation","authors":"Arvind Car, Sai Sravan Yarlagadda, Alison Bartsch, Abraham George, Amir Barati Farimani","doi":"arxiv-2409.11580","DOIUrl":"https://doi.org/arxiv-2409.11580","url":null,"abstract":"As robotic systems become increasingly integrated into complex real-world\u0000environments, there is a growing need for approaches that enable robots to\u0000understand and act upon natural language instructions without relying on\u0000extensive pre-programmed knowledge of their surroundings. This paper presents\u0000PLATO, an innovative system that addresses this challenge by leveraging\u0000specialized large language model agents to process natural language inputs,\u0000understand the environment, predict tool affordances, and generate executable\u0000actions for robotic systems. Unlike traditional systems that depend on\u0000hard-coded environmental information, PLATO employs a modular architecture of\u0000specialized agents to operate without any initial knowledge of the environment.\u0000These agents identify objects and their locations within the scene, generate a\u0000comprehensive high-level plan, translate this plan into a series of low-level\u0000actions, and verify the completion of each step. The system is particularly\u0000tested on challenging tool-use tasks, which involve handling diverse objects\u0000and require long-horizon planning. PLATO's design allows it to adapt to dynamic\u0000and unstructured settings, significantly enhancing its flexibility and\u0000robustness. By evaluating the system across various complex scenarios, we\u0000demonstrate its capability to tackle a diverse range of tasks and offer a novel\u0000solution to integrate LLMs with robotic platforms, advancing the\u0000state-of-the-art in autonomous robotic task execution. For videos and prompt\u0000details, please see our project website:\u0000https://sites.google.com/andrew.cmu.edu/plato","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P-RAG: Progressive Retrieval Augmented Generation For Planning on Embodied Everyday Task (arXiv:2409.11279, 17 Sep 2024)
Weiye Xu, Min Wang, Wengang Zhou, Houqiang Li
Embodied Everyday Task is a popular task in the embodied AI community, requiring agents to make a sequence of actions based on natural language instructions and visual observations. Traditional learning-based approaches face two challenges. First, natural language instructions often lack explicit task planning. Second, extensive training is required to equip models with knowledge of the task environment. Previous works based on Large Language Models (LLMs) either suffer from poor performance due to a lack of task-specific knowledge or rely on ground truth as few-shot samples. To address these limitations, we propose Progressive Retrieval Augmented Generation (P-RAG), which not only effectively leverages the powerful language processing capabilities of LLMs but also progressively accumulates task-specific knowledge without ground truth. In contrast to conventional RAG methods, which retrieve relevant information from the database in a one-shot manner to assist generation, P-RAG introduces an iterative approach that progressively updates the database. In each iteration, P-RAG retrieves from the latest database and draws on historical information from previous interactions as experiential references for the current one. Moreover, we introduce a more granular retrieval scheme that retrieves not only similar tasks but also similar situations, providing more valuable reference experiences. Extensive experiments show that P-RAG achieves competitive results without utilizing ground truth and can further improve performance through self-iteration.
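The progressive loop can be pictured as follows: retrieve references from the current experience database, act, then insert the new trajectory so later iterations retrieve richer references. The `embed` and `run_episode` functions below are hypothetical stand-ins, not P-RAG's actual components.

```python
# Minimal sketch of a progressive RAG loop (all interfaces are assumptions).
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical sentence embedder (toy stand-in; use a real model in practice)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

def run_episode(task: str, references: list) -> tuple:
    """Hypothetical rollout: prompt an LLM with the references, act in the env."""
    return ["look", "goto", "pick", "place"], "partial_success"

database = []   # entries: (embedding, task, situation, trajectory, outcome)

def retrieve(query: str, k: int = 3) -> list:
    """Cosine nearest neighbors over stored experiences."""
    q = embed(query)
    sim = lambda e: float(q @ e[0]) / (np.linalg.norm(q) * np.linalg.norm(e[0]))
    return sorted(database, key=sim, reverse=True)[:k]

def progressive_rag(task: str, situation: str, n_iters: int = 3):
    for _ in range(n_iters):
        # Granular retrieval: similar tasks AND similar situations.
        refs = retrieve(task) + retrieve(situation)
        trajectory, outcome = run_episode(task, refs)
        # Progressively grow the database; no ground truth needed.
        database.append((embed(task), task, situation, trajectory, outcome))

progressive_rag("put a clean mug on the coffee machine", "kitchen, mug in sink")
```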
{"title":"P-RAG: Progressive Retrieval Augmented Generation For Planning on Embodied Everyday Task","authors":"Weiye Xu, Min Wang, Wengang Zhou, Houqiang Li","doi":"arxiv-2409.11279","DOIUrl":"https://doi.org/arxiv-2409.11279","url":null,"abstract":"Embodied Everyday Task is a popular task in the embodied AI community,\u0000requiring agents to make a sequence of actions based on natural language\u0000instructions and visual observations. Traditional learning-based approaches\u0000face two challenges. Firstly, natural language instructions often lack explicit\u0000task planning. Secondly, extensive training is required to equip models with\u0000knowledge of the task environment. Previous works based on Large Language Model\u0000(LLM) either suffer from poor performance due to the lack of task-specific\u0000knowledge or rely on ground truth as few-shot samples. To address the above\u0000limitations, we propose a novel approach called Progressive Retrieval Augmented\u0000Generation (P-RAG), which not only effectively leverages the powerful language\u0000processing capabilities of LLMs but also progressively accumulates\u0000task-specific knowledge without ground-truth. Compared to the conventional RAG\u0000methods, which retrieve relevant information from the database in a one-shot\u0000manner to assist generation, P-RAG introduces an iterative approach to\u0000progressively update the database. In each iteration, P-RAG retrieves the\u0000latest database and obtains historical information from the previous\u0000interaction as experiential references for the current interaction. Moreover,\u0000we also introduce a more granular retrieval scheme that not only retrieves\u0000similar tasks but also incorporates retrieval of similar situations to provide\u0000more valuable reference experiences. Extensive experiments reveal that P-RAG\u0000achieves competitive results without utilizing ground truth and can even\u0000further improve performance through self-iterations.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exact Wavefront Propagation for Globally Optimal One-to-All Path Planning on 2D Cartesian Grids (arXiv:2409.11545, 17 Sep 2024)
Ibrahim Ibrahim, Joris Gillis, Wilm Decré, Jan Swevers
This paper introduces an algorithm with $\mathcal{O}(n)$ compute and memory complexity for globally optimal path planning on 2D Cartesian grids. Unlike existing marching methods that rely on approximate discretized solutions to the Eikonal equation, our approach achieves exact wavefront propagation by pivoting the analytic distance function based on visibility. The algorithm leverages a dynamic-programming subroutine to efficiently evaluate visibility queries. Through benchmarking against state-of-the-art any-angle path planners, we demonstrate that our method outperforms existing approaches in both speed and accuracy, particularly in cluttered environments. Notably, our method inherently provides globally optimal paths to all grid points, eliminating the need for additional gradient descent steps per path query. The same capability extends to multiple starting positions. We also provide a greedy version of our algorithm, as well as an open-source C++ implementation of our solver.
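The planner's core primitive is a grid visibility query. For intuition, here is the naive version of that query, a plain Bresenham line-of-sight check; it is not the paper's dynamic-programming subroutine, which evaluates such queries far more efficiently.

```python
# Naive grid visibility query via Bresenham traversal (for intuition only;
# NOT the paper's O(n) dynamic-programming subroutine).

def visible(grid, a, b):
    """True if the straight segment between cells a and b crosses no obstacle.
    grid[r][c] is True for obstacles; a and b are (row, col) tuples."""
    (r, c), (r1, c1) = a, b
    dr, dc = abs(r1 - r), abs(c1 - c)
    sr = 1 if r < r1 else -1
    sc = 1 if c < c1 else -1
    err = dc - dr                      # Bresenham error term
    while True:
        if grid[r][c]:
            return False               # line of sight blocked
        if (r, c) == (r1, c1):
            return True
        e2 = 2 * err
        if e2 >= -dr:                  # step in the column direction
            err -= dr
            c += sc
        if e2 <= dc:                   # step in the row direction
            err += dc
            r += sr

grid = [[False] * 5 for _ in range(5)]
grid[2][2] = True
print(visible(grid, (0, 0), (4, 4)))   # False: the diagonal hits the obstacle
print(visible(grid, (0, 0), (0, 4)))   # True: the top row is clear
```

When a cell is visible from the source, its exact distance is simply the Euclidean norm; when it is not, the analytic distance function is pivoted through an intermediate visible point, which is what keeps the propagated wavefront exact rather than grid-discretized.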
{"title":"Exact Wavefront Propagation for Globally Optimal One-to-All Path Planning on 2D Cartesian Grids","authors":"Ibrahim Ibrahim, Joris Gillis, Wilm Decré, Jan Swevers","doi":"arxiv-2409.11545","DOIUrl":"https://doi.org/arxiv-2409.11545","url":null,"abstract":"This paper introduces an efficient $mathcal{O}(n)$ compute and memory\u0000complexity algorithm for globally optimal path planning on 2D Cartesian grids.\u0000Unlike existing marching methods that rely on approximate discretized solutions\u0000to the Eikonal equation, our approach achieves exact wavefront propagation by\u0000pivoting the analytic distance function based on visibility. The algorithm\u0000leverages a dynamic-programming subroutine to efficiently evaluate visibility\u0000queries. Through benchmarking against state-of-the-art any-angle path planners,\u0000we demonstrate that our method outperforms existing approaches in both speed\u0000and accuracy, particularly in cluttered environments. Notably, our method\u0000inherently provides globally optimal paths to all grid points, eliminating the\u0000need for additional gradient descent steps per path query. The same capability\u0000extends to multiple starting positions. We also provide a greedy version of our\u0000algorithm as well as open-source C++ implementation of our solver.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Autonomous Navigation in Ice-Covered Waters with Learned Predictions on Ship-Ice Interactions (arXiv:2409.11326, 17 Sep 2024)
Ninghan Zhong, Alessandro Potenza, Stephen L. Smith
Autonomous navigation in ice-covered waters poses significant challenges due to the frequent lack of viable collision-free trajectories. When complete obstacle avoidance is infeasible, it becomes imperative for the navigation strategy to minimize collisions. Additionally, the dynamic nature of ice, which moves in response to ship maneuvers, complicates the path planning process. To address these challenges, we propose a novel deep learning model that estimates, via occupancy estimation, the coarse dynamics of ice movements triggered by ship actions. To ensure real-time applicability, we propose a novel approach that caches intermediate prediction results and seamlessly integrates the predictive model into a graph search planner. We evaluate the proposed planner both in simulation and in a physical testbed against existing approaches and show that our planner significantly reduces collisions with ice when compared to the state of the art. Code and demos of this work are available at https://github.com/IvanIZ/predictive-asv-planner.
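The caching idea can be illustrated by memoizing the learned occupancy prediction inside the planner's edge-cost evaluation, so repeated expansions of the same discretized state reuse the result. The predictor interface and cost terms below are assumptions for illustration, not the paper's implementation.

```python
# Sketch: memoize an expensive learned prediction inside graph-search edge costs.
from functools import lru_cache

def expensive_model_forward(ship_state, action):
    """Stand-in for the learned ice-occupancy network (assumed interface:
    discretized ship state + candidate action -> predicted occupancy score)."""
    return (hash((ship_state, action)) % 100) / 100.0

@lru_cache(maxsize=100_000)
def predicted_occupancy(ship_state, action):
    # Arguments must be hashable (e.g., grid-discretized pose tuples) so that
    # re-expansions of the same node during search become cache hits.
    return expensive_model_forward(ship_state, action)

def edge_cost(ship_state, action, w_collision=10.0):
    """Edge cost for the graph search: unit motion cost plus a penalty on the
    predicted ice occupancy induced by the maneuver."""
    return 1.0 + w_collision * predicted_occupancy(ship_state, action)

print(edge_cost((10, 12, 3), "port_15deg"))    # repeated calls hit the cache
```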
{"title":"Autonomous Navigation in Ice-Covered Waters with Learned Predictions on Ship-Ice Interactions","authors":"Ninghan Zhong, Alessandro Potenza, Stephen L. Smith","doi":"arxiv-2409.11326","DOIUrl":"https://doi.org/arxiv-2409.11326","url":null,"abstract":"Autonomous navigation in ice-covered waters poses significant challenges due\u0000to the frequent lack of viable collision-free trajectories. When complete\u0000obstacle avoidance is infeasible, it becomes imperative for the navigation\u0000strategy to minimize collisions. Additionally, the dynamic nature of ice, which\u0000moves in response to ship maneuvers, complicates the path planning process. To\u0000address these challenges, we propose a novel deep learning model to estimate\u0000the coarse dynamics of ice movements triggered by ship actions through\u0000occupancy estimation. To ensure real-time applicability, we propose a novel\u0000approach that caches intermediate prediction results and seamlessly integrates\u0000the predictive model into a graph search planner. We evaluate the proposed\u0000planner both in simulation and in a physical testbed against existing\u0000approaches and show that our planner significantly reduces collisions with ice\u0000when compared to the state-of-the-art. Codes and demos of this work are\u0000available at https://github.com/IvanIZ/predictive-asv-planner.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PC-SRIF: Preconditioned Cholesky-based Square Root Information Filter for Vision-aided Inertial Navigation (arXiv:2409.11372, 17 Sep 2024)
Tong Ke, Parth Agrawal, Yun Zhang, Weikun Zhen, Chao X. Guo, Toby Sharp, Ryan C. Dutoit
In this paper, we introduce a novel estimator for vision-aided inertial navigation systems (VINS), the Preconditioned Cholesky-based Square Root Information Filter (PC-SRIF). When solving linear systems, employing Cholesky decomposition offers superior efficiency but can compromise numerical stability. Due to this, existing VINS utilizing (Square Root) Information Filters often opt for QR decomposition on platforms where single precision is preferred, avoiding the numerical challenges associated with Cholesky decomposition. While these issues are often attributed to the ill-conditioned information matrix in VINS, our analysis reveals that this is not an inherent property of VINS but rather a consequence of specific parameterizations. We identify several factors that contribute to an ill-conditioned information matrix and propose a preconditioning technique to mitigate these conditioning issues. Building on this analysis, we present PC-SRIF, which exhibits remarkable stability in performing Cholesky decomposition in single precision when solving linear systems in VINS. Consequently, PC-SRIF achieves superior theoretical efficiency compared to alternative estimators. To validate the efficiency advantages and numerical stability of PC-SRIF-based VINS, we have conducted well-controlled experiments, which provide empirical evidence in support of our theoretical findings. Remarkably, in our VINS implementation, PC-SRIF's runtime is 41% faster than QR-based SRIF.
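A small NumPy experiment illustrates the point about parameterization: the same least-squares problem yields a wildly ill-conditioned information matrix under one scaling of the variables and a benign one after diagonal rescaling. The Jacobi preconditioner and the unit split below are shown for intuition and are not necessarily the preconditioning technique proposed in the paper.

```python
# Conditioning of an information matrix as a consequence of variable scaling.
import numpy as np

rng = np.random.default_rng(1)
# Jacobian with mixed units: three "position-like" columns scaled 1e4 and
# three "orientation-like" columns scaled 1e-3 (an assumed, illustrative split).
scales = np.array([1e4, 1e4, 1e4, 1e-3, 1e-3, 1e-3])
J = rng.standard_normal((200, 6)) * scales
A = J.T @ J                                    # information matrix

D = np.diag(1.0 / np.sqrt(np.diag(A)))         # Jacobi (diagonal) preconditioner
Ap = D @ A @ D                                 # unit diagonal after rescaling

print(f"cond(A)  = {np.linalg.cond(A):.1e}")   # ~1e14, far beyond 1/eps(float32) ~ 8.4e6
print(f"cond(Ap) = {np.linalg.cond(Ap):.1e}")  # ~1e0-1e1: comfortable for single precision

# Only the preconditioned system keeps cond * eps(float32) << 1, the regime
# where a single-precision Cholesky-based solve retains meaningful accuracy.
L = np.linalg.cholesky(Ap.astype(np.float32))
```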
{"title":"PC-SRIF: Preconditioned Cholesky-based Square Root Information Filter for Vision-aided Inertial Navigation","authors":"Tong Ke, Parth Agrawal, Yun Zhang, Weikun Zhen, Chao X. Guo, Toby Sharp, Ryan C. Dutoit","doi":"arxiv-2409.11372","DOIUrl":"https://doi.org/arxiv-2409.11372","url":null,"abstract":"In this paper, we introduce a novel estimator for vision-aided inertial\u0000navigation systems (VINS), the Preconditioned Cholesky-based Square Root\u0000Information Filter (PC-SRIF). When solving linear systems, employing Cholesky\u0000decomposition offers superior efficiency but can compromise numerical\u0000stability. Due to this, existing VINS utilizing (Square Root) Information\u0000Filters often opt for QR decomposition on platforms where single precision is\u0000preferred, avoiding the numerical challenges associated with Cholesky\u0000decomposition. While these issues are often attributed to the ill-conditioned\u0000information matrix in VINS, our analysis reveals that this is not an inherent\u0000property of VINS but rather a consequence of specific parameterizations. We\u0000identify several factors that contribute to an ill-conditioned information\u0000matrix and propose a preconditioning technique to mitigate these conditioning\u0000issues. Building on this analysis, we present PC-SRIF, which exhibits\u0000remarkable stability in performing Cholesky decomposition in single precision\u0000when solving linear systems in VINS. Consequently, PC-SRIF achieves superior\u0000theoretical efficiency compared to alternative estimators. To validate the\u0000efficiency advantages and numerical stability of PC-SRIF based VINS, we have\u0000conducted well controlled experiments, which provide empirical evidence in\u0000support of our theoretical findings. Remarkably, in our VINS implementation,\u0000PC-SRIF's runtime is 41% faster than QR-based SRIF.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hyper-SAMARL: Hypergraph-based Coordinated Task Allocation and Socially-aware Navigation for Multi-Robot Systems (arXiv:2409.11561, 17 Sep 2024)
Weizheng Wang, Aniket Bera, Byung-Cheol Min
For a team of robots to work seamlessly and safely in human-filled public environments, it needs adaptive task allocation and socially-aware navigation that account for dynamic human behavior. Current approaches struggle with highly dynamic pedestrian movement and the need for flexible task allocation. We propose Hyper-SAMARL, a hypergraph-based system for multi-robot task allocation and socially-aware navigation that leverages multi-agent reinforcement learning (MARL). Hyper-SAMARL models the environmental dynamics between robots, humans, and points of interest (POIs) using a hypergraph, enabling adaptive task assignment and socially-compliant navigation through a hypergraph diffusion mechanism. Our framework, trained with MARL, effectively captures interactions between robots and humans, adapting tasks based on real-time changes in human activity. Experimental results demonstrate that Hyper-SAMARL outperforms baseline models in terms of social navigation, task completion efficiency, and adaptability in various simulated scenarios.
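For readers unfamiliar with hypergraph message passing, the toy example below builds an incidence matrix over robots, humans, and a POI and applies one standard normalized smoothing step. This is the textbook hypergraph propagation operator, shown for intuition; the paper's diffusion mechanism is its own learned variant.

```python
# Toy hypergraph message passing over robots, humans, and a POI.
import numpy as np

# Incidence matrix H: rows = nodes (2 robots, 2 humans, 1 POI), columns =
# hyperedges; each hyperedge groups all participants of one interaction.
H = np.array([
    [1, 0],   # robot 1 in hyperedge 0 (task at the POI)
    [0, 1],   # robot 2 in hyperedge 1 (crowd interaction)
    [0, 1],   # human 1
    [0, 1],   # human 2
    [1, 1],   # the POI participates in both
], dtype=float)

Dv = np.diag(H.sum(axis=1))          # node degrees
De = np.diag(H.sum(axis=0))          # hyperedge degrees
X = np.random.default_rng(0).standard_normal((5, 4))    # node features

# One smoothing step: average node features into each hyperedge, then
# distribute the hyperedge summaries back to their member nodes.
X_new = np.linalg.inv(Dv) @ H @ np.linalg.inv(De) @ H.T @ X
```

The appeal of hyperedges over pairwise edges is visible even in this toy: a single column can couple a robot, several humans, and a POI at once, so one propagation step already mixes information across a whole interaction group.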
{"title":"Hyper-SAMARL: Hypergraph-based Coordinated Task Allocation and Socially-aware Navigation for Multi-Robot Systems","authors":"Weizheng Wang, Aniket Bera, Byung-Cheol Min","doi":"arxiv-2409.11561","DOIUrl":"https://doi.org/arxiv-2409.11561","url":null,"abstract":"A team of multiple robots seamlessly and safely working in human-filled\u0000public environments requires adaptive task allocation and socially-aware\u0000navigation that account for dynamic human behavior. Current approaches struggle\u0000with highly dynamic pedestrian movement and the need for flexible task\u0000allocation. We propose Hyper-SAMARL, a hypergraph-based system for multi-robot\u0000task allocation and socially-aware navigation, leveraging multi-agent\u0000reinforcement learning (MARL). Hyper-SAMARL models the environmental dynamics\u0000between robots, humans, and points of interest (POIs) using a hypergraph,\u0000enabling adaptive task assignment and socially-compliant navigation through a\u0000hypergraph diffusion mechanism. Our framework, trained with MARL, effectively\u0000captures interactions between robots and humans, adapting tasks based on\u0000real-time changes in human activity. Experimental results demonstrate that\u0000Hyper-SAMARL outperforms baseline models in terms of social navigation, task\u0000completion efficiency, and adaptability in various simulated scenarios.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}