DroneDiffusion: Robust Quadrotor Dynamics Learning with Diffusion Models (arXiv:2409.11292, 17 Sep 2024)
Avirup Das, Rishabh Dev Yadav, Sihao Sun, Mingfei Sun, Samuel Kaski, Wei Pan
An inherent fragility of quadrotor systems stems from model inaccuracies and external disturbances. These factors hinder performance and compromise the stability of the system, making precise control challenging. Existing model-based approaches either make deterministic assumptions, utilize Gaussian-based representations of uncertainty, or rely on nominal models, all of which often fall short in capturing the complex, multimodal nature of real-world dynamics. This work introduces DroneDiffusion, a novel framework that leverages conditional diffusion models to learn quadrotor dynamics, formulated as a sequence generation task. DroneDiffusion achieves superior generalization to unseen, complex scenarios by capturing the temporal nature of uncertainties and mitigating error propagation. We integrate the learned dynamics with an adaptive controller for trajectory tracking with stability guarantees. Extensive experiments in both simulation and real-world flights demonstrate the robustness of the framework across a range of scenarios, including unfamiliar flight paths and varying payloads, velocities, and wind disturbances.
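For intuition about what learning dynamics as conditional sequence generation with a diffusion model can look like, here is a minimal DDPM-style training sketch in PyTorch. The network, dimensions, horizon, and noise schedule are illustrative assumptions, not the architecture or hyperparameters from the paper.

```python
# Minimal sketch of conditional-diffusion dynamics learning (DDPM-style).
# All names (DenoiseNet, dims, horizon) are illustrative assumptions.
import torch
import torch.nn as nn

H, STATE, COND, T = 16, 6, 12, 100            # horizon, residual dim, conditioning dim, diffusion steps
betas = torch.linspace(1e-4, 0.02, T)          # standard DDPM noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class DenoiseNet(nn.Module):
    """Predicts the noise added to a residual-dynamics sequence, conditioned on state/action history."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(H * STATE + COND + 1, 256), nn.SiLU(),
            nn.Linear(256, 256), nn.SiLU(),
            nn.Linear(256, H * STATE),
        )
    def forward(self, noisy_seq, cond, t):
        x = torch.cat([noisy_seq.flatten(1), cond, t.float().unsqueeze(1) / T], dim=1)
        return self.net(x).view(-1, H, STATE)

def diffusion_loss(model, seq, cond):
    """One training step: noise the clean residual sequence, predict the noise."""
    t = torch.randint(0, T, (seq.shape[0],))
    ab = alphas_bar[t].view(-1, 1, 1)
    eps = torch.randn_like(seq)
    noisy = ab.sqrt() * seq + (1 - ab).sqrt() * eps
    return nn.functional.mse_loss(model(noisy, cond, t), eps)

model = DenoiseNet()
loss = diffusion_loss(model, torch.randn(32, H, STATE), torch.randn(32, COND))
loss.backward()
```

At flight time, such a model would be sampled by iterating the reverse denoising chain conditioned on the current state-action history, which is what allows the distribution over disturbances to stay multimodal rather than collapsing to a Gaussian.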
{"title":"DroneDiffusion: Robust Quadrotor Dynamics Learning with Diffusion Models","authors":"Avirup Das, Rishabh Dev Yadav, Sihao Sun, Mingfei Sun, Samuel Kaski, Wei Pan","doi":"arxiv-2409.11292","DOIUrl":"https://doi.org/arxiv-2409.11292","url":null,"abstract":"An inherent fragility of quadrotor systems stems from model inaccuracies and\u0000external disturbances. These factors hinder performance and compromise the\u0000stability of the system, making precise control challenging. Existing\u0000model-based approaches either make deterministic assumptions, utilize\u0000Gaussian-based representations of uncertainty, or rely on nominal models, all\u0000of which often fall short in capturing the complex, multimodal nature of\u0000real-world dynamics. This work introduces DroneDiffusion, a novel framework\u0000that leverages conditional diffusion models to learn quadrotor dynamics,\u0000formulated as a sequence generation task. DroneDiffusion achieves superior\u0000generalization to unseen, complex scenarios by capturing the temporal nature of\u0000uncertainties and mitigating error propagation. We integrate the learned\u0000dynamics with an adaptive controller for trajectory tracking with stability\u0000guarantees. Extensive experiments in both simulation and real-world flights\u0000demonstrate the robustness of the framework across a range of scenarios,\u0000including unfamiliar flight paths and varying payloads, velocities, and wind\u0000disturbances.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
VertiEncoder: Self-Supervised Kinodynamic Representation Learning on Vertically Challenging Terrain (arXiv:2409.11570, 17 Sep 2024)
Mohammad Nazeri, Aniket Datar, Anuj Pokhrel, Chenhui Pan, Garrett Warnell, Xuesu Xiao
We present VertiEncoder, a self-supervised representation learning approach for robot mobility on vertically challenging terrain. Using the same pre-training process, VertiEncoder can handle four different downstream tasks with a single representation: forward kinodynamics learning, inverse kinodynamics learning, behavior cloning, and patch reconstruction. VertiEncoder uses a TransformerEncoder to learn the local context of its surroundings through random masking and next-patch reconstruction. We show that VertiEncoder achieves better performance across all four tasks than specialized end-to-end models while using 77% fewer parameters. We also show VertiEncoder's comparable performance against state-of-the-art kinodynamic modeling and planning approaches in real-world robot deployment. These results underscore the efficacy of VertiEncoder in mitigating overfitting and fostering more robust generalization across diverse environmental contexts and downstream vehicle kinodynamic tasks.
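As a rough picture of masked pre-training with a TransformerEncoder, the sketch below embeds a sequence of terrain patches, masks a random subset, and reconstructs the masked patches. All dimensions and the masking ratio are assumptions for illustration, not VertiEncoder's actual configuration.

```python
# Minimal sketch of masked-patch pre-training with a TransformerEncoder.
# Dimensions and masking ratio are illustrative assumptions.
import torch
import torch.nn as nn

B, L, PATCH, D = 8, 20, 64, 128                # batch, sequence length, patch feature dim, model dim

embed = nn.Linear(PATCH, D)
mask_token = nn.Parameter(torch.zeros(1, 1, D))
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True), num_layers=4)
decode = nn.Linear(D, PATCH)                   # reconstruct raw patch features

patches = torch.randn(B, L, PATCH)             # terrain patches along the robot's trajectory
tokens = embed(patches)

mask = torch.rand(B, L) < 0.3                  # randomly mask 30% of the tokens
tokens = torch.where(mask.unsqueeze(-1), mask_token.expand(B, L, D), tokens)

recon = decode(encoder(tokens))
loss = nn.functional.mse_loss(recon[mask], patches[mask])   # loss only on masked positions
loss.backward()
```

After pre-training, the frozen or fine-tuned encoder output would serve as the shared representation for the four downstream heads.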
{"title":"VertiEncoder: Self-Supervised Kinodynamic Representation Learning on Vertically Challenging Terrain","authors":"Mohammad Nazeri, Aniket Datar, Anuj Pokhrel, Chenhui Pan, Garrett Warnell, Xuesu Xiao","doi":"arxiv-2409.11570","DOIUrl":"https://doi.org/arxiv-2409.11570","url":null,"abstract":"We present VertiEncoder, a self-supervised representation learning approach\u0000for robot mobility on vertically challenging terrain. Using the same\u0000pre-training process, VertiEncoder can handle four different downstream tasks,\u0000including forward kinodynamics learning, inverse kinodynamics learning,\u0000behavior cloning, and patch reconstruction with a single representation.\u0000VertiEncoder uses a TransformerEncoder to learn the local context of its\u0000surroundings by random masking and next patch reconstruction. We show that\u0000VertiEncoder achieves better performance across all four different tasks\u0000compared to specialized End-to-End models with 77% fewer parameters. We also\u0000show VertiEncoder's comparable performance against state-of-the-art kinodynamic\u0000modeling and planning approaches in real-world robot deployment. These results\u0000underscore the efficacy of VertiEncoder in mitigating overfitting and fostering\u0000more robust generalization across diverse environmental contexts and downstream\u0000vehicle kinodynamic tasks.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimization of Rulebooks via Asymptotically Representing Lexicographic Hierarchies for Autonomous Vehicles (arXiv:2409.11199, 17 Sep 2024)
Matteo Penlington, Alessandro Zanardi, Emilio Frazzoli
A key challenge in autonomous driving is that Autonomous Vehicles (AVs) must contend with multiple, often conflicting, planning requirements. These requirements naturally form a hierarchy: for example, avoiding a collision is more important than maintaining a lane. While the exact structure of this hierarchy remains unknown, developing approaches that systematically account for it is crucial to ensuring that AVs satisfy pre-determined behavior specifications. Motivated by lexicographic behavior specification in AVs, this work addresses a lexicographic multi-objective motion planning problem in which each objective is incomparably more important than the next: avoiding a collision, for instance, is incomparably more important than a lane-change violation. This work ties together two elements. First, we introduce a multi-objective candidate function that asymptotically represents lexicographic orders. Unlike existing multi-objective cost function formulations, this approach ensures that returned solutions asymptotically align with the lexicographic behavior specification. Second, inspired by continuation methods, we propose two algorithms that asymptotically approach minimum-rank decisions, i.e., decisions that satisfy the highest number of important rules possible. Through a couple of practical examples, we show that the proposed candidate function asymptotically represents the lexicographic hierarchy and that both proposed algorithms return minimum-rank decisions, even when other approaches do not.
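To make "asymptotically representing" a lexicographic order concrete, here is a toy epsilon-weighting construction: as the weight ratio between consecutive ranks goes to zero, the ordering induced by the scalarized cost converges to the lexicographic ordering of the cost tuples. This is a common textbook device shown for intuition only, not the candidate function proposed in the paper.

```python
# Toy illustration of asymptotically representing a lexicographic order
# with a single scalar objective (epsilon-weighting, not the paper's method).

def scalarize(costs, eps):
    """Weight the rank-i cost by eps**i: as eps -> 0 the ordering of the
    scalarized values approaches the lexicographic ordering of the tuples."""
    return sum(c * eps**i for i, c in enumerate(costs))

# (collision cost, lane-violation cost, comfort cost), most important first
a = (0.0, 1.0, 5.0)   # no collision, one lane violation, poor comfort
b = (1.0, 0.0, 0.0)   # one collision, otherwise perfect

for eps in (1.0, 0.5, 0.1, 0.01):
    order = "a < b" if scalarize(a, eps) < scalarize(b, eps) else "a >= b"
    print(f"eps={eps}: {order}")
# At eps=1.0 the comfort term dominates and b looks better; as eps shrinks,
# the scalar order converges to the lexicographic one, where collision-free a wins.
```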
{"title":"Optimization of Rulebooks via Asymptotically Representing Lexicographic Hierarchies for Autonomous Vehicles","authors":"Matteo Penlington, Alessandro Zanardi, Emilio Frazzoli","doi":"arxiv-2409.11199","DOIUrl":"https://doi.org/arxiv-2409.11199","url":null,"abstract":"A key challenge in autonomous driving is that Autonomous Vehicles (AVs) must\u0000contend with multiple, often conflicting, planning requirements. These\u0000requirements naturally form in a hierarchy -- e.g., avoiding a collision is\u0000more important than maintaining lane. While the exact structure of this\u0000hierarchy remains unknown, to progress towards ensuring that AVs satisfy\u0000pre-determined behavior specifications, it is crucial to develop approaches\u0000that systematically account for it. Motivated by lexicographic behavior\u0000specification in AVs, this work addresses a lexicographic multi-objective\u0000motion planning problem, where each objective is incomparably more important\u0000than the next -- consider that avoiding a collision is incomparably more\u0000important than a lane change violation. This work ties together two elements.\u0000Firstly, a multi-objective candidate function that asymptotically represents\u0000lexicographic orders is introduced. Unlike existing multi-objective cost\u0000function formulations, this approach assures that returned solutions\u0000asymptotically align with the lexicographic behavior specification. Secondly,\u0000inspired by continuation methods, we propose two algorithms that asymptotically\u0000approach minimum rank decisions -- i.e., decisions that satisfy the highest\u0000number of important rules possible. Through a couple practical examples, we\u0000showcase that the proposed candidate function asymptotically represents the\u0000lexicographic hierarchy, and that both proposed algorithms return minimum rank\u0000decisions, even when other approaches do not.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Context-Generative Default Policy for Bounded Rational Agent (arXiv:2409.11604, 17 Sep 2024)
Durgakant Pushp, Junhong Xu, Zheng Chen, Lantao Liu
Bounded rational agents often make decisions by evaluating a finite selection of choices, typically derived from a reference point termed the 'default policy,' based on previous experience. However, the inherent rigidity of a static default policy presents significant challenges for agents operating in unknown environments that are not covered by the agent's prior knowledge. In this work, we introduce a context-generative default policy that leverages the region observed by the robot to predict unobserved parts of the environment, thereby enabling the robot to adaptively adjust its default policy based on both the actually observed map and the imagined unobserved map. Furthermore, the adaptive nature of the bounded rationality framework enables the robot to manage unreliable or incorrect imaginations by selectively sampling a few trajectories in the vicinity of the default policy. Our approach uses a diffusion model for map prediction and sampling-based planning with B-spline trajectory optimization to generate the default policy. Extensive evaluations reveal that the context-generative policy outperforms the baseline methods in identifying and avoiding unseen obstacles. Additionally, real-world experiments conducted with Crazyflie drones demonstrate the adaptability of our proposed method, even when acting in environments outside the domain of the training distribution.
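As a concrete picture of the two planning ingredients named above, the following sketch fits a cubic B-spline default trajectory through waypoints and samples a few perturbed candidates in its vicinity. The waypoints, noise scale, and sample count are illustrative assumptions, not values from the paper.

```python
# Sketch of a B-spline default trajectory plus a handful of samples nearby.
import numpy as np
from scipy.interpolate import splprep, splev

waypoints = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 0.0], [3.0, 1.0]]).T
tck, _ = splprep(waypoints, s=0.0, k=3)        # cubic B-spline through the waypoints

u = np.linspace(0.0, 1.0, 50)
default_xy = np.array(splev(u, tck))           # the "default policy" trajectory, shape (2, 50)

rng = np.random.default_rng(0)
ctrl = np.array(tck[1])                        # spline control points, shape (2, n_ctrl)
samples = []
for _ in range(5):                             # a few candidate trajectories near the default
    perturbed = (tck[0], list(ctrl + rng.normal(0.0, 0.05, ctrl.shape)), tck[2])
    samples.append(np.array(splev(u, perturbed)))
```

Perturbing the control points rather than the sampled path keeps every candidate smooth, which matches the role these samples play as fallbacks when the imagined map proves unreliable.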
{"title":"Context-Generative Default Policy for Bounded Rational Agent","authors":"Durgakant Pushp, Junhong Xu, Zheng Chen, Lantao Liu","doi":"arxiv-2409.11604","DOIUrl":"https://doi.org/arxiv-2409.11604","url":null,"abstract":"Bounded rational agents often make decisions by evaluating a finite selection\u0000of choices, typically derived from a reference point termed the $`$default\u0000policy,' based on previous experience. However, the inherent rigidity of the\u0000static default policy presents significant challenges for agents when operating\u0000in unknown environment, that are not included in agent's prior knowledge. In\u0000this work, we introduce a context-generative default policy that leverages the\u0000region observed by the robot to predict unobserved part of the environment,\u0000thereby enabling the robot to adaptively adjust its default policy based on\u0000both the actual observed map and the $textit{imagined}$ unobserved map.\u0000Furthermore, the adaptive nature of the bounded rationality framework enables\u0000the robot to manage unreliable or incorrect imaginations by selectively\u0000sampling a few trajectories in the vicinity of the default policy. Our approach\u0000utilizes a diffusion model for map prediction and a sampling-based planning\u0000with B-spline trajectory optimization to generate the default policy. Extensive\u0000evaluations reveal that the context-generative policy outperforms the baseline\u0000methods in identifying and avoiding unseen obstacles. Additionally, real-world\u0000experiments conducted with the Crazyflie drones demonstrate the adaptability of\u0000our proposed method, even when acting in environments outside the domain of the\u0000training distribution.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PLATO: Planning with LLMs and Affordances for Tool Manipulation (arXiv:2409.11580, 17 Sep 2024)
Arvind Car, Sai Sravan Yarlagadda, Alison Bartsch, Abraham George, Amir Barati Farimani
As robotic systems become increasingly integrated into complex real-world environments, there is a growing need for approaches that enable robots to understand and act upon natural language instructions without relying on extensive pre-programmed knowledge of their surroundings. This paper presents PLATO, an innovative system that addresses this challenge by leveraging specialized large language model agents to process natural language inputs, understand the environment, predict tool affordances, and generate executable actions for robotic systems. Unlike traditional systems that depend on hard-coded environmental information, PLATO employs a modular architecture of specialized agents to operate without any initial knowledge of the environment. These agents identify objects and their locations within the scene, generate a comprehensive high-level plan, translate this plan into a series of low-level actions, and verify the completion of each step. We test the system in particular on challenging tool-use tasks, which involve handling diverse objects and require long-horizon planning. PLATO's design allows it to adapt to dynamic and unstructured settings, significantly enhancing its flexibility and robustness. By evaluating the system across various complex scenarios, we demonstrate its capability to tackle a diverse range of tasks and offer a novel solution for integrating LLMs with robotic platforms, advancing the state of the art in autonomous robotic task execution. For videos and prompt details, please see our project website: https://sites.google.com/andrew.cmu.edu/plato
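A schematic of such a modular agent pipeline might look as follows. The four roles mirror the ones described above, but `call_llm`, the prompts, and the JSON interfaces are hypothetical stand-ins for any chat-completion API, not PLATO's actual implementation.

```python
# Schematic of a modular LLM-agent pipeline (hypothetical interfaces throughout).
import json

def call_llm(system_prompt: str, user_msg: str) -> str:
    """Hypothetical LLM call; wire this to your provider's chat API to run."""
    raise NotImplementedError

def perceive(scene_description: str) -> list[dict]:
    """Agent 1: identify objects and their locations in the scene."""
    return json.loads(call_llm("List objects and 3D locations as JSON.", scene_description))

def plan(task: str, objects: list[dict]) -> list[str]:
    """Agent 2: produce a high-level plan grounded in the detected objects."""
    return json.loads(call_llm("Return a JSON list of high-level steps.",
                               f"Task: {task}\nObjects: {json.dumps(objects)}"))

def ground(step: str, objects: list[dict]) -> list[dict]:
    """Agent 3: translate one high-level step into low-level robot primitives."""
    return json.loads(call_llm("Return JSON robot primitives (pick/place/move).",
                               f"Step: {step}\nObjects: {json.dumps(objects)}"))

def verify(step: str, scene_description: str) -> bool:
    """Agent 4: check that the step's postcondition holds before continuing."""
    answer = call_llm("Answer yes/no: is this step complete?",
                      f"Step: {step}\nScene: {scene_description}")
    return answer.strip().lower().startswith("yes")
```

Separating perception, planning, grounding, and verification into distinct agents is what lets such a system start with no environment model: each stage re-queries the world rather than assuming hard-coded state.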
{"title":"PLATO: Planning with LLMs and Affordances for Tool Manipulation","authors":"Arvind Car, Sai Sravan Yarlagadda, Alison Bartsch, Abraham George, Amir Barati Farimani","doi":"arxiv-2409.11580","DOIUrl":"https://doi.org/arxiv-2409.11580","url":null,"abstract":"As robotic systems become increasingly integrated into complex real-world\u0000environments, there is a growing need for approaches that enable robots to\u0000understand and act upon natural language instructions without relying on\u0000extensive pre-programmed knowledge of their surroundings. This paper presents\u0000PLATO, an innovative system that addresses this challenge by leveraging\u0000specialized large language model agents to process natural language inputs,\u0000understand the environment, predict tool affordances, and generate executable\u0000actions for robotic systems. Unlike traditional systems that depend on\u0000hard-coded environmental information, PLATO employs a modular architecture of\u0000specialized agents to operate without any initial knowledge of the environment.\u0000These agents identify objects and their locations within the scene, generate a\u0000comprehensive high-level plan, translate this plan into a series of low-level\u0000actions, and verify the completion of each step. The system is particularly\u0000tested on challenging tool-use tasks, which involve handling diverse objects\u0000and require long-horizon planning. PLATO's design allows it to adapt to dynamic\u0000and unstructured settings, significantly enhancing its flexibility and\u0000robustness. By evaluating the system across various complex scenarios, we\u0000demonstrate its capability to tackle a diverse range of tasks and offer a novel\u0000solution to integrate LLMs with robotic platforms, advancing the\u0000state-of-the-art in autonomous robotic task execution. For videos and prompt\u0000details, please see our project website:\u0000https://sites.google.com/andrew.cmu.edu/plato","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P-RAG: Progressive Retrieval Augmented Generation For Planning on Embodied Everyday Task (arXiv:2409.11279, 17 Sep 2024)
Weiye Xu, Min Wang, Wengang Zhou, Houqiang Li
Embodied Everyday Task is a popular task in the embodied AI community, requiring agents to make a sequence of actions based on natural language instructions and visual observations. Traditional learning-based approaches face two challenges. First, natural language instructions often lack explicit task planning. Second, extensive training is required to equip models with knowledge of the task environment. Previous works based on Large Language Models (LLMs) either suffer from poor performance due to a lack of task-specific knowledge or rely on ground truth as few-shot samples. To address these limitations, we propose Progressive Retrieval Augmented Generation (P-RAG), which not only effectively leverages the powerful language processing capabilities of LLMs but also progressively accumulates task-specific knowledge without ground truth. In contrast to conventional RAG methods, which retrieve relevant information from the database in a one-shot manner to assist generation, P-RAG introduces an iterative approach that progressively updates the database. In each iteration, P-RAG retrieves from the latest database and draws on historical information from previous interactions as experiential references for the current one. Moreover, we introduce a more granular retrieval scheme that retrieves not only similar tasks but also similar situations, providing more valuable reference experiences. Extensive experiments show that P-RAG achieves competitive results without utilizing ground truth and can further improve performance through self-iteration.
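The progressive loop can be pictured as follows: retrieve references from the current experience database, act, then insert the new trajectory so later iterations retrieve richer references. The `embed` and `run_episode` functions below are hypothetical stand-ins, not P-RAG's actual components.

```python
# Minimal sketch of a progressive RAG loop (all interfaces are assumptions).
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical sentence embedder (toy stand-in; use a real model in practice)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

def run_episode(task: str, references: list) -> tuple:
    """Hypothetical rollout: prompt an LLM with the references, act in the env."""
    return ["look", "goto", "pick", "place"], "partial_success"

database = []   # entries: (embedding, task, situation, trajectory, outcome)

def retrieve(query: str, k: int = 3) -> list:
    """Cosine nearest neighbors over stored experiences."""
    q = embed(query)
    sim = lambda e: float(q @ e[0]) / (np.linalg.norm(q) * np.linalg.norm(e[0]))
    return sorted(database, key=sim, reverse=True)[:k]

def progressive_rag(task: str, situation: str, n_iters: int = 3):
    for _ in range(n_iters):
        # Granular retrieval: similar tasks AND similar situations.
        refs = retrieve(task) + retrieve(situation)
        trajectory, outcome = run_episode(task, refs)
        # Progressively grow the database; no ground truth needed.
        database.append((embed(task), task, situation, trajectory, outcome))

progressive_rag("put a clean mug on the coffee machine", "kitchen, mug in sink")
```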
{"title":"P-RAG: Progressive Retrieval Augmented Generation For Planning on Embodied Everyday Task","authors":"Weiye Xu, Min Wang, Wengang Zhou, Houqiang Li","doi":"arxiv-2409.11279","DOIUrl":"https://doi.org/arxiv-2409.11279","url":null,"abstract":"Embodied Everyday Task is a popular task in the embodied AI community,\u0000requiring agents to make a sequence of actions based on natural language\u0000instructions and visual observations. Traditional learning-based approaches\u0000face two challenges. Firstly, natural language instructions often lack explicit\u0000task planning. Secondly, extensive training is required to equip models with\u0000knowledge of the task environment. Previous works based on Large Language Model\u0000(LLM) either suffer from poor performance due to the lack of task-specific\u0000knowledge or rely on ground truth as few-shot samples. To address the above\u0000limitations, we propose a novel approach called Progressive Retrieval Augmented\u0000Generation (P-RAG), which not only effectively leverages the powerful language\u0000processing capabilities of LLMs but also progressively accumulates\u0000task-specific knowledge without ground-truth. Compared to the conventional RAG\u0000methods, which retrieve relevant information from the database in a one-shot\u0000manner to assist generation, P-RAG introduces an iterative approach to\u0000progressively update the database. In each iteration, P-RAG retrieves the\u0000latest database and obtains historical information from the previous\u0000interaction as experiential references for the current interaction. Moreover,\u0000we also introduce a more granular retrieval scheme that not only retrieves\u0000similar tasks but also incorporates retrieval of similar situations to provide\u0000more valuable reference experiences. Extensive experiments reveal that P-RAG\u0000achieves competitive results without utilizing ground truth and can even\u0000further improve performance through self-iterations.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exact Wavefront Propagation for Globally Optimal One-to-All Path Planning on 2D Cartesian Grids (arXiv:2409.11545, 17 Sep 2024)
Ibrahim Ibrahim, Joris Gillis, Wilm Decré, Jan Swevers
This paper introduces an algorithm with $\mathcal{O}(n)$ compute and memory complexity for globally optimal path planning on 2D Cartesian grids. Unlike existing marching methods that rely on approximate discretized solutions to the Eikonal equation, our approach achieves exact wavefront propagation by pivoting the analytic distance function based on visibility. The algorithm leverages a dynamic-programming subroutine to efficiently evaluate visibility queries. Through benchmarking against state-of-the-art any-angle path planners, we demonstrate that our method outperforms existing approaches in both speed and accuracy, particularly in cluttered environments. Notably, our method inherently provides globally optimal paths to all grid points, eliminating the need for additional gradient descent steps per path query. The same capability extends to multiple starting positions. We also provide a greedy version of our algorithm, as well as an open-source C++ implementation of our solver.
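The planner's core primitive is a grid visibility query. For intuition, here is the naive version of that query, a plain Bresenham line-of-sight check; it is not the paper's dynamic-programming subroutine, which evaluates such queries far more efficiently.

```python
# Naive grid visibility query via Bresenham traversal (for intuition only;
# NOT the paper's O(n) dynamic-programming subroutine).

def visible(grid, a, b):
    """True if the straight segment between cells a and b crosses no obstacle.
    grid[r][c] is True for obstacles; a and b are (row, col) tuples."""
    (r, c), (r1, c1) = a, b
    dr, dc = abs(r1 - r), abs(c1 - c)
    sr = 1 if r < r1 else -1
    sc = 1 if c < c1 else -1
    err = dc - dr                      # Bresenham error term
    while True:
        if grid[r][c]:
            return False               # line of sight blocked
        if (r, c) == (r1, c1):
            return True
        e2 = 2 * err
        if e2 >= -dr:                  # step in the column direction
            err -= dr
            c += sc
        if e2 <= dc:                   # step in the row direction
            err += dc
            r += sr

grid = [[False] * 5 for _ in range(5)]
grid[2][2] = True
print(visible(grid, (0, 0), (4, 4)))   # False: the diagonal hits the obstacle
print(visible(grid, (0, 0), (0, 4)))   # True: the top row is clear
```

When a cell is visible from the source, its exact distance is simply the Euclidean norm; when it is not, the analytic distance function is pivoted through an intermediate visible point, which is what keeps the propagated wavefront exact rather than grid-discretized.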
{"title":"Exact Wavefront Propagation for Globally Optimal One-to-All Path Planning on 2D Cartesian Grids","authors":"Ibrahim Ibrahim, Joris Gillis, Wilm Decré, Jan Swevers","doi":"arxiv-2409.11545","DOIUrl":"https://doi.org/arxiv-2409.11545","url":null,"abstract":"This paper introduces an efficient $mathcal{O}(n)$ compute and memory\u0000complexity algorithm for globally optimal path planning on 2D Cartesian grids.\u0000Unlike existing marching methods that rely on approximate discretized solutions\u0000to the Eikonal equation, our approach achieves exact wavefront propagation by\u0000pivoting the analytic distance function based on visibility. The algorithm\u0000leverages a dynamic-programming subroutine to efficiently evaluate visibility\u0000queries. Through benchmarking against state-of-the-art any-angle path planners,\u0000we demonstrate that our method outperforms existing approaches in both speed\u0000and accuracy, particularly in cluttered environments. Notably, our method\u0000inherently provides globally optimal paths to all grid points, eliminating the\u0000need for additional gradient descent steps per path query. The same capability\u0000extends to multiple starting positions. We also provide a greedy version of our\u0000algorithm as well as open-source C++ implementation of our solver.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Autonomous Navigation in Ice-Covered Waters with Learned Predictions on Ship-Ice Interactions (arXiv:2409.11326, 17 Sep 2024)
Ninghan Zhong, Alessandro Potenza, Stephen L. Smith
Autonomous navigation in ice-covered waters poses significant challenges due to the frequent lack of viable collision-free trajectories. When complete obstacle avoidance is infeasible, it becomes imperative for the navigation strategy to minimize collisions. Additionally, the dynamic nature of ice, which moves in response to ship maneuvers, complicates the path planning process. To address these challenges, we propose a novel deep learning model that estimates, via occupancy estimation, the coarse dynamics of ice movements triggered by ship actions. To ensure real-time applicability, we propose a novel approach that caches intermediate prediction results and seamlessly integrates the predictive model into a graph search planner. We evaluate the proposed planner both in simulation and in a physical testbed against existing approaches and show that our planner significantly reduces collisions with ice when compared to the state of the art. Code and demos of this work are available at https://github.com/IvanIZ/predictive-asv-planner.
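The caching idea can be illustrated by memoizing the learned occupancy prediction inside the planner's edge-cost evaluation, so repeated expansions of the same discretized state reuse the result. The predictor interface and cost terms below are assumptions for illustration, not the paper's implementation.

```python
# Sketch: memoize an expensive learned prediction inside graph-search edge costs.
from functools import lru_cache

def expensive_model_forward(ship_state, action):
    """Stand-in for the learned ice-occupancy network (assumed interface:
    discretized ship state + candidate action -> predicted occupancy score)."""
    return (hash((ship_state, action)) % 100) / 100.0

@lru_cache(maxsize=100_000)
def predicted_occupancy(ship_state, action):
    # Arguments must be hashable (e.g., grid-discretized pose tuples) so that
    # re-expansions of the same node during search become cache hits.
    return expensive_model_forward(ship_state, action)

def edge_cost(ship_state, action, w_collision=10.0):
    """Edge cost for the graph search: unit motion cost plus a penalty on the
    predicted ice occupancy induced by the maneuver."""
    return 1.0 + w_collision * predicted_occupancy(ship_state, action)

print(edge_cost((10, 12, 3), "port_15deg"))    # repeated calls hit the cache
```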
{"title":"Autonomous Navigation in Ice-Covered Waters with Learned Predictions on Ship-Ice Interactions","authors":"Ninghan Zhong, Alessandro Potenza, Stephen L. Smith","doi":"arxiv-2409.11326","DOIUrl":"https://doi.org/arxiv-2409.11326","url":null,"abstract":"Autonomous navigation in ice-covered waters poses significant challenges due\u0000to the frequent lack of viable collision-free trajectories. When complete\u0000obstacle avoidance is infeasible, it becomes imperative for the navigation\u0000strategy to minimize collisions. Additionally, the dynamic nature of ice, which\u0000moves in response to ship maneuvers, complicates the path planning process. To\u0000address these challenges, we propose a novel deep learning model to estimate\u0000the coarse dynamics of ice movements triggered by ship actions through\u0000occupancy estimation. To ensure real-time applicability, we propose a novel\u0000approach that caches intermediate prediction results and seamlessly integrates\u0000the predictive model into a graph search planner. We evaluate the proposed\u0000planner both in simulation and in a physical testbed against existing\u0000approaches and show that our planner significantly reduces collisions with ice\u0000when compared to the state-of-the-art. Codes and demos of this work are\u0000available at https://github.com/IvanIZ/predictive-asv-planner.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PC-SRIF: Preconditioned Cholesky-based Square Root Information Filter for Vision-aided Inertial Navigation (arXiv:2409.11372, 17 Sep 2024)
Tong Ke, Parth Agrawal, Yun Zhang, Weikun Zhen, Chao X. Guo, Toby Sharp, Ryan C. Dutoit
In this paper, we introduce a novel estimator for vision-aided inertial navigation systems (VINS), the Preconditioned Cholesky-based Square Root Information Filter (PC-SRIF). When solving linear systems, employing Cholesky decomposition offers superior efficiency but can compromise numerical stability. Due to this, existing VINS utilizing (Square Root) Information Filters often opt for QR decomposition on platforms where single precision is preferred, avoiding the numerical challenges associated with Cholesky decomposition. While these issues are often attributed to the ill-conditioned information matrix in VINS, our analysis reveals that this is not an inherent property of VINS but rather a consequence of specific parameterizations. We identify several factors that contribute to an ill-conditioned information matrix and propose a preconditioning technique to mitigate these conditioning issues. Building on this analysis, we present PC-SRIF, which exhibits remarkable stability in performing Cholesky decomposition in single precision when solving linear systems in VINS. Consequently, PC-SRIF achieves superior theoretical efficiency compared to alternative estimators. To validate the efficiency advantages and numerical stability of PC-SRIF-based VINS, we have conducted well-controlled experiments, which provide empirical evidence in support of our theoretical findings. Remarkably, in our VINS implementation, PC-SRIF's runtime is 41% faster than QR-based SRIF.
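A small NumPy experiment illustrates the point about parameterization: the same least-squares problem yields a wildly ill-conditioned information matrix under one scaling of the variables and a benign one after diagonal rescaling. The Jacobi preconditioner and the unit split below are shown for intuition and are not necessarily the preconditioning technique proposed in the paper.

```python
# Conditioning of an information matrix as a consequence of variable scaling.
import numpy as np

rng = np.random.default_rng(1)
# Jacobian with mixed units: three "position-like" columns scaled 1e4 and
# three "orientation-like" columns scaled 1e-3 (an assumed, illustrative split).
scales = np.array([1e4, 1e4, 1e4, 1e-3, 1e-3, 1e-3])
J = rng.standard_normal((200, 6)) * scales
A = J.T @ J                                    # information matrix

D = np.diag(1.0 / np.sqrt(np.diag(A)))         # Jacobi (diagonal) preconditioner
Ap = D @ A @ D                                 # unit diagonal after rescaling

print(f"cond(A)  = {np.linalg.cond(A):.1e}")   # ~1e14, far beyond 1/eps(float32) ~ 8.4e6
print(f"cond(Ap) = {np.linalg.cond(Ap):.1e}")  # ~1e0-1e1: comfortable for single precision

# Only the preconditioned system keeps cond * eps(float32) << 1, the regime
# where a single-precision Cholesky-based solve retains meaningful accuracy.
L = np.linalg.cholesky(Ap.astype(np.float32))
```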
{"title":"PC-SRIF: Preconditioned Cholesky-based Square Root Information Filter for Vision-aided Inertial Navigation","authors":"Tong Ke, Parth Agrawal, Yun Zhang, Weikun Zhen, Chao X. Guo, Toby Sharp, Ryan C. Dutoit","doi":"arxiv-2409.11372","DOIUrl":"https://doi.org/arxiv-2409.11372","url":null,"abstract":"In this paper, we introduce a novel estimator for vision-aided inertial\u0000navigation systems (VINS), the Preconditioned Cholesky-based Square Root\u0000Information Filter (PC-SRIF). When solving linear systems, employing Cholesky\u0000decomposition offers superior efficiency but can compromise numerical\u0000stability. Due to this, existing VINS utilizing (Square Root) Information\u0000Filters often opt for QR decomposition on platforms where single precision is\u0000preferred, avoiding the numerical challenges associated with Cholesky\u0000decomposition. While these issues are often attributed to the ill-conditioned\u0000information matrix in VINS, our analysis reveals that this is not an inherent\u0000property of VINS but rather a consequence of specific parameterizations. We\u0000identify several factors that contribute to an ill-conditioned information\u0000matrix and propose a preconditioning technique to mitigate these conditioning\u0000issues. Building on this analysis, we present PC-SRIF, which exhibits\u0000remarkable stability in performing Cholesky decomposition in single precision\u0000when solving linear systems in VINS. Consequently, PC-SRIF achieves superior\u0000theoretical efficiency compared to alternative estimators. To validate the\u0000efficiency advantages and numerical stability of PC-SRIF based VINS, we have\u0000conducted well controlled experiments, which provide empirical evidence in\u0000support of our theoretical findings. Remarkably, in our VINS implementation,\u0000PC-SRIF's runtime is 41% faster than QR-based SRIF.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hyper-SAMARL: Hypergraph-based Coordinated Task Allocation and Socially-aware Navigation for Multi-Robot Systems (arXiv:2409.11561, 17 Sep 2024)
Weizheng Wang, Aniket Bera, Byung-Cheol Min
For a team of robots to work seamlessly and safely in human-filled public environments, it needs adaptive task allocation and socially-aware navigation that account for dynamic human behavior. Current approaches struggle with highly dynamic pedestrian movement and the need for flexible task allocation. We propose Hyper-SAMARL, a hypergraph-based system for multi-robot task allocation and socially-aware navigation that leverages multi-agent reinforcement learning (MARL). Hyper-SAMARL models the environmental dynamics between robots, humans, and points of interest (POIs) using a hypergraph, enabling adaptive task assignment and socially-compliant navigation through a hypergraph diffusion mechanism. Our framework, trained with MARL, effectively captures interactions between robots and humans, adapting tasks based on real-time changes in human activity. Experimental results demonstrate that Hyper-SAMARL outperforms baseline models in terms of social navigation, task completion efficiency, and adaptability in various simulated scenarios.
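For readers unfamiliar with hypergraph message passing, the toy example below builds an incidence matrix over robots, humans, and a POI and applies one standard normalized smoothing step. This is the textbook hypergraph propagation operator, shown for intuition; the paper's diffusion mechanism is its own learned variant.

```python
# Toy hypergraph message passing over robots, humans, and a POI.
import numpy as np

# Incidence matrix H: rows = nodes (2 robots, 2 humans, 1 POI), columns =
# hyperedges; each hyperedge groups all participants of one interaction.
H = np.array([
    [1, 0],   # robot 1 in hyperedge 0 (task at the POI)
    [0, 1],   # robot 2 in hyperedge 1 (crowd interaction)
    [0, 1],   # human 1
    [0, 1],   # human 2
    [1, 1],   # the POI participates in both
], dtype=float)

Dv = np.diag(H.sum(axis=1))          # node degrees
De = np.diag(H.sum(axis=0))          # hyperedge degrees
X = np.random.default_rng(0).standard_normal((5, 4))    # node features

# One smoothing step: average node features into each hyperedge, then
# distribute the hyperedge summaries back to their member nodes.
X_new = np.linalg.inv(Dv) @ H @ np.linalg.inv(De) @ H.T @ X
```

The appeal of hyperedges over pairwise edges is visible even in this toy: a single column can couple a robot, several humans, and a POI at once, so one propagation step already mixes information across a whole interaction group.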
{"title":"Hyper-SAMARL: Hypergraph-based Coordinated Task Allocation and Socially-aware Navigation for Multi-Robot Systems","authors":"Weizheng Wang, Aniket Bera, Byung-Cheol Min","doi":"arxiv-2409.11561","DOIUrl":"https://doi.org/arxiv-2409.11561","url":null,"abstract":"A team of multiple robots seamlessly and safely working in human-filled\u0000public environments requires adaptive task allocation and socially-aware\u0000navigation that account for dynamic human behavior. Current approaches struggle\u0000with highly dynamic pedestrian movement and the need for flexible task\u0000allocation. We propose Hyper-SAMARL, a hypergraph-based system for multi-robot\u0000task allocation and socially-aware navigation, leveraging multi-agent\u0000reinforcement learning (MARL). Hyper-SAMARL models the environmental dynamics\u0000between robots, humans, and points of interest (POIs) using a hypergraph,\u0000enabling adaptive task assignment and socially-compliant navigation through a\u0000hypergraph diffusion mechanism. Our framework, trained with MARL, effectively\u0000captures interactions between robots and humans, adapting tasks based on\u0000real-time changes in human activity. Experimental results demonstrate that\u0000Hyper-SAMARL outperforms baseline models in terms of social navigation, task\u0000completion efficiency, and adaptability in various simulated scenarios.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}