Keshu Wu, Yang Zhou, Haotian Shi, Dominique Lord, Bin Ran, Xinyue Ye
The intricate nature of real-world driving environments, characterized by dynamic and diverse interactions among multiple vehicles and their possible future states, presents considerable challenges in accurately predicting the motion states of vehicles and handling the uncertainty inherent in the predictions. Addressing these challenges requires comprehensive modeling and reasoning to capture the implicit relations among vehicles and the corresponding diverse behaviors. This research introduces an integrated framework for autonomous vehicle (AV) motion prediction to address these complexities, utilizing a novel Relational Hypergraph Interaction-informed Neural mOtion generator (RHINO). RHINO leverages hypergraph-based relational reasoning by integrating a multi-scale hypergraph neural network to model group-wise interactions among multiple vehicles and their multi-modal driving behaviors, thereby enhancing motion prediction accuracy and reliability. Experimental validation on real-world datasets demonstrates the superior performance of this framework in improving predictive accuracy and fostering socially aware automated driving in dynamic traffic scenarios.
{"title":"Hypergraph-based Motion Generation with Multi-modal Interaction Relational Reasoning","authors":"Keshu Wu, Yang Zhou, Haotian Shi, Dominique Lord, Bin Ran, Xinyue Ye","doi":"arxiv-2409.11676","DOIUrl":"https://doi.org/arxiv-2409.11676","url":null,"abstract":"The intricate nature of real-world driving environments, characterized by\u0000dynamic and diverse interactions among multiple vehicles and their possible\u0000future states, presents considerable challenges in accurately predicting the\u0000motion states of vehicles and handling the uncertainty inherent in the\u0000predictions. Addressing these challenges requires comprehensive modeling and\u0000reasoning to capture the implicit relations among vehicles and the\u0000corresponding diverse behaviors. This research introduces an integrated\u0000framework for autonomous vehicles (AVs) motion prediction to address these\u0000complexities, utilizing a novel Relational Hypergraph Interaction-informed\u0000Neural mOtion generator (RHINO). RHINO leverages hypergraph-based relational\u0000reasoning by integrating a multi-scale hypergraph neural network to model\u0000group-wise interactions among multiple vehicles and their multi-modal driving\u0000behaviors, thereby enhancing motion prediction accuracy and reliability.\u0000Experimental validation using real-world datasets demonstrates the superior\u0000performance of this framework in improving predictive accuracy and fostering\u0000socially aware automated driving in dynamic traffic scenarios.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
When a robot grasps an object using dexterous hands or grippers, it should understand the Task-Oriented Affordances of the Object (TOAO), as different tasks often require attention to specific parts of the object. To address this challenge, we propose GauTOAO, a Gaussian-based framework for Task-Oriented Affordance of Objects, which leverages vision-language models in a zero-shot manner to predict affordance-relevant regions of an object given a natural language query. Our approach introduces a new paradigm: "static camera, moving object," allowing the robot to better observe and understand the object in hand during manipulation. GauTOAO addresses the limitations of existing methods, which often lack effective spatial grouping, by extracting a comprehensive 3D object mask using DINO features. This mask is then used to conditionally query Gaussians, producing a refined semantic distribution over the object for the specified task. This approach results in more accurate TOAO extraction, enhancing the robot's understanding of the object and improving task performance. We validate the effectiveness of GauTOAO through real-world experiments, demonstrating its capability to generalize across various tasks.
{"title":"GauTOAO: Gaussian-based Task-Oriented Affordance of Objects","authors":"Jiawen Wang, Dingsheng Luo","doi":"arxiv-2409.11941","DOIUrl":"https://doi.org/arxiv-2409.11941","url":null,"abstract":"When your robot grasps an object using dexterous hands or grippers, it should\u0000understand the Task-Oriented Affordances of the Object(TOAO), as different\u0000tasks often require attention to specific parts of the object. To address this\u0000challenge, we propose GauTOAO, a Gaussian-based framework for Task-Oriented\u0000Affordance of Objects, which leverages vision-language models in a zero-shot\u0000manner to predict affordance-relevant regions of an object, given a natural\u0000language query. Our approach introduces a new paradigm: \"static camera, moving\u0000object,\" allowing the robot to better observe and understand the object in hand\u0000during manipulation. GauTOAO addresses the limitations of existing methods,\u0000which often lack effective spatial grouping, by extracting a comprehensive 3D\u0000object mask using DINO features. This mask is then used to conditionally query\u0000gaussians, producing a refined semantic distribution over the object for the\u0000specified task. This approach results in more accurate TOAO extraction,\u0000enhancing the robot's understanding of the object and improving task\u0000performance. We validate the effectiveness of GauTOAO through real-world\u0000experiments, demonstrating its capability to generalize across various tasks.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Martin Schuck, Jan Brüdigam, Sandra Hirche, Angela Schoellig
Handling orientations of robots and objects is a crucial aspect of many applications. Yet, all too often, orientations are handled without mathematical correctness, especially in learning pipelines involving, for example, artificial neural networks. In this paper, we investigate reinforcement learning with orientations and propose a simple modification of the network's input and output that adheres to the Lie group structure of orientations. As a result, we obtain an easy and efficient implementation that is directly usable with existing learning libraries and achieves significantly better performance than other common orientation representations. We briefly introduce Lie theory specifically for orientations in robotics to motivate and outline our approach. Subsequently, a thorough empirical evaluation of different combinations of orientation representations for states and actions demonstrates the superior performance of our proposed approach in different scenarios, including direct orientation control, end-effector orientation control, and pick-and-place tasks.
{"title":"Reinforcement Learning with Lie Group Orientations for Robotics","authors":"Martin Schuck, Jan Brüdigam, Sandra Hirche, Angela Schoellig","doi":"arxiv-2409.11935","DOIUrl":"https://doi.org/arxiv-2409.11935","url":null,"abstract":"Handling orientations of robots and objects is a crucial aspect of many\u0000applications. Yet, ever so often, there is a lack of mathematical correctness\u0000when dealing with orientations, especially in learning pipelines involving, for\u0000example, artificial neural networks. In this paper, we investigate\u0000reinforcement learning with orientations and propose a simple modification of\u0000the network's input and output that adheres to the Lie group structure of\u0000orientations. As a result, we obtain an easy and efficient implementation that\u0000is directly usable with existing learning libraries and achieves significantly\u0000better performance than other common orientation representations. We briefly\u0000introduce Lie theory specifically for orientations in robotics to motivate and\u0000outline our approach. Subsequently, a thorough empirical evaluation of\u0000different combinations of orientation representations for states and actions\u0000demonstrates the superior performance of our proposed approach in different\u0000scenarios, including: direct orientation control, end effector orientation\u0000control, and pick-and-place tasks.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robotic assistive feeding holds significant promise for improving the quality of life for individuals with eating disabilities. However, acquiring diverse food items under varying conditions and generalizing to unseen food presents unique challenges. Existing methods that rely on surface-level geometric information (e.g., bounding box and pose) derived from visual cues (e.g., color, shape, and texture) often lack adaptability and robustness, especially when foods share similar physical properties but differ in visual appearance. We employ imitation learning (IL) to learn a policy for food acquisition. Existing IL and Reinforcement Learning (RL) methods typically build policies on off-the-shelf image encoders such as ResNet-50; however, such representations are not robust and struggle to generalize across diverse acquisition scenarios. To address these limitations, we propose a novel approach, IMRL (Integrated Multi-Dimensional Representation Learning), which integrates visual, physical, temporal, and geometric representations to enhance the robustness and generalizability of IL for food acquisition. Our approach captures food types and physical properties (e.g., solid, semi-solid, granular, liquid, and mixture), models the temporal dynamics of acquisition actions, and introduces geometric information to determine optimal scooping points and assess bowl fullness. IMRL enables IL to adaptively adjust scooping strategies based on context, improving the robot's capability to handle diverse food acquisition scenarios. Experiments on a real robot demonstrate our approach's robustness and adaptability across various foods and bowl configurations, including zero-shot generalization to unseen settings. Our approach achieves up to a 35% improvement in success rate over the best-performing baseline.
{"title":"IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition","authors":"Rui Liu, Zahiruddin Mahammad, Amisha Bhaskar, Pratap Tokekar","doi":"arxiv-2409.12092","DOIUrl":"https://doi.org/arxiv-2409.12092","url":null,"abstract":"Robotic assistive feeding holds significant promise for improving the quality\u0000of life for individuals with eating disabilities. However, acquiring diverse\u0000food items under varying conditions and generalizing to unseen food presents\u0000unique challenges. Existing methods that rely on surface-level geometric\u0000information (e.g., bounding box and pose) derived from visual cues (e.g.,\u0000color, shape, and texture) often lacks adaptability and robustness, especially\u0000when foods share similar physical properties but differ in visual appearance.\u0000We employ imitation learning (IL) to learn a policy for food acquisition.\u0000Existing methods employ IL or Reinforcement Learning (RL) to learn a policy\u0000based on off-the-shelf image encoders such as ResNet-50. However, such\u0000representations are not robust and struggle to generalize across diverse\u0000acquisition scenarios. To address these limitations, we propose a novel\u0000approach, IMRL (Integrated Multi-Dimensional Representation Learning), which\u0000integrates visual, physical, temporal, and geometric representations to enhance\u0000the robustness and generalizability of IL for food acquisition. Our approach\u0000captures food types and physical properties (e.g., solid, semi-solid, granular,\u0000liquid, and mixture), models temporal dynamics of acquisition actions, and\u0000introduces geometric information to determine optimal scooping points and\u0000assess bowl fullness. IMRL enables IL to adaptively adjust scooping strategies\u0000based on context, improving the robot's capability to handle diverse food\u0000acquisition scenarios. Experiments on a real robot demonstrate our approach's\u0000robustness and adaptability across various foods and bowl configurations,\u0000including zero-shot generalization to unseen settings. Our approach achieves\u0000improvement up to $35%$ in success rate compared with the best-performing\u0000baseline.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Autonomous driving technology has witnessed rapid advancements, with foundation models improving interactivity and user experiences. However, current autonomous vehicles (AVs) face significant limitations in delivering command-based driving styles. Most existing methods either rely on predefined driving styles that require expert input or use data-driven techniques like Inverse Reinforcement Learning to extract styles from driving data. These approaches, though effective in some cases, face challenges: difficulty obtaining specific driving data for style matching (e.g., in Robotaxis), inability to align driving style metrics with user preferences, and restriction to pre-existing styles, which limits customization and generalization to new commands. This paper introduces Words2Wheels, a framework that automatically generates customized driving policies based on natural language user commands. Words2Wheels employs a Style-Customized Reward Function to generate a Style-Customized Driving Policy without relying on prior driving data. By leveraging large language models and a Driving Style Database, the framework efficiently retrieves, adapts, and generalizes driving styles. A Statistical Evaluation module ensures alignment with user preferences. Experimental results demonstrate that Words2Wheels outperforms existing methods in accuracy, generalization, and adaptability, offering a novel solution for customized AV driving behavior. Code and demo are available at https://yokhon.github.io/Words2Wheels/.
{"title":"From Words to Wheels: Automated Style-Customized Policy Generation for Autonomous Driving","authors":"Xu Han, Xianda Chen, Zhenghan Cai, Pinlong Cai, Meixin Zhu, Xiaowen Chu","doi":"arxiv-2409.11694","DOIUrl":"https://doi.org/arxiv-2409.11694","url":null,"abstract":"Autonomous driving technology has witnessed rapid advancements, with\u0000foundation models improving interactivity and user experiences. However,\u0000current autonomous vehicles (AVs) face significant limitations in delivering\u0000command-based driving styles. Most existing methods either rely on predefined\u0000driving styles that require expert input or use data-driven techniques like\u0000Inverse Reinforcement Learning to extract styles from driving data. These\u0000approaches, though effective in some cases, face challenges: difficulty\u0000obtaining specific driving data for style matching (e.g., in Robotaxis),\u0000inability to align driving style metrics with user preferences, and limitations\u0000to pre-existing styles, restricting customization and generalization to new\u0000commands. This paper introduces Words2Wheels, a framework that automatically\u0000generates customized driving policies based on natural language user commands.\u0000Words2Wheels employs a Style-Customized Reward Function to generate a\u0000Style-Customized Driving Policy without relying on prior driving data. By\u0000leveraging large language models and a Driving Style Database, the framework\u0000efficiently retrieves, adapts, and generalizes driving styles. A Statistical\u0000Evaluation module ensures alignment with user preferences. Experimental results\u0000demonstrate that Words2Wheels outperforms existing methods in accuracy,\u0000generalization, and adaptability, offering a novel solution for customized AV\u0000driving behavior. Code and demo available at\u0000https://yokhon.github.io/Words2Wheels/.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Finn Lukas Busch, Timon Homberger, Jesús Ortega-Peimbert, Quantao Yang, Olov Andersson
The capability to efficiently search for objects in complex environments is fundamental for many real-world robot applications. Recent advances in open-vocabulary vision models have resulted in semantically-informed object navigation methods that allow a robot to search for an arbitrary object without prior training. However, these zero-shot methods have so far treated the environment as unknown for each consecutive query. In this paper we introduce a new benchmark for zero-shot multi-object navigation, allowing the robot to leverage information gathered from previous searches to more efficiently find new objects. To address this problem we build a reusable open-vocabulary feature map tailored for real-time object search. We further propose a probabilistic-semantic map update that mitigates common sources of errors in semantic feature extraction and leverage this semantic uncertainty for informed multi-object exploration. We evaluate our method on a set of object navigation tasks both in simulation and with a real robot, running in real time on a Jetson Orin AGX. We demonstrate that it outperforms existing state-of-the-art approaches on both single- and multi-object navigation tasks. Additional videos, code, and the multi-object navigation benchmark will be available at https://finnbsch.github.io/OneMap.
{"title":"One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation","authors":"Finn Lukas Busch, Timon Homberger, Jesús Ortega-Peimbert, Quantao Yang, Olov Andersson","doi":"arxiv-2409.11764","DOIUrl":"https://doi.org/arxiv-2409.11764","url":null,"abstract":"The capability to efficiently search for objects in complex environments is\u0000fundamental for many real-world robot applications. Recent advances in\u0000open-vocabulary vision models have resulted in semantically-informed object\u0000navigation methods that allow a robot to search for an arbitrary object without\u0000prior training. However, these zero-shot methods have so far treated the\u0000environment as unknown for each consecutive query. In this paper we introduce a\u0000new benchmark for zero-shot multi-object navigation, allowing the robot to\u0000leverage information gathered from previous searches to more efficiently find\u0000new objects. To address this problem we build a reusable open-vocabulary\u0000feature map tailored for real-time object search. We further propose a\u0000probabilistic-semantic map update that mitigates common sources of errors in\u0000semantic feature extraction and leverage this semantic uncertainty for informed\u0000multi-object exploration. We evaluate our method on a set of object navigation\u0000tasks in both simulation as well as with a real robot, running in real-time on\u0000a Jetson Orin AGX. We demonstrate that it outperforms existing state-of-the-art\u0000approaches both on single and multi-object navigation tasks. Additional videos,\u0000code and the multi-object navigation benchmark will be available on\u0000https://finnbsch.github.io/OneMap.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robots can influence people to accomplish their tasks more efficiently: autonomous cars can inch forward at an intersection to pass through, and tabletop manipulators can go for an object on the table first. However, a robot's ability to influence can also compromise the safety of nearby people if naively executed. In this work, we pose and solve a novel robust reach-avoid dynamic game which enables robots to be maximally influential, but only when a safety backup control exists. On the human side, we model the human's behavior as goal-driven but conditioned on the robot's plan, enabling us to capture influence. On the robot side, we solve the dynamic game in the joint physical and belief space, enabling the robot to reason about how its uncertainty in human behavior will evolve over time. We instantiate our method, called SLIDE (Safely Leveraging Influence in Dynamic Environments), in a high-dimensional (39-D) simulated human-robot collaborative manipulation task solved via offline game-theoretic reinforcement learning. We compare our approach to a robust baseline that treats the human as a worst-case adversary, a safety controller that does not explicitly reason about influence, and an energy-function-based safety shield. We find that SLIDE consistently enables the robot to leverage the influence it has on the human when it is safe to do so, ultimately allowing the robot to be less conservative while still ensuring a high safety rate during task execution.
{"title":"Robots that Learn to Safely Influence via Prediction-Informed Reach-Avoid Dynamic Games","authors":"Ravi Pandya, Changliu Liu, Andrea Bajcsy","doi":"arxiv-2409.12153","DOIUrl":"https://doi.org/arxiv-2409.12153","url":null,"abstract":"Robots can influence people to accomplish their tasks more efficiently:\u0000autonomous cars can inch forward at an intersection to pass through, and\u0000tabletop manipulators can go for an object on the table first. However, a\u0000robot's ability to influence can also compromise the safety of nearby people if\u0000naively executed. In this work, we pose and solve a novel robust reach-avoid\u0000dynamic game which enables robots to be maximally influential, but only when a\u0000safety backup control exists. On the human side, we model the human's behavior\u0000as goal-driven but conditioned on the robot's plan, enabling us to capture\u0000influence. On the robot side, we solve the dynamic game in the joint physical\u0000and belief space, enabling the robot to reason about how its uncertainty in\u0000human behavior will evolve over time. We instantiate our method, called SLIDE\u0000(Safely Leveraging Influence in Dynamic Environments), in a high-dimensional\u0000(39-D) simulated human-robot collaborative manipulation task solved via offline\u0000game-theoretic reinforcement learning. We compare our approach to a robust\u0000baseline that treats the human as a worst-case adversary, a safety controller\u0000that does not explicitly reason about influence, and an energy-function-based\u0000safety shield. We find that SLIDE consistently enables the robot to leverage\u0000the influence it has on the human when it is safe to do so, ultimately allowing\u0000the robot to be less conservative while still ensuring a high safety rate\u0000during task execution.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142267027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tim Engelbracht, René Zurbrügg, Marc Pollefeys, Hermann Blum, Zuria Bauer
Despite increasing research efforts on household robotics, robots intended for deployment in domestic settings still struggle with more complex tasks such as interacting with functional elements like drawers or light switches, largely due to limited task-specific understanding and interaction capabilities. These tasks require not only detection and pose estimation but also an understanding of the affordances these elements provide. To address these challenges and enhance robotic scene understanding, we introduce SpotLight: a comprehensive framework for robotic interaction with functional elements, specifically light switches. Furthermore, this framework enables robots to improve their environmental understanding through interaction. Leveraging VLM-based affordance prediction to estimate motion primitives for light switch interaction, we achieve up to 84% operation success in real-world experiments. We further introduce a specialized dataset containing 715 images as well as a custom detection model for light switch detection. We demonstrate how the framework can facilitate robot learning through physical interaction by having the robot explore the environment and discover previously unknown relationships in a scene graph representation. Lastly, we propose an extension to the framework to accommodate other functional interactions such as swing doors, showcasing its flexibility. Videos and code: timengelbracht.github.io/SpotLight/
{"title":"SpotLight: Robotic Scene Understanding through Interaction and Affordance Detection","authors":"Tim Engelbracht, René Zurbrügg, Marc Pollefeys, Hermann Blum, Zuria Bauer","doi":"arxiv-2409.11870","DOIUrl":"https://doi.org/arxiv-2409.11870","url":null,"abstract":"Despite increasing research efforts on household robotics, robots intended\u0000for deployment in domestic settings still struggle with more complex tasks such\u0000as interacting with functional elements like drawers or light switches, largely\u0000due to limited task-specific understanding and interaction capabilities. These\u0000tasks require not only detection and pose estimation but also an understanding\u0000of the affordances these elements provide. To address these challenges and\u0000enhance robotic scene understanding, we introduce SpotLight: A comprehensive\u0000framework for robotic interaction with functional elements, specifically light\u0000switches. Furthermore, this framework enables robots to improve their\u0000environmental understanding through interaction. Leveraging VLM-based\u0000affordance prediction to estimate motion primitives for light switch\u0000interaction, we achieve up to 84% operation success in real world experiments.\u0000We further introduce a specialized dataset containing 715 images as well as a\u0000custom detection model for light switch detection. We demonstrate how the\u0000framework can facilitate robot learning through physical interaction by having\u0000the robot explore the environment and discover previously unknown relationships\u0000in a scene graph representation. Lastly, we propose an extension to the\u0000framework to accommodate other functional interactions such as swing doors,\u0000showcasing its flexibility. Videos and Code:\u0000timengelbracht.github.io/SpotLight/","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Non-stationarity poses a fundamental challenge in Multi-Agent Reinforcement Learning (MARL), arising from agents simultaneously learning and altering their policies. This creates a non-stationary environment from the perspective of each individual agent, often leading to suboptimal or even unconverged learning outcomes. We propose an open-source framework named XP-MARL, which augments MARL with auxiliary prioritization to address this challenge in cooperative settings. XP-MARL is 1) founded upon our hypothesis that prioritizing agents and letting higher-priority agents establish their actions first would stabilize the learning process and thus mitigate non-stationarity and 2) enabled by our proposed mechanism called action propagation, where higher-priority agents act first and communicate their actions, providing a more stationary environment for others. Moreover, instead of using a predefined or heuristic priority assignment, XP-MARL learns priority-assignment policies with an auxiliary MARL problem, leading to a joint learning scheme. Experiments in a motion-planning scenario involving Connected and Automated Vehicles (CAVs) demonstrate that XP-MARL improves the safety of a baseline model by 84.4% and outperforms a state-of-the-art approach, which improves the baseline by only 12.8%. Code: github.com/cas-lab-munich/sigmarl
{"title":"XP-MARL: Auxiliary Prioritization in Multi-Agent Reinforcement Learning to Address Non-Stationarity","authors":"Jianye Xu, Omar Sobhy, Bassam Alrifaee","doi":"arxiv-2409.11852","DOIUrl":"https://doi.org/arxiv-2409.11852","url":null,"abstract":"Non-stationarity poses a fundamental challenge in Multi-Agent Reinforcement\u0000Learning (MARL), arising from agents simultaneously learning and altering their\u0000policies. This creates a non-stationary environment from the perspective of\u0000each individual agent, often leading to suboptimal or even unconverged learning\u0000outcomes. We propose an open-source framework named XP-MARL, which augments\u0000MARL with auxiliary prioritization to address this challenge in cooperative\u0000settings. XP-MARL is 1) founded upon our hypothesis that prioritizing agents\u0000and letting higher-priority agents establish their actions first would\u0000stabilize the learning process and thus mitigate non-stationarity and 2)\u0000enabled by our proposed mechanism called action propagation, where\u0000higher-priority agents act first and communicate their actions, providing a\u0000more stationary environment for others. Moreover, instead of using a predefined\u0000or heuristic priority assignment, XP-MARL learns priority-assignment policies\u0000with an auxiliary MARL problem, leading to a joint learning scheme. Experiments\u0000in a motion-planning scenario involving Connected and Automated Vehicles (CAVs)\u0000demonstrate that XP-MARL improves the safety of a baseline model by 84.4% and\u0000outperforms a state-of-the-art approach, which improves the baseline by only\u000012.8%. Code: github.com/cas-lab-munich/sigmarl","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficiently and completely capturing the three-dimensional data of an object is a fundamental problem in industrial and robotic applications. The task of next-best-view (NBV) planning is to infer the pose of the next viewpoint based on the current data and gradually realize the complete three-dimensional reconstruction. Many existing algorithms, however, suffer from a large computational burden due to the use of ray-casting. To address this, this paper proposes a projection-based NBV planning framework that can select the next best view extremely quickly while ensuring complete scanning of the object. Specifically, the framework refits different types of voxel clusters into ellipsoids based on the voxel structure. Then, the next best view is selected from the candidate views using a projection-based viewpoint quality evaluation function in conjunction with a global partitioning strategy. This process replaces the ray-casting in voxel structures, significantly improving the computational efficiency. Comparative experiments with other algorithms in a simulation environment show that the proposed framework achieves a tenfold efficiency improvement while capturing roughly the same coverage. Real-world experiments further demonstrate the efficiency and feasibility of the framework.
{"title":"An Efficient Projection-Based Next-best-view Planning Framework for Reconstruction of Unknown Objects","authors":"Zhizhou Jia, Shaohui Zhang, Qun Hao","doi":"arxiv-2409.12096","DOIUrl":"https://doi.org/arxiv-2409.12096","url":null,"abstract":"Efficiently and completely capturing the three-dimensional data of an object\u0000is a fundamental problem in industrial and robotic applications. The task of\u0000next-best-view (NBV) planning is to infer the pose of the next viewpoint based\u0000on the current data, and gradually realize the complete three-dimensional\u0000reconstruction. Many existing algorithms, however, suffer a large computational\u0000burden due to the use of ray-casting. To address this, this paper proposes a\u0000projection-based NBV planning framework. It can select the next best view at an\u0000extremely fast speed while ensuring the complete scanning of the object.\u0000Specifically, this framework refits different types of voxel clusters into\u0000ellipsoids based on the voxel structure.Then, the next best view is selected\u0000from the candidate views using a projection-based viewpoint quality evaluation\u0000function in conjunction with a global partitioning strategy. This process\u0000replaces the ray-casting in voxel structures, significantly improving the\u0000computational efficiency. Comparative experiments with other algorithms in a\u0000simulation environment show that the framework proposed in this paper can\u0000achieve 10 times efficiency improvement on the basis of capturing roughly the\u0000same coverage. The real-world experimental results also prove the efficiency\u0000and feasibility of the framework.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142267029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}