
Latest Publications in IEEE Robotics and Automation Letters

P-AgNav: Range View-Based Autonomous Navigation System for Cornfields
IF 4.6 · CAS Tier 2 (Computer Science) · Q2 ROBOTICS · Pub Date: 2025-02-13 · DOI: 10.1109/LRA.2025.3541335
Kitae Kim;Aarya Deb;David J. Cappelleri
In this paper, we present an in-row and under-canopy autonomous navigation system for cornfields, called the Purdue Agricultural Navigation System or P-AgNav. Our navigation framework is primarily based on range view images from a 3D light detection and ranging (LiDAR) sensor. P-AgNav is designed for an autonomous robot to navigate in the corn rows with collision avoidance and to switch between rows without GNSS assistance or pre-defined waypoints. The system enables robots, which are intended to monitor crops or conduct physical sampling, to autonomously navigate multiple crop rows with minimal human intervention, thereby increasing crop management efficiency. The capabilities of P-AgNav have been validated through experiments in both simulation and real cornfield environments.
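P-AgNav's pipeline is built on range-view images. As a minimal sketch of that representation, not the paper's implementation, a 3D LiDAR cloud can be flattened into a 2D range image by the usual spherical projection (the FOV and resolution below are assumed values):

```python
import numpy as np

def pointcloud_to_range_image(points, h=64, w=1024,
                              fov_up_deg=15.0, fov_down_deg=-15.0):
    """Spherical projection of an (N, 3) point cloud into an (h, w)
    range image. FOV and resolution are illustrative assumptions."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)

    yaw = np.arctan2(y, x)                              # azimuth, [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))

    fov_up = np.radians(fov_up_deg)
    fov = np.radians(fov_up_deg - fov_down_deg)

    u = np.clip(np.floor(0.5 * (1.0 - yaw / np.pi) * w), 0, w - 1).astype(int)
    v = np.clip(np.floor((fov_up - pitch) / fov * h), 0, h - 1).astype(int)

    img = np.full((h, w), -1.0)                         # -1 marks empty pixels
    order = np.argsort(-r)                              # write far points first
    img[v[order], u[order]] = r[order]                  # so near returns win
    return img

pts = np.random.uniform(-10.0, 10.0, size=(5000, 3))    # toy cloud
print(pointcloud_to_range_image(pts).shape)             # (64, 1024)
```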
{"title":"P-AgNav: Range View-Based Autonomous Navigation System for Cornfields","authors":"Kitae Kim;Aarya Deb;David J. Cappelleri","doi":"10.1109/LRA.2025.3541335","DOIUrl":"https://doi.org/10.1109/LRA.2025.3541335","url":null,"abstract":"In this paper, we present an in-row and under-canopy autonomous navigation system for cornfields, called the Purdue Agricultural Navigation System or P-AgNav. Our navigation framework is primarily based on range view images from a 3D light detection and ranging (LiDAR) sensor. P-AgNav is designed for an autonomous robot to navigate in the corn rows with collision avoidance and to switch between rows without GNSS assistance or pre-defined waypoints. The system enables robots, which are intended to monitor crops or conduct physical sampling, to autonomously navigate multiple crop rows with minimal human intervention, thereby increasing crop management efficiency. The capabilities of P-AgNav have been validated through experiments in both simulation and real cornfield environments.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 4","pages":"3366-3373"},"PeriodicalIF":4.6,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143512790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
CLID-SLAM: A Coupled LiDAR-Inertial Neural Implicit Dense SLAM With Region-Specific SDF Estimation
IF 4.6 · CAS Tier 2 (Computer Science) · Q2 ROBOTICS · Pub Date: 2025-02-13 · DOI: 10.1109/LRA.2025.3541912
Junlong Jiang;Xuetao Zhang;Gang Sun;Yisha Liu;Xuebo Zhang;Yan Zhuang
This letter proposes a novel scan-to-neural-model matching, tightly-coupled LiDAR-inertial Simultaneous Localization and Mapping (SLAM) system that achieves more accurate state estimation and incrementally reconstructs a dense map. Unlike existing methods, the key insight of the proposed approach is that region-specific Signed Distance Function (SDF) estimations supervise the neural implicit representation to capture scene geometry, while SDF predictions and Inertial Measurement Unit (IMU) data are fused to strengthen the alignment of the LiDAR scan and the neural SDF map. As a result, the proposed approach achieves more robust and accurate state estimation with high-fidelity surface reconstruction. Specifically, an SDF supervision estimation method is proposed to generate more accurate SDF labels: point-to-plane distances are utilized for planar regions and local nearest-neighbor distances for non-planar areas, which reduces reconstruction artifacts and further improves localization accuracy. Furthermore, we propose the first tightly-coupled LiDAR-inertial neural dense SLAM system that fuses SDF predictions and IMU data to align each received scan with the neural SDF map, thereby achieving more robust and accurate localization. Comparative experiments on multiple datasets demonstrate the superior performance of the proposed method in localization accuracy, robustness, and mapping quality.
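The region-specific supervision is the letter's key ingredient: signed point-to-plane distances where local geometry is planar, local nearest-neighbor distances elsewhere. Below is a simplified single-query sketch of that label selection, my own toy illustration with an assumed eigenvalue-ratio planarity test, not the authors' code:

```python
import numpy as np

def sdf_label(query, neighbors, sensor_origin, planarity_thresh=0.05):
    """Region-specific SDF supervision label for one sample point.
    `neighbors`: (k, 3) nearest scan points around the query."""
    centroid = neighbors.mean(axis=0)
    cov = np.cov((neighbors - centroid).T)
    evals, evecs = np.linalg.eigh(cov)               # ascending eigenvalues

    # A tiny smallest eigenvalue means the patch is locally planar.
    if evals[0] / max(evals.sum(), 1e-12) < planarity_thresh:
        normal = evecs[:, 0]
        if np.dot(sensor_origin - centroid, normal) < 0:
            normal = -normal                         # face the sensor
        return float(np.dot(query - centroid, normal))   # signed point-to-plane
    # Non-planar patch: local nearest-neighbor distance instead.
    return float(np.min(np.linalg.norm(neighbors - query, axis=1)))

rng = np.random.default_rng(0)
plane = np.c_[rng.uniform(-1, 1, (20, 2)), np.zeros(20)]     # z = 0 patch
print(sdf_label(np.array([0.0, 0.0, 0.3]), plane, np.array([0.0, 0.0, 2.0])))  # ~0.3
```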
{"title":"CLID-SLAM: A Coupled LiDAR-Inertial Neural Implicit Dense SLAM With Region-Specific SDF Estimation","authors":"Junlong Jiang;Xuetao Zhang;Gang Sun;Yisha Liu;Xuebo Zhang;Yan Zhuang","doi":"10.1109/LRA.2025.3541912","DOIUrl":"https://doi.org/10.1109/LRA.2025.3541912","url":null,"abstract":"This letter proposes a novel scan-to-neural model matching, tightly-coupled LiDAR-inertial Simultaneous Localization and Mapping (SLAM) system, which can achieve more accurate state estimation and incrementally reconstruct the dense map. Different from the existing methods, the key insight of the proposed approach is that region-specific Signed Distance Function (SDF) estimations supervise the neural implicit representation to capture scene geometry, while SDF predictions and Inertial Measurement Unit (IMU) data are fused to strengthen the alignment of the LiDAR scan and the neural SDF map. As a result, the proposed approach achieves more robust and accurate state estimation with high-fidelity surface reconstruction. Specifically, an SDF supervision estimation method is proposed to generate more accurate SDF labels. Point-to-plane distances are utilized for planar regions and local nearest-neighbor distances are leveraged for non-planar areas, which reduces reconstruction artifacts and further significantly improves localization accuracy. Furthermore, we propose the first tightly-coupled LiDAR-inertial neural dense SLAM system that fuses SDF predictions and IMU data to align the received scan with the neural SDF map, thereby achieving more robust and accurate localization. Comparative experiments on multiple datasets are conducted to demonstrate the superior performance of the proposed method including the localization accuracy, robustness, and mapping quality.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 4","pages":"3310-3317"},"PeriodicalIF":4.6,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143489204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A State-Time Space Approach for Local Trajectory Replanning of an MAV in Dynamic Indoor Environments
IF 4.6 · CAS Tier 2 (Computer Science) · Q2 ROBOTICS · Pub Date: 2025-02-13 · DOI: 10.1109/LRA.2025.3541376
Fengyu Quan;Yuanzhe Shen;Peiyan Liu;Ximin Lyu;Haoyao Chen
Multirotor aerial vehicles (MAVs) in confined, dynamic indoor environments need reliable planning capabilities to avoid moving pedestrians. Current MAV trajectory planning algorithms often result in low success rates or unnecessary constraints on navigable space. We propose a multi-stage local trajectory planner that predicts pedestrian movements using State-Time Space (ST-space) based on the Euclidean Signed Distance Field (ESDF) to tackle these challenges. Our method quickly generates collision-free trajectories by incorporating spatiotemporal optimization and fast ESDF queries. Based on statistical analysis, our method improves performance over state-of-the-art MAV trajectory planning methods as pedestrian speed increases. Finally, we validate the real-time applicability of our proposed method in indoor dynamic scenarios.
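The gist of planning in state-time space is that collision checks depend on when the robot occupies a point, not just where. A toy sketch of that check (illustrative only; the paper couples ESDF queries with spatiotemporal optimization rather than simple waypoint sampling):

```python
import numpy as np

def st_collision_free(traj, times, static_esdf, peds, robot_r=0.3, ped_r=0.3):
    """Check a candidate trajectory in state-time space: static clearance
    from an ESDF lookup, dynamic clearance from constant-velocity
    pedestrian predictions.

    traj: (T, 2) waypoints, times: (T,) stamps
    static_esdf: callable p -> distance to nearest static obstacle
    peds: list of (position, velocity) pairs at t = 0
    """
    for p, t in zip(traj, times):
        if static_esdf(p) < robot_r:                   # static obstacle too close
            return False
        for p0, v in peds:
            ped_pos = p0 + v * t                       # predicted pedestrian state
            if np.linalg.norm(p - ped_pos) < robot_r + ped_r:
                return False                           # state-time collision
    return True

# Toy example: one pedestrian crossing the robot's straight-line path.
esdf = lambda p: 10.0                                  # empty static map
traj = np.stack([np.linspace(0, 4, 9), np.zeros(9)], axis=1)
times = np.linspace(0, 4, 9)
peds = [(np.array([2.0, -2.0]), np.array([0.0, 1.0]))]
print(st_collision_free(traj, times, esdf, peds))      # False: they meet at (2, 0)
```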
{"title":"A State-Time Space Approach for Local Trajectory Replanning of an MAV in Dynamic Indoor Environments","authors":"Fengyu Quan;Yuanzhe Shen;Peiyan Liu;Ximin Lyu;Haoyao Chen","doi":"10.1109/LRA.2025.3541376","DOIUrl":"https://doi.org/10.1109/LRA.2025.3541376","url":null,"abstract":"Multirotor aerial vehicles (MAVs) in confined, dynamic indoor environments need reliable planning capabilities to avoid moving pedestrians. Current MAV trajectory planning algorithms often result in low success rates or unnecessary constraints on navigable space. We propose a multi-stage local trajectory planner that predicts pedestrian movements using State-Time Space (ST-space) based on the Euclidean Signed Distance Field (ESDF) to tackle these challenges. Our method quickly generates collision-free trajectories by incorporating spatiotemporal optimization and fast ESDF queries. Based on statistical analysis, our method improves performance over state-of-the-art MAV trajectory planning methods as pedestrian speed increases. Finally, we validate the real-time applicability of our proposed method in indoor dynamic scenarios.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 4","pages":"3438-3445"},"PeriodicalIF":4.6,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Interactive Incremental Learning of Generalizable Skills With Local Trajectory Modulation
IF 4.6 · CAS Tier 2 (Computer Science) · Q2 ROBOTICS · Pub Date: 2025-02-13 · DOI: 10.1109/LRA.2025.3542209
Markus Knauer;Alin Albu-Schäffer;Freek Stulp;João Silvério
The problem of generalization in learning from demonstration (LfD) has received considerable attention over the years, particularly within the context of movement primitives, where a number of approaches have emerged. Recently, two important approaches have gained recognition. While one leverages via-points to adapt skills locally by modulating demonstrated trajectories, another relies on so-called task-parameterized (TP) models that encode movements with respect to different coordinate systems, using a product of probabilities for generalization. While the former are well-suited to precise, local modulations, the latter aim at generalizing over large regions of the workspace and often involve multiple objects. Addressing the quality of generalization by leveraging both approaches simultaneously has received little attention. In this work, we propose an interactive imitation learning framework that simultaneously leverages local and global modulations of trajectory distributions. Building on the kernelized movement primitives (KMP) framework, we introduce novel mechanisms for skill modulation from direct human corrective feedback. Our approach particularly exploits the concept of via-points to incrementally and interactively 1) improve the model accuracy locally, 2) add new objects to the task during execution and 3) extend the skill into regions where demonstrations were not provided. We evaluate our method on a bearing ring-loading task using a torque-controlled, 7-DoF, DLR SARA robot (Iskandar et al., 2020).
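For the local-modulation half of the framework, the essential mechanism is bending a demonstrated trajectory through via-points. Here is a one-dimensional sketch of GP-style via-point conditioning in the spirit of KMP-like methods, not the paper's code; the kernel length-scale and noise are assumptions, and the actual KMP formulation also adapts covariances:

```python
import numpy as np

def rbf(a, b, ell=0.3):
    """Squared-exponential kernel between two time vectors."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def modulate_with_viapoints(t, mu, t_via, y_via, noise=1e-4):
    """Bend a demonstrated mean trajectory mu(t) locally so it passes
    through the via-points (t_via, y_via), via a GP-style update."""
    K = rbf(t_via, t_via) + noise * np.eye(len(t_via))
    k = rbf(t, t_via)
    residual = y_via - np.interp(t_via, t, mu)    # pull toward via-points
    return mu + k @ np.linalg.solve(K, residual)

t = np.linspace(0.0, 1.0, 100)
demo = np.sin(2 * np.pi * t)                      # demonstrated trajectory
new = modulate_with_viapoints(t, demo, np.array([0.5]), np.array([1.5]))
print(round(float(np.interp(0.5, t, new)), 2))    # 1.5: passes the via-point
```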
IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 3398–3405.
Citations: 0
A Rotary Actuation Strategy for Altering Magnetically Controlled Capsule Endoscope Motion State
IF 4.6 · CAS Tier 2 (Computer Science) · Q2 ROBOTICS · Pub Date: 2025-02-13 · DOI: 10.1109/LRA.2025.3541451
Zhifan Teng;Jixiong Ren;Hongbo Sun;Quanyue Liu;Jianhua Liu;Qiuliang Wang
The magnetically controlled capsule endoscope (MCCE) is an effective tool for examining the digestive tract, as it is controlled by an external magnetic field to achieve active movement. Existing rotational actuation strategies focus primarily on fixed actuation angles and lack analysis of how the range of actuation angles affects motion control. This letter proposes an actuation strategy based on a rotating magnetic field and analyzes the critical range of actuation angles within which the MCCE achieves effective actuation and uniform advancement in the intestine. Simulation and experimental results show that the MCCE can be accelerated from a stationary state to its maximum speed at the maximum actuation angle. In experiments in silicone intestines and isolated porcine intestines, the maximum movement speeds reached 36.5 mm/s and 21.4 mm/s, respectively. In addition, the effectiveness of the actuation strategy was verified on an adaptive actuation system. In the future, our actuation strategy and adaptive system can be combined with automated image or ultrasound diagnostics, which is expected to provide physicians with better tools for digestive examinations.
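The actuation-angle analysis rests on basic magnetic-torque physics: the torque the field exerts on the capsule's magnet scales with the sine of the angle between them, which is why the maximum actuation angle gives the strongest acceleration from rest. A back-of-the-envelope sketch (the moment and field values are invented for illustration, not the letter's parameters):

```python
import numpy as np

# Torque on the capsule's internal magnet from the external field:
# |tau| = m * B * sin(theta), with theta the actuation (lead) angle
# between field and magnet.
m = 0.05   # magnetic moment [A*m^2] (assumed)
B = 0.010  # field strength [T] (assumed)

for theta_deg in (15, 45, 90):
    tau = m * B * np.sin(np.radians(theta_deg))
    print(f"actuation angle {theta_deg:>2} deg -> torque {tau * 1e6:.1f} uN*m")
```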
{"title":"A Rotary Actuation Strategy for Altering Magnetically Controlled Capsule Endoscope Motion State","authors":"Zhifan Teng;Jixiong Ren;Hongbo Sun;Quanyue Liu;Jianhua Liu;Qiuliang Wang","doi":"10.1109/LRA.2025.3541451","DOIUrl":"https://doi.org/10.1109/LRA.2025.3541451","url":null,"abstract":"The magnetically controlled capsule endoscope (MCCE) is an effective tool for the examination of the digestive tract, as it is controlled by an external magnetic field to achieve active movement. Existing rotational actuation strategies are primarily focused on the study of fixed actuation angles, with a lack of analysis of the impact of the range of actuation angles in motion control. This letter proposes an actuation strategy based on a rotating magnetic field to analyze the critical range of the actuation angle for the MCCE to achieve effective actuation and uniform advancement in the intestine. Simulation and experimental results show that the MCCE can be accelerated from a stationary state to a maximum speed at the maximum actuation angle. In experiments in silicone intestines and isolated porcine intestines, their maximum movement speeds reached 36.5 mm/s and 21.4 mm/s, respectively. In addition, the effectiveness of the actuation strategy was verified based on an adaptive actuation system. In the future, our actuation strategy and adaptive system can be combined with image or ultrasound automated diagnostics, which is expected to provide physicians with better tools for digestive examinations.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 4","pages":"3342-3349"},"PeriodicalIF":4.6,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143512807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Online Friction Coefficient Identification for Legged Robots on Slippery Terrain Using Smoothed Contact Gradients
IF 4.6 · CAS Tier 2 (Computer Science) · Q2 ROBOTICS · Pub Date: 2025-02-13 · DOI: 10.1109/LRA.2025.3541428
Hajun Kim;Dongyun Kang;Min-Gyu Kim;Gijeong Kim;Hae-Won Park
This letter proposes an online friction coefficient identification framework for legged robots on slippery terrain. The approach formulates an optimization problem that minimizes the sum of residuals between actual and predicted states, parameterized by the friction coefficient in rigid-body contact dynamics. Notably, the proposed framework leverages the analytic smoothed gradient of contact impulses, obtained by smoothing the complementarity condition of Coulomb friction, to resolve the non-informative gradients induced by the nonsmooth contact dynamics. Moreover, we introduce a rejection method that filters out data with high normal contact velocity following contact initiation during friction coefficient identification. To validate the proposed framework, we conduct experiments on a quadrupedal robot platform, KAIST HOUND, on slippery and non-slippery terrain, and observe that the framework achieves fast and consistent friction coefficient identification across various initial conditions.
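The core of the method is least-squares identification of the friction coefficient with gradients made informative by smoothing the nonsmooth friction term. A drastically simplified 1-D analogue of that idea (a sliding block with tanh-smoothed Coulomb friction; all values are assumptions, and the letter works with full rigid-body contact dynamics plus the rejection filter):

```python
import numpy as np

DT, G, EPS = 0.01, 9.81, 0.05

def predict_vel(v, mu, a_cmd=0.0):
    """One-step velocity prediction for a 1-D sliding block. tanh smooths
    the nonsmooth sign(v) in Coulomb friction so the gradient w.r.t. mu
    stays informative."""
    return v + DT * (a_cmd - mu * G * np.tanh(v / EPS))

# Generate "measured" velocities with true mu = 0.3 plus sensor noise.
rng = np.random.default_rng(1)
mu_true = 0.3
v = np.zeros(200)
v[0] = 1.0
for k in range(199):
    v[k + 1] = predict_vel(v[k], mu_true) + rng.normal(0.0, 1e-4)

# Identify mu by gradient descent on the sum of squared residuals.
mu, lr = 0.8, 0.5
for _ in range(200):
    resid = v[1:] - predict_vel(v[:-1], mu)
    dpred_dmu = -DT * G * np.tanh(v[:-1] / EPS)   # analytic smoothed gradient
    mu -= lr * (-2.0 * np.sum(resid * dpred_dmu))
print(round(mu, 3))                               # recovers ~0.3
```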
{"title":"Online Friction Coefficient Identification for Legged Robots on Slippery Terrain Using Smoothed Contact Gradients","authors":"Hajun Kim;Dongyun Kang;Min-Gyu Kim;Gijeong Kim;Hae-Won Park","doi":"10.1109/LRA.2025.3541428","DOIUrl":"https://doi.org/10.1109/LRA.2025.3541428","url":null,"abstract":"This letter proposes an online friction coefficient identification framework for legged robots on slippery terrain. The approach formulates the optimization problem to minimize the sum of residuals between actual and predicted states parameterized by the friction coefficient in rigid body contact dynamics. Notably, the proposed framework leverages the analytic smoothed gradient of contact impulses, obtained by smoothing the complementarity condition of Coulomb friction, to solve the issue of non-informative gradients induced from the nonsmooth contact dynamics. Moreover, we introduce the rejection method to filter out data with high normal contact velocity following contact initiations during friction coefficient identification for legged robots. To validate the proposed framework, we conduct the experiments using a quadrupedal robot platform, KAIST HOUND, on slippery and nonslippery terrain. We observe that our framework achieves fast and consistent friction coefficient identification within various initial conditions.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 4","pages":"3150-3157"},"PeriodicalIF":4.6,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143455246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Load Sharing in Distributed Collaborative Manipulation
IF 4.6 · CAS Tier 2 (Computer Science) · Q2 ROBOTICS · Pub Date: 2025-02-13 · DOI: 10.1109/LRA.2025.3541924
Jinhui Du;Yujun Liang;Hongyuan Tao;Yaohang Xu;Lijun Zhu;Han Ding
This letter considers multi-robot collaborative manipulation with load sharing when the dynamic parameters of the system composed of multiple robots and a rigid body are unknown. Load sharing means that each robot computes the wrench it must contribute to the manipulation task itself and actively shares the manipulation duty even though the global grasp matrix and system parameters are unknown. Load distribution algorithms in the literature, however, require global information to be collected and a central node to perform the distribution calculation. In contrast, we propose a distributed control framework, based on weighted consensus and parameter estimation, that achieves load sharing among robots according to prescribed contribution factors while needing only the local grasp matrix. The manipulation wrench provided by the proposed controller is shown to cause no squeezing effect on the object at steady state. Numerical simulation and real-robot experiments verify the effectiveness of the proposed framework.
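The distributed flavor shows up in a toy version of weighted consensus: each robot agrees with its neighbors on the wrench the task needs, then applies its prescribed share; no central node collects global information. (The graph, Metropolis weights, and scalar "wrench" below are illustrative assumptions; the letter additionally estimates unknown dynamic parameters and works with local grasp matrices.)

```python
import numpy as np

alpha = np.array([0.5, 0.3, 0.2])          # prescribed contribution factors
A = np.array([[0, 1, 0],                   # line-graph communication topology
              [1, 0, 1],
              [0, 1, 0]])
# Noisy local estimates of the wrench needed to move the object (scalar here).
w_hat = np.array([9.5, 10.4, 10.1])

for _ in range(50):                        # Metropolis-weight consensus steps
    new = w_hat.copy()
    for i in range(3):
        for j in range(3):
            if A[i, j]:
                wij = 1.0 / (1 + max(A[i].sum(), A[j].sum()))
                new[i] += wij * (w_hat[j] - w_hat[i])
    w_hat = new

print(w_hat.round(3))                      # all agree on the average, 10.0
print((alpha * w_hat).round(3))            # each robot's applied share
```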
{"title":"Load Sharing in Distributed Collaborative Manipulation","authors":"Jinhui Du;Yujun Liang;Hongyuan Tao;Yaohang Xu;Lijun Zhu;Han Ding","doi":"10.1109/LRA.2025.3541924","DOIUrl":"https://doi.org/10.1109/LRA.2025.3541924","url":null,"abstract":"The letter considers the multi-robot collaborative manipulation with load sharing characteristic, when dynamic parameters of the system composed of multiple robots and the rigid body are unknown. Load sharing refers to each robot calculating the required wrench for the manipulation task itself and actively sharing the manipulation duty when the global grasp matrix and system parameters are unknown. The global information however must be collected for the load distribution algorithm in the literature, where a central node is required for the distribution calculation. On the contrary, we propose a distributed control framework based on the idea of the weighted consensus and the parameter estimation to achieve the load sharing among robots according to the prescribed contribution factor, when only the local grasp matrix is needed. It is found that the manipulation wrench provided by the proposed controller causes no squeezing effect on the object at the steady state. The numerical simulation and real-robot experiments verify the effectiveness of the proposed framework.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 4","pages":"3390-3397"},"PeriodicalIF":4.6,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
CaseVPR: Correlation-Aware Sequential Embedding for Sequence-to-Frame Visual Place Recognition
IF 4.6 · CAS Tier 2 (Computer Science) · Q2 ROBOTICS · Pub Date: 2025-02-13 · DOI: 10.1109/LRA.2025.3541452
Heshan Li;Guohao Peng;Jun Zhang;Mingxing Wen;Yingchong Ma;Danwei Wang
Visual Place Recognition (VPR) is crucial for autonomous vehicles, as it enables them to identify previously visited locations. Compared with conventional single-frame retrieval, leveraging sequences of frames to depict places has proven effective in alleviating perceptual aliasing. However, mainstream sequence retrieval methods encode multiple frames into a single descriptor, relinquishing the capacity for fine-grained frame-to-frame matching. This limitation hampers the precise positioning of individual frames within the query sequence. On the other hand, sequence matching methods such as SeqSLAM are capable of frame-to-frame matching, but they rely on global brute-force search and a constant-speed assumption, which can cause retrieval failures. To address these issues, we propose a sequence-to-frame hierarchical matching pipeline for VPR, named CaseVPR. It consists of coarse-level sequence retrieval based on sequential descriptor matching to mine potential starting points, followed by fine-grained sequence matching to find frame-to-frame correspondences. In particular, CaseNet is proposed to encode the correlation-aware features of consecutive frames into hierarchical descriptors for sequence retrieval and matching. On this basis, an AdaptSeq-V2 search strategy identifies frame-level correspondences of the query sequence within candidate regions determined by the potential starting points. To validate our hierarchical pipeline, we evaluate CaseVPR on multiple datasets. Experiments demonstrate that CaseVPR outperforms all benchmark methods in average precision and achieves new state-of-the-art (SOTA) results for sequence-based VPR.
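The coarse-to-fine structure is easy to see in a toy version: a pooled sequence descriptor shortlists candidate starting points, then individual frames are matched inside those windows. (Mean-pooled random features stand in for CaseNet's hierarchical descriptors, and this sketch keeps a fixed frame offset, whereas the letter's AdaptSeq-V2 drops the constant-speed assumption.)

```python
import numpy as np

def hierarchical_match(query_frames, db_frames, seq_len=5, top_k=3):
    """Coarse sequence retrieval, then fine frame-to-frame matching."""
    def norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    q, d = norm(query_frames), norm(db_frames)

    # Coarse: compare the query sequence descriptor against every
    # database window of the same length to mine starting points.
    q_seq = norm(q[:seq_len].mean(axis=0))
    starts = np.arange(len(d) - seq_len + 1)
    d_seq = norm(np.stack([d[s:s + seq_len].mean(axis=0) for s in starts]))
    candidates = starts[np.argsort(-(d_seq @ q_seq))[:top_k]]

    # Fine: frame-to-frame matching inside each candidate window.
    best_start = max(candidates,
                     key=lambda s: sum(q[i] @ d[s + i] for i in range(seq_len)))
    return [(i, best_start + i) for i in range(seq_len)]

rng = np.random.default_rng(2)
db = rng.normal(size=(100, 32))
query = db[40:45] + 0.05 * rng.normal(size=(5, 32))   # revisit of frames 40-44
print(hierarchical_match(query, db))                  # [(0, 40), ..., (4, 44)]
```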
IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 3430–3437.
Citations: 0
Observe Then Act: Asynchronous Active Vision-Action Model for Robotic Manipulation
IF 4.6 · CAS Tier 2 (Computer Science) · Q2 ROBOTICS · Pub Date: 2025-02-12 · DOI: 10.1109/LRA.2025.3541334
Guokang Wang;Hang Li;Shuyuan Zhang;Di Guo;Yanhong Liu;Huaping Liu
In real-world scenarios, many robotic manipulation tasks are hindered by occlusions and limited fields of view, posing significant challenges for passive observation-based models that rely on fixed or wrist-mounted cameras. In this letter, we investigate the problem of robotic manipulation under limited visual observation and propose a task-driven asynchronous active vision-action model. Our model serially connects a camera Next-Best-View (NBV) policy with a gripper Next-Best-Pose (NBP) policy, and trains them in a sensor-motor coordination framework using few-shot reinforcement learning. This approach enables the agent to reposition a third-person camera to actively observe the environment based on the task goal, and subsequently determine the appropriate manipulation actions. We trained and evaluated our model on 8 viewpoint-constrained tasks in RLBench. The results demonstrate that our model consistently outperforms baseline algorithms, showcasing its effectiveness in handling visual constraints in manipulation tasks.
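The serial NBV-then-NBP structure reduces to a simple loop: reposition the camera, re-observe, then pick the gripper action. A skeleton of that control flow with stub policies (everything here is a placeholder; the paper's policies are learned with few-shot reinforcement learning):

```python
import numpy as np

def nbv_policy(camera_pose, task_goal):
    """Stub Next-Best-View policy: nudge the camera toward the goal region."""
    return camera_pose + 0.5 * (task_goal - camera_pose)

def nbp_policy(observation, task_goal):
    """Stub Next-Best-Pose policy: pick a gripper pose from the new view."""
    return task_goal + np.random.normal(0.0, 0.01, size=3)

camera = np.zeros(3)
goal = np.array([1.0, 0.5, 0.2])
for step in range(3):
    camera = nbv_policy(camera, goal)        # 1) observe: move the camera first
    observation = camera                     #    (stand-in for a captured view)
    gripper = nbp_policy(observation, goal)  # 2) act: then choose the gripper pose
    print(step, camera.round(2), gripper.round(2))
```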
{"title":"Observe Then Act: Asynchronous Active Vision-Action Model for Robotic Manipulation","authors":"Guokang Wang;Hang Li;Shuyuan Zhang;Di Guo;Yanhong Liu;Huaping Liu","doi":"10.1109/LRA.2025.3541334","DOIUrl":"https://doi.org/10.1109/LRA.2025.3541334","url":null,"abstract":"In real-world scenarios, many robotic manipulation tasks are hindered by occlusions and limited fields of view, posing significant challenges for passive observation-based models that rely on fixed or wrist-mounted cameras. In this letter, we investigate the problem of robotic manipulation under limited visual observation and propose a task-driven asynchronous active vision-action model. Our model serially connects a camera Next-Best-View (NBV) policy with a gripper Next-Best-Pose (NBP) policy, and trains them in a sensor-motor coordination framework using few-shot reinforcement learning. This approach enables the agent to reposition a third-person camera to actively observe the environment based on the task goal, and subsequently determine the appropriate manipulation actions. We trained and evaluated our model on 8 viewpoint-constrained tasks in RLBench. The results demonstrate that our model consistently outperforms baseline algorithms, showcasing its effectiveness in handling visual constraints in manipulation tasks.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 4","pages":"3422-3429"},"PeriodicalIF":4.6,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
OVL-MAP: An Online Visual Language Map Approach for Vision-and-Language Navigation in Continuous Environments
IF 4.6 · CAS Tier 2 (Computer Science) · Q2 ROBOTICS · Pub Date: 2025-02-11 · DOI: 10.1109/LRA.2025.3540577
Shuhuan Wen;Ziyuan Zhang;Yuxiang Sun;Zhiwen Wang
Vision-and-Language Navigation in Continuous Environments (VLN-CE) requires agents to navigate 3D environments based on visual observations and natural language instructions. Existing approaches, which focus on topological and semantic maps, often struggle to understand and adapt to complex or previously unseen environments, largely because their maps are constructed statically and offline. To address these challenges, this letter proposes OVL-MAP, an innovative algorithm comprising three key modules: an online vision-and-language map construction module, a waypoint prediction module, and an action decision module. The online map construction module leverages robust open-vocabulary semantic segmentation to dynamically enhance the agent's scene understanding. The waypoint prediction module processes natural language instructions to identify task-relevant regions, predict sub-goal locations, and guide trajectory planning. The action decision module utilizes the DD-PPO strategy for effective navigation. Evaluations on the Robo-VLN and R2R-CE datasets demonstrate that OVL-MAP significantly improves navigation performance and exhibits stronger generalization in unknown environments.
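The online map's job is to let language pick out task-relevant regions. A toy open-vocabulary lookup (random vectors stand in for vision-language embeddings; the real system builds per-cell features online from open-vocabulary semantic segmentation):

```python
import numpy as np

def relevance_map(cell_embeddings, text_embedding):
    """Score each map cell against a language query by cosine similarity,
    the basic open-vocabulary map lookup that such systems build on."""
    c = cell_embeddings / np.linalg.norm(cell_embeddings, axis=-1, keepdims=True)
    t = text_embedding / np.linalg.norm(text_embedding)
    return c @ t                                  # (H, W) relevance scores

rng = np.random.default_rng(3)
grid = rng.normal(size=(8, 8, 16))                # fake per-cell features
query = grid[5, 2] + 0.1 * rng.normal(size=16)    # "sofa"-like query vector
scores = relevance_map(grid, query)
print(np.unravel_index(scores.argmax(), scores.shape))   # (5, 2): sub-goal cell
```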
IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 3294–3301.
Citations: 0