Pub Date : 2025-02-13 DOI: 10.1109/LRA.2025.3541335
Kitae Kim;Aarya Deb;David J. Cappelleri
In this paper, we present an in-row and under-canopy autonomous navigation system for cornfields, called the Purdue Agricultural Navigation System or P-AgNav. Our navigation framework is primarily based on range view images from a 3D light detection and ranging (LiDAR) sensor. P-AgNav is designed for an autonomous robot to navigate in the corn rows with collision avoidance and to switch between rows without GNSS assistance or pre-defined waypoints. The system enables robots, which are intended to monitor crops or conduct physical sampling, to autonomously navigate multiple crop rows with minimal human intervention, thereby increasing crop management efficiency. The capabilities of P-AgNav have been validated through experiments in both simulation and real cornfield environments.
{"title":"P-AgNav: Range View-Based Autonomous Navigation System for Cornfields","authors":"Kitae Kim;Aarya Deb;David J. Cappelleri","doi":"10.1109/LRA.2025.3541335","DOIUrl":"https://doi.org/10.1109/LRA.2025.3541335","url":null,"abstract":"In this paper, we present an in-row and under-canopy autonomous navigation system for cornfields, called the Purdue Agricultural Navigation System or P-AgNav. Our navigation framework is primarily based on range view images from a 3D light detection and ranging (LiDAR) sensor. P-AgNav is designed for an autonomous robot to navigate in the corn rows with collision avoidance and to switch between rows without GNSS assistance or pre-defined waypoints. The system enables robots, which are intended to monitor crops or conduct physical sampling, to autonomously navigate multiple crop rows with minimal human intervention, thereby increasing crop management efficiency. The capabilities of P-AgNav have been validated through experiments in both simulation and real cornfield environments.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 4","pages":"3366-3373"},"PeriodicalIF":4.6,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143512790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-13 DOI: 10.1109/LRA.2025.3541912
Junlong Jiang;Xuetao Zhang;Gang Sun;Yisha Liu;Xuebo Zhang;Yan Zhuang
This letter proposes a novel scan-to-neural-model-matching, tightly-coupled LiDAR-inertial Simultaneous Localization and Mapping (SLAM) system that achieves more accurate state estimation while incrementally reconstructing a dense map. Unlike existing methods, the key insight of the proposed approach is that region-specific Signed Distance Function (SDF) estimations supervise the neural implicit representation to capture scene geometry, while SDF predictions and Inertial Measurement Unit (IMU) data are fused to strengthen the alignment of the LiDAR scan with the neural SDF map. As a result, the approach achieves more robust and accurate state estimation with high-fidelity surface reconstruction. Specifically, an SDF supervision estimation method is proposed to generate more accurate SDF labels: point-to-plane distances are used for planar regions and local nearest-neighbor distances for non-planar areas, which reduces reconstruction artifacts and significantly improves localization accuracy. Furthermore, we propose the first tightly-coupled LiDAR-inertial neural dense SLAM system that fuses SDF predictions and IMU data to align each incoming scan with the neural SDF map, yielding more robust and accurate localization. Comparative experiments on multiple datasets demonstrate the superior performance of the proposed method in terms of localization accuracy, robustness, and mapping quality.
"CLID-SLAM: A Coupled LiDAR-Inertial Neural Implicit Dense SLAM With Region-Specific SDF Estimation," IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 3310-3317.
Pub Date : 2025-02-13 DOI: 10.1109/LRA.2025.3541376
Fengyu Quan;Yuanzhe Shen;Peiyan Liu;Ximin Lyu;Haoyao Chen
Multirotor aerial vehicles (MAVs) in confined, dynamic indoor environments need reliable planning capabilities to avoid moving pedestrians. Current MAV trajectory planning algorithms often result in low success rates or unnecessary constraints on navigable space. We propose a multi-stage local trajectory planner that predicts pedestrian movements using State-Time Space (ST-space) based on the Euclidean Signed Distance Field (ESDF) to tackle these challenges. Our method quickly generates collision-free trajectories by incorporating spatiotemporal optimization and fast ESDF queries. Based on statistical analysis, our method improves performance over state-of-the-art MAV trajectory planning methods as pedestrian speed increases. Finally, we validate the real-time applicability of our proposed method in indoor dynamic scenarios.
"A State-Time Space Approach for Local Trajectory Replanning of an MAV in Dynamic Indoor Environments," IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 3438-3445.
Pub Date : 2025-02-13 DOI: 10.1109/LRA.2025.3542209
Markus Knauer;Alin Albu-Schäffer;Freek Stulp;João Silvério
The problem of generalization in learning from demonstration (LfD) has received considerable attention over the years, particularly within the context of movement primitives, where a number of approaches have emerged. Recently, two important approaches have gained recognition. While one leverages via-points to adapt skills locally by modulating demonstrated trajectories, another relies on so-called task-parameterized (TP) models that encode movements with respect to different coordinate systems, using a product of probabilities for generalization. While the former are well-suited to precise, local modulations, the latter aim at generalizing over large regions of the workspace and often involve multiple objects. Addressing the quality of generalization by leveraging both approaches simultaneously has received little attention. In this work, we propose an interactive imitation learning framework that simultaneously leverages local and global modulations of trajectory distributions. Building on the kernelized movement primitives (KMP) framework, we introduce novel mechanisms for skill modulation from direct human corrective feedback. Our approach particularly exploits the concept of via-points to incrementally and interactively 1) improve the model accuracy locally, 2) add new objects to the task during execution and 3) extend the skill into regions where demonstrations were not provided. We evaluate our method on a bearing ring-loading task using a torque-controlled, 7-DoF, DLR SARA robot (Iskandar et al., 2020).
{"title":"Interactive Incremental Learning of Generalizable Skills With Local Trajectory Modulation","authors":"Markus Knauer;Alin Albu-Schäffer;Freek Stulp;João Silvério","doi":"10.1109/LRA.2025.3542209","DOIUrl":"https://doi.org/10.1109/LRA.2025.3542209","url":null,"abstract":"The problem of generalization in learning from demonstration (LfD) has received considerable attention over the years, particularly within the context of movement primitives, where a number of approaches have emerged. Recently, two important approaches have gained recognition. While one leverages via-points to adapt skills locally by modulating demonstrated trajectories, another relies on so-called <italic>task-parameterized</i> (TP) models that encode movements with respect to different coordinate systems, using a product of probabilities for generalization. While the former are well-suited to precise, local modulations, the latter aim at generalizing over large regions of the workspace and often involve multiple objects. Addressing the quality of generalization by leveraging both approaches simultaneously has received little attention. In this work, we propose an interactive imitation learning framework that simultaneously leverages local and global modulations of trajectory distributions. Building on the kernelized movement primitives (KMP) framework, we introduce novel mechanisms for skill modulation from direct human corrective feedback. Our approach particularly exploits the concept of via-points to incrementally and interactively 1) improve the model accuracy locally, 2) add new objects to the task during execution and 3) extend the skill into regions where demonstrations were not provided. We evaluate our method on a bearing ring-loading task using a torque-controlled, 7-DoF, DLR SARA robot (Iskandar et al., 2020).","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 4","pages":"3398-3405"},"PeriodicalIF":4.6,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-13 DOI: 10.1109/LRA.2025.3541451
Zhifan Teng;Jixiong Ren;Hongbo Sun;Quanyue Liu;Jianhua Liu;Qiuliang Wang
The magnetically controlled capsule endoscope (MCCE) is an effective tool for examining the digestive tract, as it is driven by an external magnetic field to achieve active movement. Existing rotational actuation strategies focus primarily on fixed actuation angles and lack an analysis of how the range of actuation angles affects motion control. This letter proposes an actuation strategy based on a rotating magnetic field and analyzes the critical range of actuation angles within which the MCCE achieves effective actuation and uniform advancement in the intestine. Simulation and experimental results show that the MCCE can be accelerated from a stationary state to a maximum speed at the maximum actuation angle. In experiments in silicone intestines and isolated porcine intestines, the maximum movement speeds reached 36.5 mm/s and 21.4 mm/s, respectively. In addition, the effectiveness of the actuation strategy was verified with an adaptive actuation system. In the future, our actuation strategy and adaptive system can be combined with automated image- or ultrasound-based diagnostics, which is expected to provide physicians with better tools for digestive examinations.
{"title":"A Rotary Actuation Strategy for Altering Magnetically Controlled Capsule Endoscope Motion State","authors":"Zhifan Teng;Jixiong Ren;Hongbo Sun;Quanyue Liu;Jianhua Liu;Qiuliang Wang","doi":"10.1109/LRA.2025.3541451","DOIUrl":"https://doi.org/10.1109/LRA.2025.3541451","url":null,"abstract":"The magnetically controlled capsule endoscope (MCCE) is an effective tool for the examination of the digestive tract, as it is controlled by an external magnetic field to achieve active movement. Existing rotational actuation strategies are primarily focused on the study of fixed actuation angles, with a lack of analysis of the impact of the range of actuation angles in motion control. This letter proposes an actuation strategy based on a rotating magnetic field to analyze the critical range of the actuation angle for the MCCE to achieve effective actuation and uniform advancement in the intestine. Simulation and experimental results show that the MCCE can be accelerated from a stationary state to a maximum speed at the maximum actuation angle. In experiments in silicone intestines and isolated porcine intestines, their maximum movement speeds reached 36.5 mm/s and 21.4 mm/s, respectively. In addition, the effectiveness of the actuation strategy was verified based on an adaptive actuation system. In the future, our actuation strategy and adaptive system can be combined with image or ultrasound automated diagnostics, which is expected to provide physicians with better tools for digestive examinations.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 4","pages":"3342-3349"},"PeriodicalIF":4.6,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143512807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-13 DOI: 10.1109/LRA.2025.3541428
Hajun Kim;Dongyun Kang;Min-Gyu Kim;Gijeong Kim;Hae-Won Park
This letter proposes an online friction coefficient identification framework for legged robots on slippery terrain. The approach formulates an optimization problem that minimizes the sum of residuals between actual and predicted states, parameterized by the friction coefficient in rigid-body contact dynamics. Notably, the proposed framework leverages the analytic smoothed gradient of contact impulses, obtained by smoothing the complementarity condition of Coulomb friction, to resolve the non-informative gradients induced by the nonsmooth contact dynamics. Moreover, we introduce a rejection method that filters out data with high normal contact velocity following contact initiation during friction coefficient identification. To validate the proposed framework, we conduct experiments using a quadrupedal robot platform, KAIST HOUND, on slippery and non-slippery terrain. We observe that our framework achieves fast and consistent friction coefficient identification under various initial conditions.
{"title":"Online Friction Coefficient Identification for Legged Robots on Slippery Terrain Using Smoothed Contact Gradients","authors":"Hajun Kim;Dongyun Kang;Min-Gyu Kim;Gijeong Kim;Hae-Won Park","doi":"10.1109/LRA.2025.3541428","DOIUrl":"https://doi.org/10.1109/LRA.2025.3541428","url":null,"abstract":"This letter proposes an online friction coefficient identification framework for legged robots on slippery terrain. The approach formulates the optimization problem to minimize the sum of residuals between actual and predicted states parameterized by the friction coefficient in rigid body contact dynamics. Notably, the proposed framework leverages the analytic smoothed gradient of contact impulses, obtained by smoothing the complementarity condition of Coulomb friction, to solve the issue of non-informative gradients induced from the nonsmooth contact dynamics. Moreover, we introduce the rejection method to filter out data with high normal contact velocity following contact initiations during friction coefficient identification for legged robots. To validate the proposed framework, we conduct the experiments using a quadrupedal robot platform, KAIST HOUND, on slippery and nonslippery terrain. We observe that our framework achieves fast and consistent friction coefficient identification within various initial conditions.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 4","pages":"3150-3157"},"PeriodicalIF":4.6,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143455246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-13 DOI: 10.1109/LRA.2025.3541924
Jinhui Du;Yujun Liang;Hongyuan Tao;Yaohang Xu;Lijun Zhu;Han Ding
This letter considers multi-robot collaborative manipulation with a load-sharing characteristic, where the dynamic parameters of the system composed of multiple robots and a rigid body are unknown. Load sharing means that each robot computes the wrench required for the manipulation task itself and actively shares the manipulation duty even though the global grasp matrix and system parameters are unknown. In the existing literature, however, global information must be collected for the load distribution algorithm, and a central node is required for the distribution calculation. In contrast, we propose a distributed control framework based on weighted consensus and parameter estimation that achieves load sharing among robots according to prescribed contribution factors, requiring only the local grasp matrix. We show that the manipulation wrench provided by the proposed controller causes no squeezing effect on the object at steady state. Numerical simulation and real-robot experiments verify the effectiveness of the proposed framework.
"Load Sharing in Distributed Collaborative Manipulation," IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 3390-3397.
Pub Date : 2025-02-13 DOI: 10.1109/LRA.2025.3541452
Heshan Li;Guohao Peng;Jun Zhang;Mingxing Wen;Yingchong Ma;Danwei Wang
Visual Place Recognition (VPR) is crucial for autonomous vehicles, as it enables them to identify previously visited locations. Compared with conventional single-frame retrieval, leveraging sequences of frames to depict places has proven effective in alleviating perceptual aliasing. However, mainstream sequence retrieval methods encode multiple frames into a single descriptor, forfeiting the capacity for fine-grained frame-to-frame matching. This limitation hampers the precise positioning of individual frames within the query sequence. On the other hand, sequence matching methods such as SeqSLAM are capable of frame-to-frame matching, but they rely on global brute-force search and a constant-speed assumption, which may result in retrieval failures. To address these issues, we propose a sequence-to-frame hierarchical matching pipeline for VPR, named CaseVPR. It consists of coarse-level sequence retrieval based on sequential descriptor matching to mine potential starting points, followed by fine-grained sequence matching to find frame-to-frame correspondences. In particular, a CaseNet is proposed to encode the correlation-aware features of consecutive frames into hierarchical descriptors for sequence retrieval and matching. On this basis, an AdaptSeq-V2 search strategy identifies frame-level correspondences of the query sequence in candidate regions determined by the potential starting points. To validate our hierarchical pipeline, we evaluate CaseVPR on multiple datasets. Experiments demonstrate that CaseVPR outperforms all benchmark methods in average precision and achieves new state-of-the-art (SOTA) results for sequence-based VPR.
{"title":"CaseVPR: Correlation-Aware Sequential Embedding for Sequence-to-Frame Visual Place Recognition","authors":"Heshan Li;Guohao Peng;Jun Zhang;Mingxing Wen;Yingchong Ma;Danwei Wang","doi":"10.1109/LRA.2025.3541452","DOIUrl":"https://doi.org/10.1109/LRA.2025.3541452","url":null,"abstract":"Visual Place Recognition (VPR) is crucial for autonomous vehicles, as it enables their identification of previously visited locations. Compared with conventional single-frame retrieval, leveraging sequences of frames to depict places has been proven effective in alleviating perceptual aliasing. However, mainstream sequence retrieval methods encode multiple frames into a single descriptor, relinquishing the capacity of fine-grained frame-to-frame matching. This limitation hampers the precise positioning of individual frames within the query sequence. On the other hand, sequence matching methods such as SeqSLAM are capable of frame-to-frame matching, but they rely on global brute-force search and the constant speed assumption, which may result in retrieval failures. To address the above issues, we propose a sequence-to-frame hierarchical matching pipeline for VPR, named CaseVPR. It consists of coarse-level sequence retrieval based on sequential descriptor matching to mine potential starting points, followed by fine-grained sequence matching to find frame-to-frame correspondence. Particularly, a CaseNet is proposed to encode the correlation-aware features of consecutive frames into hierarchical descriptors for sequence retrieval and matching. On this basis, an AdaptSeq-V2 searching strategy is proposed to identify frame-level correspondences of the query sequence in candidate regions determined by potential starting points. To validate our hierarchical pipeline, we evaluate CaseVPR on multiple datasets. Experiments demonstrate that our CaseVPR outperforms all benchmark methods in terms of average precision, and achieves new State-of-the-art (SOTA) results for sequence-based VPR.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 4","pages":"3430-3437"},"PeriodicalIF":4.6,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-12 DOI: 10.1109/LRA.2025.3541334
Guokang Wang;Hang Li;Shuyuan Zhang;Di Guo;Yanhong Liu;Huaping Liu
In real-world scenarios, many robotic manipulation tasks are hindered by occlusions and limited fields of view, posing significant challenges for passive observation-based models that rely on fixed or wrist-mounted cameras. In this letter, we investigate the problem of robotic manipulation under limited visual observation and propose a task-driven asynchronous active vision-action model. Our model serially connects a camera Next-Best-View (NBV) policy with a gripper Next-Best-Pose (NBP) policy, and trains them in a sensor-motor coordination framework using few-shot reinforcement learning. This approach enables the agent to reposition a third-person camera to actively observe the environment based on the task goal, and subsequently determine the appropriate manipulation actions. We trained and evaluated our model on 8 viewpoint-constrained tasks in RLBench. The results demonstrate that our model consistently outperforms baseline algorithms, showcasing its effectiveness in handling visual constraints in manipulation tasks.
{"title":"Observe Then Act: Asynchronous Active Vision-Action Model for Robotic Manipulation","authors":"Guokang Wang;Hang Li;Shuyuan Zhang;Di Guo;Yanhong Liu;Huaping Liu","doi":"10.1109/LRA.2025.3541334","DOIUrl":"https://doi.org/10.1109/LRA.2025.3541334","url":null,"abstract":"In real-world scenarios, many robotic manipulation tasks are hindered by occlusions and limited fields of view, posing significant challenges for passive observation-based models that rely on fixed or wrist-mounted cameras. In this letter, we investigate the problem of robotic manipulation under limited visual observation and propose a task-driven asynchronous active vision-action model. Our model serially connects a camera Next-Best-View (NBV) policy with a gripper Next-Best-Pose (NBP) policy, and trains them in a sensor-motor coordination framework using few-shot reinforcement learning. This approach enables the agent to reposition a third-person camera to actively observe the environment based on the task goal, and subsequently determine the appropriate manipulation actions. We trained and evaluated our model on 8 viewpoint-constrained tasks in RLBench. The results demonstrate that our model consistently outperforms baseline algorithms, showcasing its effectiveness in handling visual constraints in manipulation tasks.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 4","pages":"3422-3429"},"PeriodicalIF":4.6,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-11 DOI: 10.1109/LRA.2025.3540577
Shuhuan Wen;Ziyuan Zhang;Yuxiang Sun;Zhiwen Wang
Vision-and-Language Navigation in Continuous Environments (VLN-CE) requires agents to navigate 3D environments based on visual observations and natural language instructions. Existing approaches, focused on topological and semantic maps, often struggle to accurately understand and adapt to complex or previously unseen environments, largely because their maps are constructed statically and offline. To address these challenges, this letter proposes OVL-MAP, an innovative algorithm comprising three key modules: an online vision-and-language map construction module, a waypoint prediction module, and an action decision module. The online map construction module leverages robust open-vocabulary semantic segmentation to dynamically enhance the agent's scene understanding. The waypoint prediction module processes natural language instructions to identify task-relevant regions, predict sub-goal locations, and guide trajectory planning. The action decision module uses the DD-PPO strategy for effective navigation. Evaluations on the Robo-VLN and R2R-CE datasets demonstrate that OVL-MAP significantly improves navigation performance and generalizes better to unknown environments.
{"title":"OVL-MAP: An Online Visual Language Map Approach for Vision-and-Language Navigation in Continuous Environments","authors":"Shuhuan Wen;Ziyuan Zhang;Yuxiang Sun;Zhiwen Wang","doi":"10.1109/LRA.2025.3540577","DOIUrl":"https://doi.org/10.1109/LRA.2025.3540577","url":null,"abstract":"Vision-and-Language Navigation in Continuous Environments (VLN-CE) requires agents to navigate 3D environments based on visual observations and natural language instructions. Existing approaches, focused on topological and semantic maps, often face limitations in accurately understanding and adapting to complex or previously unseen environments, particularly due to static and offline map constructions. To address these challenges, this letter proposes OVL-MAP, an innovative algorithm comprising three key modules: an online vision-and-language map construction module, a waypoint prediction module, and an action decision module. The online map construction module leverages robust open-vocabulary semantic segmentation to dynamically enhance the agent's scene understanding. The waypoint prediction module processes natural language instructions to identify task-relevant regions, predict sub-goal locations, and guide trajectory planning. The action decision module utilizes the DD-PPO strategy for effective navigation. Evaluations on the Robo-VLN and R2R-CE datasets demonstrate that OVL-MAP significantly improves navigation performance and exhibits stronger generalization in unknown environments.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 4","pages":"3294-3301"},"PeriodicalIF":4.6,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143480774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}