Pub Date: 2026-01-12, DOI: 10.1109/LRA.2026.3653394
Jonghyeok Kim;Wan Kyun Chung
Among the many matrix-vector factorizations of the Coriolis and centripetal terms that satisfy the skew-symmetry condition in system dynamics, a unique factorization, called the Christoffel-consistent (CC) factorization, has been proposed. We derive this unique CC factorization in the Lie group context and examine the impact of Christoffel inconsistency in the Coriolis matrix factorization on the dynamic behavior of robot systems during both free motion and interaction with humans, particularly under passivity-based controllers and augmented PD controllers. Specifically, we ask a question that has rarely been explored: what are the advantages of using the CC factorization, and how does a non-CC factorization affect the robot's dynamic behavior? We show that Christoffel inconsistency generates unwanted torsion that causes the system to deviate from the desired trajectory, leading to undesirable dynamic behavior during control, especially when the robot's dynamics are described by twists and wrenches. Through simulation and a real-world robot experiment, this phenomenon is verified for the first time.
"Christoffel-Consistent Coriolis Factorization and Its Effect on the Control of a Robot." IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2682-2689.
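The skew-symmetry condition at the heart of this factorization question is easy to check numerically. Below is a minimal sketch (using an illustrative 2-DoF planar-arm mass matrix, not the paper's Lie group model) that builds the Coriolis matrix from Christoffel symbols of the first kind via finite differences and verifies the classical property that dM/dt - 2C is skew-symmetric for this choice:

```python
import numpy as np

def mass_matrix(q):
    # Toy 2-DoF planar arm mass matrix (unit lengths/masses), illustrative only.
    c2 = np.cos(q[1])
    return np.array([[3.0 + 2.0 * c2, 1.0 + c2],
                     [1.0 + c2,       1.0]])

def coriolis_christoffel(q, qd, h=1e-5):
    # C_ij = sum_k Gamma_ijk * qd_k, with Christoffel symbols of the first kind
    # Gamma_ijk = (dM_ij/dq_k + dM_ik/dq_j - dM_jk/dq_i) / 2,
    # computed by central finite differences of the mass matrix.
    n = len(q)
    dM = np.zeros((n, n, n))            # dM[i, j, k] = dM_ij / dq_k
    for k in range(n):
        e = np.zeros(n); e[k] = h
        dM[:, :, k] = (mass_matrix(q + e) - mass_matrix(q - e)) / (2 * h)
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                gamma = 0.5 * (dM[i, j, k] + dM[i, k, j] - dM[j, k, i])
                C[i, j] += gamma * qd[k]
    return C

q  = np.array([0.3, 0.7])
qd = np.array([0.5, -0.2])
C = coriolis_christoffel(q, qd)
h = 1e-5                                # dM/dt along qd, also by differences
Mdot = (mass_matrix(q + h * qd) - mass_matrix(q - h * qd)) / (2 * h)
S = Mdot - 2.0 * C
assert np.allclose(S, -S.T, atol=1e-5)  # skew-symmetry holds for this choice
```

A non-Christoffel factorization can satisfy skew-symmetry of dM/dt - 2C while still differing term-by-term from the construction above, which is the inconsistency the paper studies.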
Pub Date: 2026-01-12, DOI: 10.1109/LRA.2026.3652069
Jianxi Zhang;Jingtian Zhang;Hong Zeng;Dapeng Chen;Huijun Li;Aiguo Song
Foreign objects on utility poles can damage power lines and cause significant disruptions to the electricity supply. A widely used approach to this problem is for qualified personnel to climb the pole and promptly remove the foreign objects using an insulating tube. However, prolonged overhead manipulation of the insulating tube in this constrained environment not only causes considerable upper-limb fatigue but also makes accurate tube positioning increasingly difficult. To address these challenges, wearable robotic limbs with an active control strategy have the potential to reduce upper-limb fatigue and assist in tube positioning. This work presents supernumerary robotic limbs (SRLs) designed to assist electrical workers in a simulated overhead foreign-object removal task. We further propose a shared control method based on finite-horizon non-zero-sum game theory. This method models the cooperation between the SRL and the worker to adaptively modulate the SRL's input, thereby providing rapid and accurate assistance in tube positioning. Experimental results show that the proposed SRL can reduce primary upper-limb muscle activity (deltoid, biceps brachii, brachioradialis, and flexor carpi radialis) by up to 59.73% compared with performing the task without the SRL. Moreover, compared with a method that ignores human input, the proposed control strategy achieves more accurate positioning during human-SRL cooperation. These results demonstrate the potential of both the SRL and the control strategy for the live-line overhead foreign-object removal task.
"Development and Control of Supernumerary Robotic Limbs for Overhead Tube Manipulation Task." IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2634-2641.
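The finite-horizon non-zero-sum game machinery behind such shared control can be illustrated on a scalar toy problem. The sketch below (an invented 1-D positioning error with made-up cost weights, not the paper's model) computes feedback Nash gains by backward recursion of the coupled first-order conditions, then rolls the closed loop forward:

```python
import numpy as np

a, b1, b2 = 1.0, 0.5, 0.5     # scalar shared dynamics: x+ = a*x + b1*u1 + b2*u2
q1, r1 = 1.0, 0.1             # player 1 (robot): state cost, input cost
q2, r2 = 0.5, 0.5             # player 2 (human): same goal, higher effort cost
T = 20                        # horizon length

P1, P2 = q1, q2               # terminal value-function coefficients
for _ in range(T):
    # With linear feedback u_i = -k_i * x, each player's first-order
    # condition is linear in (k1, k2); solve the coupled 2x2 system,
    # then step the value coefficients backward in time.
    A = np.array([[r1 + P1 * b1 ** 2, P1 * b1 * b2],
                  [P2 * b1 * b2, r2 + P2 * b2 ** 2]])
    k1, k2 = np.linalg.solve(A, [P1 * b1 * a, P2 * b2 * a])
    acl = a - b1 * k1 - b2 * k2            # closed-loop coefficient
    P1 = q1 + r1 * k1 ** 2 + P1 * acl ** 2
    P2 = q2 + r2 * k2 ** 2 + P2 * acl ** 2

x = 1.0                        # initial positioning error
for _ in range(T):             # roll out under the first-step Nash gains
    x = a * x - b1 * k1 * x - b2 * k2 * x
print(abs(x))                  # joint policy drives the error near zero
```

Even though neither player controls the tube alone, the Nash feedback pair stabilizes the shared error, which is the qualitative behavior a game-theoretic shared controller relies on.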
Pub Date: 2026-01-12, DOI: 10.1109/LRA.2026.3653290
Yuchen Weng;Nuo Li;Peng Yu;Qi Wang;Yongqiang Qi;Shaoze You;Jun Wang
Neural Radiance Fields (NeRF) have significantly advanced photorealistic novel view synthesis. Recently, 3D Gaussian Splatting has emerged as a promising technique with faster training and rendering speeds. However, both methods rely heavily on sharp images and precise camera poses, limiting performance under motion blur. To address this, we introduce Event-Informed 3D Deblur Reconstruction with Gaussian Splatting (EiGS), a novel approach that leverages event camera data to enhance 3D Gaussian Splatting, improving sharpness and clarity in scenes affected by motion blur. Our method employs an Adaptive Deviation Estimator to learn Gaussian center shifts as the inverse of complex camera jitter, enabling simulation of motion blur during training. A motion consistency loss ensures global coherence in Gaussian displacements, while Blurriness and Event Integration Losses guide the model toward precise 3D representations. Extensive experiments demonstrate superior sharpness and real-time rendering capabilities compared to existing methods, with ablation studies validating the effectiveness of our components for robust, high-quality reconstruction of complex static scenes.
"EiGS: Event-Informed 3D Deblur Reconstruction With Gaussian Splatting." IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2474-2481.
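Simulating blur during training, as the abstract describes, amounts to integrating sharp renders over the exposure time. A minimal sketch, with a synthetic stripe image standing in for a Gaussian Splatting render and an invented jitter path:

```python
import numpy as np

def render_sharp(offset, size=32):
    # Stand-in for a sharp render at a jittered camera offset: a bright
    # vertical stripe whose column shifts with the (hypothetical) jitter.
    img = np.zeros((size, size))
    img[:, (size // 2 + offset) % size] = 1.0
    return img

def simulate_blur(offsets):
    # Blurry image = average of sharp renders along the jitter trajectory,
    # a discrete approximation of exposure-time integration.
    return np.mean([render_sharp(o) for o in offsets], axis=0)

blurred = simulate_blur([-2, -1, 0, 1, 2])    # 5 poses during the exposure
# Total energy is conserved, but spread over 5 columns:
assert np.isclose(blurred.sum(), render_sharp(0).sum())
assert np.isclose(blurred.max(), 0.2)
```

Learning the per-Gaussian shifts that reproduce an observed blurry image is then the inverse of this forward model, which is what the Adaptive Deviation Estimator targets.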
Pub Date: 2026-01-12, DOI: 10.1109/LRA.2026.3653282
Sangmin Lee;Donghyun Choi;Jee-Hwan Ryu
Accurate global localization remains a fundamental challenge in autonomous vehicle navigation. Traditional methods typically rely on high-definition (HD) maps generated through prior traverses or utilize auxiliary sensors, such as a global positioning system (GPS). However, these approaches are often limited by high costs, scalability issues, and reduced reliability where GPS is unavailable. Moreover, prior methods require route-specific sensor calibration and impose modality-specific constraints, which restrict generalization across different sensor types. The proposed framework addresses this limitation by leveraging a shared embedding space, learned via a weight-sharing Vision Transformer (ViT) encoder, that aligns heterogeneous sensor modalities: Light Detection and Ranging (LiDAR) images and geo-tagged StreetView panoramas. The proposed alignment enables reliable cross-modal retrieval and coarse-level localization without HD-map priors or route-specific calibration. Further, to address the heading inconsistency between the query LiDAR and StreetView, an equirectangular perspective-n-point (PnP) solver is proposed to refine the relative pose through patch-level feature correspondences. As a result, the framework achieves coarse 3-degree-of-freedom (DoF) localization from a single LiDAR scan and publicly available StreetView imagery, bridging the gap between place recognition and metric localization. Experiments demonstrate that the proposed method achieves high recall and heading accuracy, offering scalability in urban settings covered by public StreetView without reliance on HD maps.
"LSV-Loc: LiDAR to StreetView Image Cross-Modal Localization." IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2514-2521.
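Cross-modal retrieval in a shared embedding space reduces to nearest-neighbor search under cosine similarity. A sketch with random unit vectors standing in for the encoder's descriptors (the weight-sharing encoder itself is not reproduced here, and index 42 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# Rows stand in for shared-space descriptors of 100 geo-tagged StreetView
# panoramas, normalized so that dot product = cosine similarity.
db = rng.normal(size=(100, 64))
db /= np.linalg.norm(db, axis=1, keepdims=True)

# A LiDAR query whose (assumed) embedding lands near panorama 42, plus noise.
query = db[42] + 0.05 * rng.normal(size=64)
query /= np.linalg.norm(query)

scores = db @ query              # cosine similarities in the shared space
best = int(np.argmax(scores))    # retrieved panorama = coarse localization
assert best == 42
```

In the full pipeline this retrieval step only yields the coarse position; the heading is then refined by the PnP solver over patch correspondences.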
We present VISTA (Viewpoint-based Image selection with Semantic Task Awareness), an active exploration method for robots to plan informative trajectories that improve 3D map quality in areas most relevant for task completion. Given an open-vocabulary search instruction (e.g., “find a person”), VISTA enables a robot to explore its environment to search for the object of interest, while simultaneously building a real-time semantic 3D Gaussian Splatting reconstruction of the scene. The robot navigates its environment by planning receding-horizon trajectories that prioritize semantic similarity to the query and exploration of unseen regions of the environment. To evaluate trajectories, VISTA introduces a novel, efficient viewpoint-semantic coverage metric that quantifies both the geometric view diversity and task relevance in the 3D scene. On static datasets, our coverage metric outperforms state-of-the-art baselines, FisherRF and Bayes' Rays, in computation speed and reconstruction quality. In quadrotor hardware experiments, VISTA achieves 6x higher success rates in challenging maps, compared to baseline methods, while matching baseline performance in less challenging maps. Lastly, we show that VISTA is platform-agnostic by deploying it on a quadrotor drone and a Spot quadruped robot.
{"title":"VISTA: Open-Vocabulary, Task-Relevant Robot Exploration With Online Semantic Gaussian Splatting","authors":"Keiko Nagami;Timothy Chen;Javier Yu;Ola Shorinwa;Maximilian Adang;Carlyn Dougherty;Eric Cristofalo;Mac Schwager","doi":"10.1109/LRA.2026.3653276","DOIUrl":"https://doi.org/10.1109/LRA.2026.3653276","url":null,"abstract":"We present VISTA (Viewpoint-based Image selection with Semantic Task Awareness), an active exploration method for robots to plan informative trajectories that improve 3D map quality in areas most relevant for task completion. Given an open-vocabulary search instruction (e.g., “find a person”), VISTA enables a robot to explore its environment to search for the object of interest, while simultaneously building a real-time semantic 3D Gaussian Splatting reconstruction of the scene. The robot navigates its environment by planning receding-horizon trajectories that prioritize semantic similarity to the query and exploration of unseen regions of the environment. To evaluate trajectories, VISTA introduces a novel, efficient viewpoint-semantic coverage metric that quantifies both the geometric view diversity and task relevance in the 3D scene. On static datasets, our coverage metric outperforms state-of-the-art baselines, FisherRF and Bayes' Rays, in computation speed and reconstruction quality. In quadrotor hardware experiments, VISTA achieves 6x higher success rates in challenging maps, compared to baseline methods, while matching baseline performance in less challenging maps. 
Lastly, we show that VISTA is platform-agnostic by deploying it on a quadrotor drone and a Spot quadruped robot.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"3150-3157"},"PeriodicalIF":5.3,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146082218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-12, DOI: 10.1109/LRA.2026.3653387
Yifei Liu;Kefei Wen
We present a Lie group implicit formulation for kinematically redundant parallel manipulators that yields left-trivialized extended Jacobians for the extended task variable $x=(g,\rho)\in \text{SE}(3)\times \mathcal{R}$. On top of this model we design a gradient-based redundancy flow on the redundancy manifold that empirically maintains a positive manipulability margin along prescribed $\text{SE}(3)$ trajectories. The framework uses right-multiplicative state updates, remains compatible with automatic differentiation, and avoids mechanism-specific analytic Jacobians; it works with either direct inverse kinematics or a numeric solver. A specialization to $\text{SO}(2)^{3}$ provides computation-friendly first- and second-order steps. We validate the approach on two representative mechanisms: a (6+3)-degree-of-freedom (DoF) Stewart platform and a Spherical–Revolute platform. Across dense-coverage orientation trajectories and interactive gamepad commands, the extended Jacobian remained well conditioned while the redundancy planner ran at approximately 2 kHz in software-in-the-loop on a laptop-class CPU. The method integrates cleanly with existing kinematic stacks and is suitable for real-time deployment.
"Lie Group Implicit Kinematics for Redundant Parallel Manipulators: Left-Trivialized Extended Jacobians and Gradient-Based Online Redundancy Flows for Singularity Avoidance." IEEE Robotics and Automation Letters, vol. 11, no. 2, pp. 2322-2329.
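The SO(2)^3 specialization makes the right-multiplicative update especially concrete: the exponential map on so(2) is just angle addition. A minimal sketch of the update rule under assumed constant redundancy rates (not the paper's gradient-based solver):

```python
import math

def exp_so2(w, dt):
    # Exponential map on so(2): a scalar rate integrates to an angle.
    return w * dt

def step(angles, rates, dt):
    # One right-multiplicative update g <- g * exp(w * dt) on SO(2)^3,
    # with angles wrapped back into (-pi, pi].
    return [((a + exp_so2(w, dt) + math.pi) % (2 * math.pi)) - math.pi
            for a, w in zip(angles, rates)]

g = [0.0, math.pi / 2, -math.pi / 4]      # redundancy state on SO(2)^3
for _ in range(100):                      # integrate constant rates for 1 s
    g = step(g, [0.1, -0.2, 0.3], 0.01)
print([round(a, 3) for a in g])           # prints [0.1, 1.371, -0.485]
```

Because the update multiplies on the right, it composes group elements rather than adding raw coordinates, which is what keeps the general SE(3) version coordinate-free; on SO(2) the two happen to coincide.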
Pub Date: 2026-01-12, DOI: 10.1109/LRA.2026.3653333
Evgenii Kruzhkov;Raphael Memmesheimer;Sven Behnke
Robust robot localization is an important prerequisite for navigation, but it becomes challenging when the map and robot measurements are obtained from different sensors. Prior methods are often tailored to specific environments, relying on closed-set semantics or fine-tuned features. In this work, we extend Monte Carlo Localization with vision-language features, allowing OMCL (Open-Vocabulary Monte Carlo Localization) to robustly compute the likelihood of visual observations given a camera pose and a 3D map created from posed RGB-D images or aligned point clouds. These open-vocabulary features enable us to associate observations and map elements from different modalities, and to natively initialize global localization through natural language descriptions of nearby objects. We evaluate our approach using Matterport3D and Replica for indoor scenes and demonstrate generalization on SemanticKITTI for outdoor scenes.
"OMCL: Open-Vocabulary Monte Carlo Localization." IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2698-2705.
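The measurement update in a feature-based Monte Carlo Localization loop can be sketched as weighting particles by the similarity between the observed vision-language embedding and the map embedding at each particle's pose. Everything below (map layout, feature dimensions, likelihood sharpening) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical map: one unit-norm vision-language feature per discrete pose.
n_poses, dim = 50, 32
map_feats = rng.normal(size=(n_poses, dim))
map_feats /= np.linalg.norm(map_feats, axis=1, keepdims=True)

true_pose = 17
obs = map_feats[true_pose] + 0.05 * rng.normal(size=dim)   # noisy observation
obs /= np.linalg.norm(obs)

particles = np.tile(np.arange(n_poses), 10)    # 500 particles over 50 poses
# Measurement update: cosine similarity sharpened into a likelihood.
weights = np.exp(20.0 * (map_feats[particles] @ obs))
weights /= weights.sum()

# Resampling concentrates the particle set on the true pose.
resampled = rng.choice(particles, size=particles.size, p=weights)
frac = float(np.mean(resampled == true_pose))
print(frac)    # most of the mass lands on the true pose after one update
```

The open-vocabulary twist is that the same similarity test works against a text embedding, which is how a language description can seed the initial particle weights.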
Pub Date: 2026-01-12, DOI: 10.1109/LRA.2026.3653274
Shifei Duan;Francesco De Pace;Minas Liarokapis
Over the last decade, robot telemanipulation has been increasingly used to replace human operators in hazardous or remote environments. However, telemanipulation remains challenging, especially when high precision and dexterity are required, and peg-in-hole tasks are among the most demanding in this respect. To facilitate the execution of such complex tasks, this paper introduces and compares different semi-autonomous strategies for virtual reality (VR) based remote robotic telemanipulation of a robot arm. Four modalities of robotic telemanipulation with varying degrees of autonomy are presented and thoroughly compared. Finally, a comparative user study highlights the differences between the proposed modalities and details the advantages and disadvantages of each approach.
"Comparing Semi-Autonomous Strategies for Virtual Reality Based Remote Robotic Telemanipulation: On Peg-In-Hole Tasks." IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2562-2569.
Pub Date: 2026-01-12, DOI: 10.1109/LRA.2026.3653283
Hongbo Duan;Shangyi Luo;Zhiyuan Deng;Yanbo Chen;Yuanhao Chiang;Yi Liu;Fangming Liu;Xueqian Wang
Autonomous language-guided navigation in large-scale outdoor environments remains a key challenge in mobile robotics, due to difficulties in semantic reasoning, dynamic conditions, and long-term stability. We propose CausalNav, the first scene graph-based semantic navigation framework tailored for dynamic outdoor environments. We construct a multi-level semantic scene graph using LLMs, referred to as the Embodied Graph, that hierarchically integrates coarse-grained map data with fine-grained object entities. The constructed graph serves as a retrievable knowledge base for Retrieval-Augmented Generation (RAG), enabling semantic navigation and long-range planning under open-vocabulary queries. By fusing real-time perception with offline map data, the Embodied Graph supports robust navigation across varying spatial granularities in dynamic outdoor environments. Dynamic objects are explicitly handled in both the scene graph construction and hierarchical planning modules. The Embodied Graph is continuously updated within a temporal window to reflect environmental changes and support real-time semantic navigation. Extensive experiments in both simulation and real-world settings demonstrate superior robustness and efficiency.
"CAUSALNAV: A Long-Term Embodied Navigation System for Autonomous Mobile Robots in Dynamic Outdoor Scenarios." IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 3198-3205.
Pub Date: 2026-01-12, DOI: 10.1109/LRA.2026.3653369
Jun Ye;Qiu Fang;Shi Wang;Changqing Gao;Weixing Peng;Yaonan Wang
Three-dimensional (3D) measurement is essential for quality control in manufacturing, especially for components with complex geometries. Conventional viewpoint planning methods based on fixed spherical coordinates often fail to capture intricate surfaces, leading to suboptimal reconstructions. To address this, we propose a multi-degree-of-freedom reinforcement learning (RL) framework for continuous viewpoint planning in robotic 3D measurement. The framework introduces three key innovations: (1) a voxel-based state representation with dynamic ray-traced coverage updates; (2) a dual-objective reward that enforces precise overlap control while minimizing the number of viewpoints; and (3) integration of robotic kinematics to guarantee physically feasible scanning. Experiments on industrial parts demonstrate that our method outperforms existing techniques in overlap regulation and planning efficiency, enabling more accurate and autonomous 3D reconstruction for complex geometries.
"Efficient Robotic 3D Measurement Through Multi-DoF Reinforcement Learning for Continuous Viewpoint Planning." IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2618-2625.
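A voxel-based coverage state with ray-traced updates can be sketched with simple fixed-step ray marching (a stand-in for exact voxel traversal; the grid size, step length, and single-ray setup are all illustrative):

```python
import numpy as np

GRID = 16
observed = np.zeros((GRID, GRID, GRID), dtype=bool)   # coverage state

def trace_ray(origin, direction, max_range=32.0, step=0.25):
    # Fixed-step ray marching: mark every voxel the ray passes through.
    # (Exact grid traversal, e.g. Amanatides-Woo, would avoid the step size.)
    d = np.asarray(direction, dtype=float)
    d /= np.linalg.norm(d)
    for t in np.arange(0.0, max_range, step):
        p = np.asarray(origin) + t * d
        idx = np.floor(p).astype(int)
        if np.any(idx < 0) or np.any(idx >= GRID):
            break                     # ray left the grid
        observed[tuple(idx)] = True

# One viewpoint, one ray along the grid diagonal in the z = 0 slice.
trace_ray(origin=(0.5, 0.5, 0.5), direction=(1.0, 1.0, 0.0))
coverage = observed.mean()            # fraction of voxels observed so far
print(int(observed.sum()))            # prints 16: the voxels (k, k, 0)
```

In an RL viewpoint planner, a reward term would compare this coverage (and the overlap with previously observed voxels) before and after each candidate viewpoint.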