EiGS: Event-Informed 3D Deblur Reconstruction With Gaussian Splatting
Yuchen Weng; Nuo Li; Peng Yu; Qi Wang; Yongqiang Qi; Shaoze You; Jun Wang
Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653290 | IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2474-2481
Neural Radiance Fields (NeRF) have significantly advanced photorealistic novel view synthesis. Recently, 3D Gaussian Splatting has emerged as a promising technique with faster training and rendering speeds. However, both methods rely heavily on clear images and precise camera poses, limiting their performance under motion blur. To address this, we introduce Event-Informed 3D Deblur Reconstruction with Gaussian Splatting (EiGS), a novel approach that leverages event camera data to enhance 3D Gaussian Splatting, improving sharpness and clarity in scenes affected by motion blur. Our method employs an Adaptive Deviation Estimator to learn Gaussian center shifts as the inverse of complex camera jitter, enabling motion blur to be simulated during training. A motion consistency loss ensures global coherence of the Gaussian displacements, while Blurriness and Event Integration losses guide the model toward precise 3D representations. Extensive experiments demonstrate superior sharpness and real-time rendering compared to existing methods, and ablation studies validate the effectiveness of our components for robust, high-quality reconstruction of complex static scenes.
LSV-Loc: LiDAR to StreetView Image Cross-Modal Localization
Sangmin Lee; Donghyun Choi; Jee-Hwan Ryu
Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653282 | IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2514-2521
Accurate global localization remains a fundamental challenge in autonomous vehicle navigation. Traditional methods typically rely on high-definition (HD) maps generated through prior traverses or on auxiliary sensors such as a global positioning system (GPS). However, these approaches are often limited by high costs, scalability issues, and reduced reliability where GPS is unavailable. Moreover, prior methods require route-specific sensor calibration and impose modality-specific constraints, which restrict generalization across sensor types. The proposed framework addresses these limitations with a shared embedding space, learned via a weight-sharing Vision Transformer (ViT) encoder, that aligns two heterogeneous modalities: Light Detection and Ranging (LiDAR) images and geo-tagged StreetView panoramas. This alignment enables reliable cross-modal retrieval and coarse-level localization without HD-map priors or route-specific calibration. Further, to address the heading inconsistency between the query LiDAR scan and the retrieved StreetView panorama, an equirectangular perspective-n-point (PnP) solver is proposed to refine the relative pose through patch-level feature correspondences. As a result, the framework achieves coarse 3-degree-of-freedom (DoF) localization from a single LiDAR scan and publicly available StreetView imagery, bridging the gap between place recognition and metric localization. Experiments demonstrate that the proposed method achieves high recall and heading accuracy, offering scalability in urban settings covered by public StreetView without reliance on HD maps.
Lie Group Implicit Kinematics for Redundant Parallel Manipulators: Left-Trivialized Extended Jacobians and Gradient-Based Online Redundancy Flows for Singularity Avoidance
Yifei Liu; Kefei Wen
Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653387 | IEEE Robotics and Automation Letters, vol. 11, no. 2, pp. 2322-2329
We present a Lie group implicit formulation for kinematically redundant parallel manipulators that yields left-trivialized extended Jacobians for the extended task variable $x=(g,\rho)\in \text{SE}(3)\times \mathcal{R}$. On top of this model we design a gradient-based redundancy flow on the redundancy manifold that empirically maintains a positive manipulability margin along prescribed $\text{SE}(3)$ trajectories. The framework uses right-multiplicative state updates, remains compatible with automatic differentiation, and avoids mechanism-specific analytic Jacobians; it works with either direct inverse kinematics or a numeric solver. A specialization to $\text{SO}(2)^{3}$ provides computation-friendly first- and second-order steps. We validate the approach on two representative mechanisms: a (6+3)-degree-of-freedom (DoF) Stewart platform and a Spherical–Revolute platform. Across dense-coverage orientation trajectories and interactive gamepad commands, the extended Jacobian remained well conditioned while the redundancy planner ran at approximately 2 kHz in software-in-the-loop on a laptop-class CPU. The method integrates cleanly with existing kinematic stacks and is suitable for real-time deployment.
OMCL: Open-Vocabulary Monte Carlo Localization
Evgenii Kruzhkov; Raphael Memmesheimer; Sven Behnke
Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653333 | IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2698-2705
Robust robot localization is an important prerequisite for navigation, but it becomes challenging when the map and the robot's measurements are obtained from different sensors. Prior methods are often tailored to specific environments, relying on closed-set semantics or fine-tuned features. In this work, we extend Monte Carlo Localization with vision-language features, allowing OMCL to robustly compute the likelihood of visual observations given a camera pose and a 3D map created from posed RGB-D images or aligned point clouds. These open-vocabulary features enable us to associate observations and map elements from different modalities, and to natively initialize global localization through natural language descriptions of nearby objects. We evaluate our approach using Matterport3D and Replica for indoor scenes and demonstrate generalization on SemanticKITTI for outdoor scenes.
Comparing Semi-Autonomous Strategies for Virtual Reality Based Remote Robotic Telemanipulation: On Peg-In-Hole Tasks
Shifei Duan; Francesco De Pace; Minas Liarokapis
Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653274 | IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2562-2569
Over the last decade, robot telemanipulation has been increasingly utilized in various applications to replace human operators in hazardous or remote environments. However, telemanipulation remains a challenging task, especially when high precision and dexterity are required, and peg-in-hole insertion is considered among the most challenging of such tasks. To facilitate the execution of these complex tasks, this paper introduces and compares different semi-autonomous strategies for virtual reality (VR) based remote robotic telemanipulation of a robot arm. Four modalities of robotic telemanipulation with varying degrees of autonomy are presented and thoroughly compared. Finally, a comparative user study highlights the differences between the proposed modalities and showcases the advantages and disadvantages of each approach in detail.
VISTA: Open-Vocabulary, Task-Relevant Robot Exploration With Online Semantic Gaussian Splatting
Keiko Nagami; Timothy Chen; Javier Yu; Ola Shorinwa; Maximilian Adang; Carlyn Dougherty; Eric Cristofalo; Mac Schwager
Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653276 | IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 3150-3157
We present VISTA (Viewpoint-based Image selection with Semantic Task Awareness), an active exploration method for robots to plan informative trajectories that improve 3D map quality in areas most relevant for task completion. Given an open-vocabulary search instruction (e.g., “find a person”), VISTA enables a robot to explore its environment to search for the object of interest, while simultaneously building a real-time semantic 3D Gaussian Splatting reconstruction of the scene. The robot navigates its environment by planning receding-horizon trajectories that prioritize semantic similarity to the query and exploration of unseen regions of the environment. To evaluate trajectories, VISTA introduces a novel, efficient viewpoint-semantic coverage metric that quantifies both the geometric view diversity and task relevance in the 3D scene. On static datasets, our coverage metric outperforms state-of-the-art baselines, FisherRF and Bayes' Rays, in computation speed and reconstruction quality. In quadrotor hardware experiments, VISTA achieves 6x higher success rates in challenging maps, compared to baseline methods, while matching baseline performance in less challenging maps. Lastly, we show that VISTA is platform-agnostic by deploying it on a quadrotor drone and a Spot quadruped robot.
CAUSALNAV: A Long-Term Embodied Navigation System for Autonomous Mobile Robots in Dynamic Outdoor Scenarios
Hongbo Duan; Shangyi Luo; Zhiyuan Deng; Yanbo Chen; Yuanhao Chiang; Yi Liu; Fangming Liu; Xueqian Wang
Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653283 | IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 3198-3205
Autonomous language-guided navigation in large-scale outdoor environments remains a key challenge in mobile robotics, owing to the difficulty of semantic reasoning, dynamic conditions, and long-term stability. We propose CausalNav, the first scene graph-based semantic navigation framework tailored to dynamic outdoor environments. Using LLMs, we construct a multi-level semantic scene graph, referred to as the Embodied Graph, that hierarchically integrates coarse-grained map data with fine-grained object entities. The constructed graph serves as a retrievable knowledge base for Retrieval-Augmented Generation (RAG), enabling semantic navigation and long-range planning under open-vocabulary queries. By fusing real-time perception with offline map data, the Embodied Graph supports robust navigation across varying spatial granularities in dynamic outdoor environments. Dynamic objects are explicitly handled in both the scene graph construction and hierarchical planning modules, and the Embodied Graph is continuously updated within a temporal window to reflect environmental changes and support real-time semantic navigation. Extensive experiments in both simulation and real-world settings demonstrate superior robustness and efficiency.
Efficient Robotic 3D Measurement Through Multi-DoF Reinforcement Learning for Continuous Viewpoint Planning
Jun Ye; Qiu Fang; Shi Wang; Changqing Gao; Weixing Peng; Yaonan Wang
Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653369 | IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2618-2625
Three-dimensional (3D) measurement is essential for quality control in manufacturing, especially for components with complex geometries. Conventional viewpoint planning methods based on fixed spherical coordinates often fail to capture intricate surfaces, leading to suboptimal reconstructions. To address this, we propose a multi-degree-of-freedom reinforcement learning (RL) framework for continuous viewpoint planning in robotic 3D measurement. The framework introduces three key innovations: (1) a voxel-based state representation with dynamic ray-traced coverage updates; (2) a dual-objective reward that enforces precise overlap control while minimizing the number of viewpoints; and (3) integration of robotic kinematics to guarantee physically feasible scanning. Experiments on industrial parts demonstrate that our method outperforms existing techniques in overlap regulation and planning efficiency, enabling more accurate and autonomous 3D reconstruction for complex geometries.
CLF-RL: Control Lyapunov Function Guided Reinforcement Learning
Kejun Li; Zachary Olkin; Yisong Yue; Aaron D. Ames
Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653329 | IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 3230-3237
Reinforcement learning (RL) has shown promise in generating robust locomotion policies for bipedal robots, but often suffers from tedious reward design and sensitivity to poorly shaped objectives. In this work, we propose a structured reward shaping framework that leverages model-based trajectory generation and control Lyapunov functions (CLFs) to guide policy learning. We explore two model-based planners for generating reference trajectories: a reduced-order linear inverted pendulum (LIP) model for velocity-conditioned motion planning, and a precomputed gait library based on hybrid zero dynamics (HZD) using full-order dynamics. These planners define desired end-effector and joint trajectories, which are used to construct CLF-based rewards that penalize tracking error and encourage rapid convergence. This formulation provides meaningful intermediate rewards and is straightforward to implement once a reference is available. Both the reference trajectories and CLF shaping are used only during training, resulting in a lightweight policy at deployment. We validate our method both in simulation and through extensive real-world experiments on a Unitree G1 robot. CLF-RL demonstrates significantly improved robustness relative to the baseline RL policy and better performance than a classic tracking reward RL formulation.
Affordance RAG: Hierarchical Multimodal Retrieval With Affordance-Aware Embodied Memory for Mobile Manipulation
Ryosuke Korekata; Quanting Xie; Yonatan Bisk; Komei Sugiura
Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653281 | IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2706-2713
In this study, we address the problem of open-vocabulary mobile manipulation, where a robot is required to carry a wide range of objects to receptacles based on free-form natural language instructions. This task is challenging, as it involves understanding visual semantics and the affordance of manipulation actions. To tackle these challenges, we propose Affordance RAG, a zero-shot hierarchical multimodal retrieval framework that constructs Affordance-Aware Embodied Memory from pre-explored images. The model retrieves candidate targets based on regional and visual semantics and reranks them with affordance scores, allowing the robot to identify manipulation options that are likely to be executable in real-world environments. Our method outperformed existing approaches in retrieval performance for mobile manipulation instruction in large-scale indoor environments. Furthermore, in real-world experiments where the robot performed mobile manipulation in indoor environments based on free-form instructions, the proposed method achieved a task success rate of 85%, outperforming existing methods in both retrieval performance and overall task success.