
Latest publications in IEEE Robotics and Automation Letters

EiGS: Event-Informed 3D Deblur Reconstruction With Gaussian Splatting
IF 5.3 | CAS Tier 2 (Computer Science) | Q2 ROBOTICS | Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653290
Yuchen Weng;Nuo Li;Peng Yu;Qi Wang;Yongqiang Qi;Shaoze You;Jun Wang
Neural Radiance Fields (NeRF) have significantly advanced photorealistic novel view synthesis. Recently, 3D Gaussian Splatting has emerged as a promising technique with faster training and rendering speeds. However, both methods rely heavily on clear images and precise camera poses, limiting performance under motion blur. To address this, we introduce Event-Informed 3D Deblur Reconstruction with Gaussian Splatting (EiGS), a novel approach leveraging event camera data to enhance 3D Gaussian Splatting, improving sharpness and clarity in scenes affected by motion blur. Our method employs an Adaptive Deviation Estimator to learn Gaussian center shifts as the inverse of complex camera jitter, enabling simulation of motion blur during training. A motion consistency loss ensures global coherence in Gaussian displacements, while Blurriness and Event Integration Losses guide the model toward precise 3D representations. Extensive experiments demonstrate superior sharpness and real-time rendering capabilities compared to existing methods, with ablation studies validating the effectiveness of our components in robust, high-quality reconstruction for complex static scenes.
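For intuition only, the sketch below (not the authors' code) shows one way loss terms of this kind could be combined: a blurry frame is simulated by averaging sharp sub-exposure renders, an event term compares predicted and accumulated log-brightness change, and a consistency term regularizes the per-Gaussian shifts. The log-brightness event model, array shapes, and loss weights are all assumptions.

```python
# Illustrative sketch only: a blur-aware training loss built from three terms.
import numpy as np

def simulate_blur(sharp_renders):
    """Average sharp renders along the exposure (time) axis to mimic motion blur."""
    return sharp_renders.mean(axis=0)

def blurriness_loss(sim_blur, observed_blur):
    """L1 photometric error between simulated and captured blurry images."""
    return np.abs(sim_blur - observed_blur).mean()

def event_integration_loss(sharp_renders, event_integral, eps=1e-6):
    """Compare log-brightness change between the first and last renders with the
    brightness change accumulated from events over the same exposure."""
    pred_change = np.log(sharp_renders[-1] + eps) - np.log(sharp_renders[0] + eps)
    return np.abs(pred_change - event_integral).mean()

def motion_consistency_loss(center_shifts):
    """Penalize per-Gaussian shifts that deviate from the mean rigid displacement."""
    mean_shift = center_shifts.mean(axis=(0, 1), keepdims=True)
    return np.square(center_shifts - mean_shift).mean()

# Toy data: T sub-exposure renders of an HxW grayscale image, N Gaussians shifted over T steps.
T, H, W, N = 5, 64, 64, 1000
renders = np.random.rand(T, H, W)
observed = renders.mean(axis=0) + 0.01 * np.random.rand(H, W)   # captured blurry frame (toy)
events = np.log(renders[-1] + 1e-6) - np.log(renders[0] + 1e-6)  # accumulated event integral (toy)
shifts = 0.01 * np.random.randn(T, N, 3)

total = (blurriness_loss(simulate_blur(renders), observed)
         + 0.5 * event_integration_loss(renders, events)
         + 0.1 * motion_consistency_loss(shifts))
print(f"total loss: {total:.4f}")
```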
{"title":"EiGS: Event-Informed 3D Deblur Reconstruction With Gaussian Splatting","authors":"Yuchen Weng;Nuo Li;Peng Yu;Qi Wang;Yongqiang Qi;Shaoze You;Jun Wang","doi":"10.1109/LRA.2026.3653290","DOIUrl":"https://doi.org/10.1109/LRA.2026.3653290","url":null,"abstract":"Neural Radiance Fields (NeRF) have significantly advanced photorealistic novel view synthesis. Recently, 3D Gaussian Splatting has emerged as a promising technique with faster training and rendering speeds. However, both methods rely heavily on clear images and precise camera poses, limiting performance under motion blur. To address this, we introduce Event-Informed 3D Deblur Reconstruction with Gaussian Splatting(EiGS), a novel approach leveraging event camera data to enhance 3D Gaussian Splatting, improving sharpness and clarity in scenes affected by motion blur. Our method employs an Adaptive Deviation Estimator to learn Gaussian center shifts as the inverse of complex camera jitter, enabling simulation of motion blur during training. A motion consistency loss ensures global coherence in Gaussian displacements, while Blurriness and Event Integration Losses guide the model toward precise 3D representations. Extensive experiments demonstrate superior sharpness and real-time rendering capabilities compared to existing methods, with ablation studies validating the effectiveness of our components in robust, high-quality reconstruction for complex static scenes.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"2474-2481"},"PeriodicalIF":5.3,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146001871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
LSV-Loc: LiDAR to StreetView Image Cross-Modal Localization
IF 5.3 | CAS Tier 2 (Computer Science) | Q2 ROBOTICS | Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653282
Sangmin Lee;Donghyun Choi;Jee-Hwan Ryu
Accurate global localization remains a fundamental challenge in autonomous vehicle navigation. Traditional methods typically rely on high-definition (HD) maps generated through prior traverses or utilize auxiliary sensors, such as a global positioning system (GPS). However, the above approaches are often limited by high costs, scalability issues, and decreased reliability where GPS is unavailable. Moreover, prior methods require route-specific sensor calibration and impose modality-specific constraints, which restrict generalization across different sensor types. The proposed framework addresses this limitation by leveraging a shared embedding space, learned via a weight-sharing Vision Transformer (ViT) encoder, that aligns heterogeneous sensor modalities: Light Detection and Ranging (LiDAR) images and geo-tagged StreetView panoramas. The proposed alignment enables reliable cross-modal retrieval and coarse-level localization without HD-map priors or route-specific calibration. Further, to address the heading inconsistency between query LiDAR and StreetView, an equirectangular perspective-n-point (PnP) solver is proposed to refine the relative pose through patch-level feature correspondences. As a result, the framework achieves coarse 3-degree-of-freedom (DoF) localization from a single LiDAR scan and publicly available StreetView imagery, bridging the gap between place recognition and metric localization. Experiments demonstrate that the proposed method achieves high recall and heading accuracy, offering scalability in urban settings covered by public Street View without reliance on HD maps.
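As a rough illustration of the retrieval stage only (not the paper's implementation): once both modalities are embedded into a shared space, coarse localization reduces to a cosine-similarity nearest-neighbor search over geo-tagged panoramas. The embeddings below are random stand-ins for encoder outputs, and the dimensions are assumptions.

```python
# Illustrative sketch: cross-modal retrieval in a shared embedding space.
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-9):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def retrieve_top_k(query_emb, db_embs, k=5):
    """Rank database panoramas by cosine similarity to the query embedding."""
    sims = db_embs @ query_emb          # embeddings are unit-normalized
    order = np.argsort(-sims)[:k]
    return order, sims[order]

rng = np.random.default_rng(0)
dim, num_panoramas = 256, 10_000
db = l2_normalize(rng.standard_normal((num_panoramas, dim)))   # StreetView side
query = l2_normalize(rng.standard_normal(dim))                  # LiDAR-image side

idx, scores = retrieve_top_k(query, db, k=5)
print("top-5 panorama ids:", idx, "scores:", np.round(scores, 3))
```

In the full system, the top retrieval would then be refined metrically (here, by the equirectangular PnP step described above).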
{"title":"LSV-Loc: LiDAR to StreetView Image Cross-Modal Localization","authors":"Sangmin Lee;Donghyun Choi;Jee-Hwan Ryu","doi":"10.1109/LRA.2026.3653282","DOIUrl":"https://doi.org/10.1109/LRA.2026.3653282","url":null,"abstract":"Accurate global localization remains a fundamental challenge in autonomous vehicle navigation. Traditional methods typically rely on high-definition (HD) maps generated through prior traverses or utilize auxiliary sensors, such as a global positioning system (GPS). However, the above approaches are often limited by high costs, scalability issues, and decreased reliability where GPS is unavailable. Moreover, prior methods require route-specific sensor calibration and impose modality-specific constraints, which restrict generalization across different sensor types. The proposed framework addresses this limitation by leveraging a shared embedding space, learned via a weight-sharing Vision Transformer (ViT) encoder, that aligns heterogeneous sensor modalities, Light Detection and Ranging (LiDAR) images, and geo-tagged StreetView panoramas. The proposed alignment enables reliable cross-modal retrieval and coarse-level localization without HD-map priors or route-specific calibration. Further, to address the heading inconsistency between query LiDAR and StreetView, an equirectangular perspective-n-point (PnP) solver is proposed to refine the relative pose through patch-level feature correspondences. As a result, the framework achieves coarse 3-degree-of-freedom (DoF) localization from a single LiDAR scan and publicly available StreetView imagery, bridging the gap between place recognition and metric localization. Experiments demonstrate that the proposed method achieves high recall and heading accuracy, offering scalability in urban settings covered by public Street View without reliance on HD maps.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"2514-2521"},"PeriodicalIF":5.3,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146001877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Lie Group Implicit Kinematics for Redundant Parallel Manipulators: Left-Trivialized Extended Jacobians and Gradient-Based Online Redundancy Flows for Singularity Avoidance
IF 5.3 | CAS Tier 2 (Computer Science) | Q2 ROBOTICS | Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653387
Yifei Liu;Kefei Wen
We present a Lie group implicit formulation for kinematically redundant parallel manipulators that yields left-trivialized extended Jacobians for the extended task variable $x=(g,\rho)\in \text{SE}(3)\times \mathcal{R}$. On top of this model we design a gradient-based redundancy flow on the redundancy manifold that empirically maintains a positive manipulability margin along prescribed $\text{SE}(3)$ trajectories. The framework uses right-multiplicative state updates, remains compatible with automatic differentiation, and avoids mechanism-specific analytic Jacobians; it works with either direct inverse kinematics or a numeric solver. A specialization to $\text{SO}(2)^{3}$ provides computation-friendly first- and second-order steps. We validate the approach on two representative mechanisms: a (6+3)-degree-of-freedom (DoF) Stewart platform and a Spherical–Revolute platform. Across dense-coverage orientation trajectories and interactive gamepad commands, the extended Jacobian remained well conditioned while the redundancy planner ran at approximately 2 kHz in software-in-the-loop on a laptop-class CPU. The method integrates cleanly with existing kinematic stacks and is suitable for real-time deployment.
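A toy sketch of the redundancy-flow idea (under our own assumptions, not the paper's left-trivialized formulation): take gradient-ascent steps on a manipulability margin with respect to a scalar redundancy coordinate while the task pose is held by the main solver. The Jacobian here is a made-up smooth function of the configuration, and the step sizes are arbitrary; in practice the gradient would come from automatic differentiation rather than finite differences.

```python
# Illustrative sketch: gradient-based redundancy flow on a manipulability margin.
import numpy as np

def jacobian(q, rho):
    """Stand-in 6x9 task Jacobian depending on joint vector q and redundancy coordinate rho."""
    base = np.outer(np.sin(q[:6] + rho), np.cos(q[:9]))
    return base + 0.1 * np.eye(6, 9)

def manipulability(q, rho):
    J = jacobian(q, rho)
    return np.sqrt(max(np.linalg.det(J @ J.T), 0.0))

def redundancy_flow_step(q, rho, step=0.05, h=1e-4):
    """One explicit gradient-ascent step on the manipulability margin w.r.t. rho,
    using a central finite difference as a stand-in for autodiff."""
    grad = (manipulability(q, rho + h) - manipulability(q, rho - h)) / (2 * h)
    return rho + step * grad

q = np.linspace(-0.5, 0.5, 9)
rho = 0.0
for _ in range(100):
    rho = redundancy_flow_step(q, rho)
print(f"rho after flow: {rho:.3f}, margin: {manipulability(q, rho):.4f}")
```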
{"title":"Lie Group Implicit Kinematics for Redundant Parallel Manipulators: Left-Trivialized Extended Jacobians and Gradient-Based Online Redundancy Flows for Singularity Avoidance","authors":"Yifei Liu;Kefei Wen","doi":"10.1109/LRA.2026.3653387","DOIUrl":"https://doi.org/10.1109/LRA.2026.3653387","url":null,"abstract":"We present a Lie group implicit formulation for kinematically redundant parallel manipulators that yields left-trivialized extended Jacobians for the extended task variable <inline-formula><tex-math>$x=(g,rho)in text{SE}(3)times mathcal {R}$</tex-math></inline-formula>. On top of this model we design a gradient-based redundancy flow on the redundancy manifold that empirically maintains a positive manipulability margin along prescribed <inline-formula><tex-math>$text{SE}(3)$</tex-math></inline-formula> trajectories. The framework uses right-multiplicative state updates, remains compatible with automatic differentiation, and avoids mechanism-specific analytic Jacobians; it works with either direct inverse kinematics or a numeric solver. A specialization to <inline-formula><tex-math>$text{SO}(2)^{3}$</tex-math></inline-formula> provides computation-friendly first- and second-order steps. We validate the approach on two representative mechanisms: a (6+3)-degree-of-freedom (DoF) Stewart platform and a Spherical–Revolute platform. Across dense-coverage orientation trajectories and interactive gamepad commands, the extended Jacobian remained well conditioned while the redundancy planner ran at approximately 2 kHz in software-in-the-loop on a laptop-class CPU. The method integrates cleanly with existing kinematic stacks and is suitable for real-time deployment.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 2","pages":"2322-2329"},"PeriodicalIF":5.3,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146026476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
OMCL: Open-Vocabulary Monte Carlo Localization
IF 5.3 | CAS Tier 2 (Computer Science) | Q2 ROBOTICS | Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653333
Evgenii Kruzhkov;Raphael Memmesheimer;Sven Behnke
Robust robot localization is an important prerequisite for navigation, but it becomes challenging when the map and robot measurements are obtained from different sensors. Prior methods are often tailored to specific environments, relying on closed-set semantics or fine-tuned features. In this work, we extend Monte Carlo Localization with vision-language features, allowing OMCL to robustly compute the likelihood of visual observations given a camera pose and a 3D map created from posed RGB-D images or aligned point clouds. These open-vocabulary features enable us to associate observations and map elements from different modalities, and to natively initialize global localization through natural language descriptions of nearby objects. We evaluate our approach using Matterport3D and Replica for indoor scenes and demonstrate generalization on SemanticKITTI for outdoor scenes.
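A minimal sketch of how a vision-language measurement model could plug into the Monte Carlo Localization weight update (assumed details, not OMCL's exact likelihood): each particle is reweighted by the similarity between the observed feature and the feature the map predicts for that particle's pose. The temperature, feature dimension, and per-particle map lookup below are assumptions.

```python
# Illustrative sketch: particle reweighting from vision-language feature similarity.
import numpy as np

def cosine(a, b, eps=1e-9):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def update_weights(weights, obs_feature, map_features, temperature=10.0):
    """Reweight particles by feature similarity; map_features[i] is the feature the
    map predicts for particle i's pose (the actual lookup is omitted in this toy)."""
    likelihood = np.array([np.exp(temperature * cosine(obs_feature, f))
                           for f in map_features])
    new_w = weights * likelihood
    return new_w / new_w.sum()

rng = np.random.default_rng(1)
num_particles, dim = 500, 512
weights = np.full(num_particles, 1.0 / num_particles)
obs = rng.standard_normal(dim)                          # observation feature (stand-in)
map_feats = rng.standard_normal((num_particles, dim))   # per-particle map features (stand-in)

weights = update_weights(weights, obs, map_feats)
print("effective sample size:", 1.0 / np.sum(weights ** 2))
```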
{"title":"OMCL: Open-Vocabulary Monte Carlo Localization","authors":"Evgenii Kruzhkov;Raphael Memmesheimer;Sven Behnke","doi":"10.1109/LRA.2026.3653333","DOIUrl":"https://doi.org/10.1109/LRA.2026.3653333","url":null,"abstract":"Robust robot localization is an important prerequisite for navigation, but it becomes challenging when the map and robot measurements are obtained from different sensors. Prior methods are often tailored to specific environments, relying on closed-set semantics or fine-tuned features. In this work, we extend Monte Carlo Localization with vision-language features, allowing OMCL to robustly compute the likelihood of visual observations given a camera pose and a 3D map created from posed RGB-D images or aligned point clouds These open-vocabulary features enable us to associate observations and map elements from different modalities, and to natively initialize global localization through natural language descriptions of nearby objects. We evaluate our approach using Matterport3D and Replica for indoor scenes and demonstrate generalization on SemanticKITTI for outdoor scenes.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"2698-2705"},"PeriodicalIF":5.3,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146026566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Comparing Semi-Autonomous Strategies for Virtual Reality Based Remote Robotic Telemanipulation: On Peg-In-Hole Tasks
IF 5.3 | CAS Tier 2 (Computer Science) | Q2 ROBOTICS | Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653274
Shifei Duan;Francesco De Pace;Minas Liarokapis
Over the last decade, robot telemanipulation has been increasingly utilized in various applications to replace human operators in hazardous or remote environments. However, the telemanipulation of robots remains a challenging task, especially when high precision and dexterity are required. Peg-in-hole tasks are considered among the most challenging, as they require high precision. To facilitate the execution of such complex tasks, this paper introduces and compares different semi-autonomous strategies for virtual reality (VR) based remote robotic telemanipulation of a robot arm. Four modalities of robotic telemanipulation with varying degrees of autonomy are presented and thoroughly compared. Finally, the comparative user study highlights the differences between the proposed modalities and showcases the advantages and disadvantages of each approach in detail.
{"title":"Comparing Semi-Autonomous Strategies for Virtual Reality Based Remote Robotic Telemanipulation: On Peg-In-Hole Tasks","authors":"Shifei Duan;Francesco De Pace;Minas Liarokapis","doi":"10.1109/LRA.2026.3653274","DOIUrl":"https://doi.org/10.1109/LRA.2026.3653274","url":null,"abstract":"Over the last decade, robot telemanipulation has been increasingly utilized in various applications so as to replace human operators in hazardous or remote environments. However, the telemanipulation of robots remains a challenging task, especially when high precision and dexterity are required. Peg-in-hole tasks are considered some of the most challenging tasks as they require high precision. To facilitate the execution of such complex tasks, this paper introduces and compares different semi-autonomous strategies for virtual reality (VR) based remote robotic telemanipulation of a robot arm. Four modalities of robotic telemanipulation with varying degrees of autonomy are presented and thoroughly compared. Finally, the comparative user study highlights the differences between the proposed modalities and showcases the advantages and disadvantages of each approach in detail.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"2562-2569"},"PeriodicalIF":5.3,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146026593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
VISTA: Open-Vocabulary, Task-Relevant Robot Exploration With Online Semantic Gaussian Splatting
IF 5.3 | CAS Tier 2 (Computer Science) | Q2 ROBOTICS | Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653276
Keiko Nagami;Timothy Chen;Javier Yu;Ola Shorinwa;Maximilian Adang;Carlyn Dougherty;Eric Cristofalo;Mac Schwager
We present VISTA (Viewpoint-based Image selection with Semantic Task Awareness), an active exploration method for robots to plan informative trajectories that improve 3D map quality in areas most relevant for task completion. Given an open-vocabulary search instruction (e.g., “find a person”), VISTA enables a robot to explore its environment to search for the object of interest, while simultaneously building a real-time semantic 3D Gaussian Splatting reconstruction of the scene. The robot navigates its environment by planning receding-horizon trajectories that prioritize semantic similarity to the query and exploration of unseen regions of the environment. To evaluate trajectories, VISTA introduces a novel, efficient viewpoint-semantic coverage metric that quantifies both the geometric view diversity and task relevance in the 3D scene. On static datasets, our coverage metric outperforms state-of-the-art baselines, FisherRF and Bayes' Rays, in computation speed and reconstruction quality. In quadrotor hardware experiments, VISTA achieves 6x higher success rates in challenging maps, compared to baseline methods, while matching baseline performance in less challenging maps. Lastly, we show that VISTA is platform-agnostic by deploying it on a quadrotor drone and a Spot quadruped robot.
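For illustration only (an assumed form, not VISTA's published metric): a viewpoint score could mix a geometric term (how many previously unseen map elements the view covers) with a semantic term (how relevant those elements are to the language query). The weights w_new and w_sem below are hypothetical.

```python
# Illustrative sketch: scoring candidate viewpoints by coverage plus task relevance.
import numpy as np

def viewpoint_score(covered_ids, seen_mask, semantic_sim, w_new=1.0, w_sem=2.0):
    """covered_ids: indices of Gaussians visible from this viewpoint.
    seen_mask: bool array marking Gaussians already observed.
    semantic_sim: per-Gaussian similarity to the query embedding in [0, 1]."""
    newly_seen = ~seen_mask[covered_ids]
    diversity = newly_seen.sum()                             # geometric view-diversity term
    relevance = semantic_sim[covered_ids][newly_seen].sum()  # task-relevance term
    return w_new * diversity + w_sem * relevance

rng = np.random.default_rng(2)
num_gaussians = 5000
seen = rng.random(num_gaussians) < 0.3
sim = rng.random(num_gaussians)

candidates = [rng.choice(num_gaussians, size=400, replace=False) for _ in range(8)]
scores = [viewpoint_score(c, seen, sim) for c in candidates]
print("best candidate viewpoint:", int(np.argmax(scores)))
```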
{"title":"VISTA: Open-Vocabulary, Task-Relevant Robot Exploration With Online Semantic Gaussian Splatting","authors":"Keiko Nagami;Timothy Chen;Javier Yu;Ola Shorinwa;Maximilian Adang;Carlyn Dougherty;Eric Cristofalo;Mac Schwager","doi":"10.1109/LRA.2026.3653276","DOIUrl":"https://doi.org/10.1109/LRA.2026.3653276","url":null,"abstract":"We present VISTA (Viewpoint-based Image selection with Semantic Task Awareness), an active exploration method for robots to plan informative trajectories that improve 3D map quality in areas most relevant for task completion. Given an open-vocabulary search instruction (e.g., “find a person”), VISTA enables a robot to explore its environment to search for the object of interest, while simultaneously building a real-time semantic 3D Gaussian Splatting reconstruction of the scene. The robot navigates its environment by planning receding-horizon trajectories that prioritize semantic similarity to the query and exploration of unseen regions of the environment. To evaluate trajectories, VISTA introduces a novel, efficient viewpoint-semantic coverage metric that quantifies both the geometric view diversity and task relevance in the 3D scene. On static datasets, our coverage metric outperforms state-of-the-art baselines, FisherRF and Bayes' Rays, in computation speed and reconstruction quality. In quadrotor hardware experiments, VISTA achieves 6x higher success rates in challenging maps, compared to baseline methods, while matching baseline performance in less challenging maps. Lastly, we show that VISTA is platform-agnostic by deploying it on a quadrotor drone and a Spot quadruped robot.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"3150-3157"},"PeriodicalIF":5.3,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146082218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
CAUSALNAV: A Long-Term Embodied Navigation System for Autonomous Mobile Robots in Dynamic Outdoor Scenarios
IF 5.3 | CAS Tier 2 (Computer Science) | Q2 ROBOTICS | Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653283
Hongbo Duan;Shangyi Luo;Zhiyuan Deng;Yanbo Chen;Yuanhao Chiang;Yi Liu;Fangming Liu;Xueqian Wang
Autonomous language-guided navigation in large-scale outdoor environments remains a key challenge in mobile robotics, due to difficulties in semantic reasoning, dynamic conditions, and long-term stability. We propose CausalNav, the first scene graph-based semantic navigation framework tailored for dynamic outdoor environments. We construct a multi-level semantic scene graph using LLMs, referred to as the Embodied Graph, that hierarchically integrates coarse-grained map data with fine-grained object entities. The constructed graph serves as a retrievable knowledge base for Retrieval-Augmented Generation (RAG), enabling semantic navigation and long-range planning under open-vocabulary queries. By fusing real-time perception with offline map data, the Embodied Graph supports robust navigation across varying spatial granularities in dynamic outdoor environments. Dynamic objects are explicitly handled in both the scene graph construction and hierarchical planning modules. The Embodied Graph is continuously updated within a temporal window to reflect environmental changes and support real-time semantic navigation. Extensive experiments in both simulation and real-world settings demonstrate superior robustness and efficiency.
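A minimal sketch of hierarchical retrieval over a scene graph (assumed two-level structure and a stand-in encoder, not the paper's Embodied Graph or RAG pipeline): retrieve the best-matching coarse node (area) first, then the best-matching object within it.

```python
# Illustrative sketch: coarse-to-fine retrieval over a toy scene graph.
import numpy as np

DIM = 384

def embed(text):
    """Stand-in text encoder: a deterministic pseudo-embedding per string.
    With a real encoder, the highest-similarity nodes would be semantically related;
    this stand-in only exercises the control flow."""
    seed = sum(text.encode()) % (2**32)
    v = np.random.default_rng(seed).standard_normal(DIM)
    return v / np.linalg.norm(v)

# A toy two-level graph: areas -> objects.
graph = {
    "parking lot": ["blue truck", "charging station"],
    "garden": ["fountain", "wooden bench"],
    "entrance": ["glass door", "mailbox"],
}

def retrieve(query, graph):
    q = embed(query)
    best_area = max(graph, key=lambda a: float(embed(a) @ q))
    best_obj = max(graph[best_area], key=lambda o: float(embed(o) @ q))
    return best_area, best_obj

print(retrieve("go to the bench near the flowers", graph))
```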
{"title":"CAUSALNAV: A Long-Term Embodied Navigation System for Autonomous Mobile Robots in Dynamic Outdoor Scenarios","authors":"Hongbo Duan;Shangyi Luo;Zhiyuan Deng;Yanbo Chen;Yuanhao Chiang;Yi Liu;Fangming Liu;Xueqian Wang","doi":"10.1109/LRA.2026.3653283","DOIUrl":"https://doi.org/10.1109/LRA.2026.3653283","url":null,"abstract":"Autonomous language-guided navigation in large-scale outdoor environments remains a key challenge in mobile robotics, due to difficulties in semantic reasoning, dynamic conditions, and long-term stability. We propose CausalNav, the first scene graph-based semantic navigation framework tailored for dynamic outdoor environments. We construct a multi-level semantic scene graph using LLMs, referred to as the <italic>Embodied Graph</i>, that hierarchically integrates coarse-grained map data with fine-grained object entities. The constructed graph serves as a retrievable knowledge base for Retrieval-Augmented Generation (RAG), enabling semantic navigation and long-range planning under open-vocabulary queries. By fusing real-time perception with offline map data, the <italic>Embodied Graph</i> supports robust navigation across varying spatial granularities in dynamic outdoor environments. Dynamic objects are explicitly handled in both the scene graph construction and hierarchical planning modules. The <italic>Embodied Graph</i> is continuously updated within a temporal window to reflect environmental changes and support real-time semantic navigation. Extensive experiments in both simulation and real-world settings demonstrate superior robustness and efficiency.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"3198-3205"},"PeriodicalIF":5.3,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146082070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Efficient Robotic 3D Measurement Through Multi-DoF Reinforcement Learning for Continuous Viewpoint Planning
IF 5.3 | CAS Tier 2 (Computer Science) | Q2 ROBOTICS | Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653369
Jun Ye;Qiu Fang;Shi Wang;Changqing Gao;Weixing Peng;Yaonan Wang
Three-dimensional (3D) measurement is essential for quality control in manufacturing, especially for components with complex geometries. Conventional viewpoint planning methods based on fixed spherical coordinates often fail to capture intricate surfaces, leading to suboptimal reconstructions. To address this, we propose a multi-degree-of-freedom reinforcement learning (RL) framework for continuous viewpoint planning in robotic 3D measurement. The framework introduces three key innovations: (1) a voxel-based state representation with dynamic ray-traced coverage updates; (2) a dual-objective reward that enforces precise overlap control while minimizing the number of viewpoints; and (3) integration of robotic kinematics to guarantee physically feasible scanning. Experiments on industrial parts demonstrate that our method outperforms existing techniques in overlap regulation and planning efficiency, enabling more accurate and autonomous 3D reconstruction for complex geometries.
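A hedged sketch of what a dual-objective step reward could look like: reward newly covered voxels, keep overlap with already-covered voxels inside a target band, and charge a fixed cost per viewpoint. The overlap band, weights, and per-viewpoint cost are assumptions, not the paper's exact terms.

```python
# Illustrative sketch: dual-objective reward for viewpoint planning on a voxel map.
import numpy as np

def step_reward(visible, covered, overlap_band=(0.2, 0.4),
                w_new=1.0, w_overlap=0.5, view_cost=0.2):
    """visible, covered: boolean voxel masks for the candidate view and the map so far."""
    new_voxels = np.logical_and(visible, ~covered).sum()
    overlap_ratio = np.logical_and(visible, covered).sum() / max(visible.sum(), 1)
    lo, hi = overlap_band
    overlap_penalty = max(lo - overlap_ratio, 0.0) + max(overlap_ratio - hi, 0.0)
    return w_new * new_voxels / visible.size - w_overlap * overlap_penalty - view_cost

rng = np.random.default_rng(4)
covered = rng.random(10_000) < 0.35          # voxels reconstructed so far
visible = rng.random(10_000) < 0.10          # voxels hit by rays from the candidate view
print(f"reward: {step_reward(visible, covered):.4f}")
```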
{"title":"Efficient Robotic 3D Measurement Through Multi-DoF Reinforcement Learning for Continuous Viewpoint Planning","authors":"Jun Ye;Qiu Fang;Shi Wang;Changqing Gao;Weixing Peng;Yaonan Wang","doi":"10.1109/LRA.2026.3653369","DOIUrl":"https://doi.org/10.1109/LRA.2026.3653369","url":null,"abstract":"Three-dimensional (3D) measurement is essential for quality control in manufacturing, especially for components with complex geometries. Conventional viewpoint planning methods based on fixed spherical coordinates often fail to capture intricate surfaces, leading to suboptimal reconstructions. To address this, we propose a multi-degree-of-freedom reinforcement learning (RL) framework for continuous viewpoint planning in robotic 3D measurement. The framework introduces three key innovations: (1) a voxel-based state representation with dynamic ray-traced coverage updates; (2) a dual-objective reward that enforces precise overlap control while minimizing the number of viewpoints; and (3) integration of robotic kinematics to guarantee physically feasible scanning. Experiments on industrial parts demonstrate that our method outperforms existing techniques in overlap regulation and planning efficiency, enabling more accurate and autonomous 3D reconstruction for complex geometries.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"2618-2625"},"PeriodicalIF":5.3,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146026570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
CLF-RL: Control Lyapunov Function Guided Reinforcement Learning
IF 5.3 | CAS Tier 2 (Computer Science) | Q2 ROBOTICS | Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653329
Kejun Li;Zachary Olkin;Yisong Yue;Aaron D. Ames
Reinforcement learning (RL) has shown promise in generating robust locomotion policies for bipedal robots, but often suffers from tedious reward design and sensitivity to poorly shaped objectives. In this work, we propose a structured reward shaping framework that leverages model-based trajectory generation and control Lyapunov functions (CLFs) to guide policy learning. We explore two model-based planners for generating reference trajectories: a reduced-order linear inverted pendulum (LIP) model for velocity-conditioned motion planning, and a precomputed gait library based on hybrid zero dynamics (HZD) using full-order dynamics. These planners define desired end-effector and joint trajectories, which are used to construct CLF-based rewards that penalize tracking error and encourage rapid convergence. This formulation provides meaningful intermediate rewards, and is straightforward to implement once a reference is available. Both the reference trajectories and CLF shaping are used only during training, resulting in a lightweight policy at deployment. We validate our method both in simulation and through extensive real-world experiments on a Unitree G1 robot. CLF-RL demonstrates significantly improved robustness relative to the baseline RL policy and better performance than a classic tracking reward RL formulation.
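As a rough illustration of CLF-shaped reward design (a generic quadratic CLF under our own assumptions, not the paper's exact formulation): penalize violations of the decrease condition dV/dt + λV ≤ 0 on the tracking error, which provides a dense intermediate signal whenever the policy fails to contract toward the reference trajectory. The weighting matrix, rates, and reward weights below are assumptions.

```python
# Illustrative sketch: a CLF-shaped tracking reward for policy learning.
import numpy as np

P = np.diag([4.0, 4.0, 1.0, 1.0])   # assumed positive-definite weighting on the tracking error

def clf_value(err):
    return float(err @ P @ err)

def clf_reward(err_prev, err_curr, dt=0.02, lam=2.0, w_violation=1.0, w_track=0.1):
    """Reward ~ negative CLF-decrease violation plus a small direct tracking bonus."""
    v_prev, v_curr = clf_value(err_prev), clf_value(err_curr)
    v_dot = (v_curr - v_prev) / dt                 # finite-difference surrogate for dV/dt
    violation = max(v_dot + lam * v_curr, 0.0)     # positive only when the CLF condition fails
    return -w_violation * violation - w_track * v_curr

# Toy rollout: the tracking error decays geometrically, so violations stay near zero.
err = np.array([0.3, -0.2, 0.1, 0.05])
for _ in range(5):
    new_err = 0.9 * err
    print(f"reward: {clf_reward(err, new_err):.4f}")
    err = new_err
```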
{"title":"CLF-RL: Control Lyapunov Function Guided Reinforcement Learning","authors":"Kejun Li;Zachary Olkin;Yisong Yue;Aaron D. Ames","doi":"10.1109/LRA.2026.3653329","DOIUrl":"https://doi.org/10.1109/LRA.2026.3653329","url":null,"abstract":"Reinforcement learning (RL) has shown promise in generating robust locomotion policies for bipedal robots, but often suffers from tedious reward design and sensitivity to poorly shaped objectives. In this work, we propose a structured reward shaping framework that leverages model-based trajectory generation and control Lyapunov functions (CLFs) to guide policy learning. We explore two model-based planners for generating reference trajectories: a reduced-order linear inverted pendulum (LIP) model for velocity-conditioned motion planning, and a precomputed gait library based on hybrid zero dynamics (HZD) using full-order dynamics. These planners define desired end-effector and joint trajectories, which are used to construct CLF-based rewards that penalize tracking error and encourage rapid convergence. This formulation provides meaningful intermediate rewards, and is straightforward to implement once a reference is available. Both the reference trajectories and CLF shaping are used only during training, resulting in a lightweight policy at deployment. We validate our method both in simulation and through extensive real-world experiments on a Unitree G1 robot. CLF-RL demonstrates significantly improved robustness relative to the baseline RL policy and better performance than a classic tracking reward RL formulation.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"3230-3237"},"PeriodicalIF":5.3,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146082065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Affordance RAG: Hierarchical Multimodal Retrieval With Affordance-Aware Embodied Memory for Mobile Manipulation
IF 5.3 | CAS Tier 2 (Computer Science) | Q2 ROBOTICS | Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653281
Ryosuke Korekata;Quanting Xie;Yonatan Bisk;Komei Sugiura
In this study, we address the problem of open-vocabulary mobile manipulation, where a robot is required to carry a wide range of objects to receptacles based on free-form natural language instructions. This task is challenging, as it involves understanding visual semantics and the affordance of manipulation actions. To tackle these challenges, we propose Affordance RAG, a zero-shot hierarchical multimodal retrieval framework that constructs Affordance-Aware Embodied Memory from pre-explored images. The model retrieves candidate targets based on regional and visual semantics and reranks them with affordance scores, allowing the robot to identify manipulation options that are likely to be executable in real-world environments. Our method outperformed existing approaches in retrieval performance for mobile manipulation instruction in large-scale indoor environments. Furthermore, in real-world experiments where the robot performed mobile manipulation in indoor environments based on free-form instructions, the proposed method achieved a task success rate of 85%, outperforming existing methods in both retrieval performance and overall task success.
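A minimal retrieve-then-rerank sketch (the scoring rule and mixing weight alpha are hypothetical, not the authors' pipeline): candidates from the embodied memory are first ranked by semantic similarity to the instruction, then reranked by a combination of semantics and an affordance estimate.

```python
# Illustrative sketch: semantic retrieval followed by affordance-aware reranking.
import numpy as np

def normalize(x):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-9)

def retrieve_and_rerank(query_emb, cand_embs, affordance, k=10, alpha=0.6):
    """Top-k by semantics, reranked by alpha*semantics + (1-alpha)*affordance."""
    sem = cand_embs @ query_emb
    top = np.argsort(-sem)[:k]
    combined = alpha * sem[top] + (1 - alpha) * affordance[top]
    return top[np.argsort(-combined)]

rng = np.random.default_rng(5)
num_cands, dim = 200, 512
cands = normalize(rng.standard_normal((num_cands, dim)))   # pre-explored image embeddings (stand-in)
query = normalize(rng.standard_normal(dim))                 # instruction embedding (stand-in)
afford = rng.random(num_cands)                              # affordance scores in [0, 1] (stand-in)

print("reranked candidate ids:", retrieve_and_rerank(query, cands, afford)[:5])
```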
{"title":"Affordance RAG: Hierarchical Multimodal Retrieval With Affordance-Aware Embodied Memory for Mobile Manipulation","authors":"Ryosuke Korekata;Quanting Xie;Yonatan Bisk;Komei Sugiura","doi":"10.1109/LRA.2026.3653281","DOIUrl":"https://doi.org/10.1109/LRA.2026.3653281","url":null,"abstract":"In this study, we address the problem of open-vocabulary mobile manipulation, where a robot is required to carry a wide range of objects to receptacles based on free-form natural language instructions. This task is challenging, as it involves understanding visual semantics and the affordance of manipulation actions. To tackle these challenges, we propose Affordance RAG, a zero-shot hierarchical multimodal retrieval framework that constructs Affordance-Aware Embodied Memory from pre-explored images. The model retrieves candidate targets based on regional and visual semantics and reranks them with affordance scores, allowing the robot to identify manipulation options that are likely to be executable in real-world environments. Our method outperformed existing approaches in retrieval performance for mobile manipulation instruction in large-scale indoor environments. Furthermore, in real-world experiments where the robot performed mobile manipulation in indoor environments based on free-form instructions, the proposed method achieved a task success rate of 85%, outperforming existing methods in both retrieval performance and overall task success.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"2706-2713"},"PeriodicalIF":5.3,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146026567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0