Supernumerary Robotic Limbs (SRLs) offer considerable promise for assisting wearers in complex tasks, yet their adaptability is often constrained by their inherent fixed morphology. While modular reconfigurable designs present a viable solution, applying them to wearable systems introduces critical size-payload trade-offs. To address these limitations, this letter, drawing inspiration from the human upper limb, introduces a modular reconfigurable supernumerary robotic limb (MRSRL) based on a tightly integrated hardware and control co-design. At the hardware level, we developed a novel joint module featuring a bio-inspired redundant actuation mechanism. The module's partitioned design also ensures functional extensibility. At the control level, we designed a unified control framework leveraging active disturbance rejection control to effectively suppress backlash and maintain robust motion tracking across heterogeneous actuator configurations. Experimental validation demonstrates the system's efficacy, achieving a 4.21-fold increase in torque-to-weight ratio compared to the single-motor module, and reductions of 7.20% and 17.34% in maximum absolute error and integral of absolute error, respectively, against baseline methods. Our work establishes a robust framework for the development of next-generation SRLs capable of adapting to a wide range of tasks.
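The letter's controller equations are not given in this abstract, so as a hedged illustration of the active disturbance rejection idea it names, the sketch below runs a minimal linear ADRC (extended state observer plus PD feedback) on a toy double-integrator joint with a backlash-like lumped disturbance. The plant, gains, and disturbance model are illustrative assumptions, not the MRSRL implementation.

```python
import numpy as np

def simulate_adrc(T=5.0, dt=1e-3, r=1.0, wo=50.0, wc=10.0):
    """Minimal linear ADRC on a toy double-integrator joint: an extended
    state observer (ESO) estimates position, velocity, and the lumped
    'total disturbance'; the control law cancels the estimate and applies
    PD feedback on the observer states."""
    kp, kd = wc**2, 2*wc                 # PD gains from controller bandwidth wc
    l1, l2, l3 = 3*wo, 3*wo**2, wo**3    # ESO gains from observer bandwidth wo
    x = v = 0.0                          # true plant state
    z1 = z2 = z3 = 0.0                   # ESO states: pos, vel, disturbance
    for k in range(int(T / dt)):
        t = k * dt
        u = kp*(r - z1) - kd*z2 - z3     # PD on estimates, cancel disturbance
        d = 2.0 + np.sin(2*np.pi*t)      # unknown disturbance (stand-in for backlash)
        x, v = x + v*dt, v + (u + d)*dt  # plant x'' = u + d, Euler step
        e = x - z1                       # observer innovation
        z1, z2, z3 = (z1 + (z2 + l1*e)*dt,
                      z2 + (z3 + u + l2*e)*dt,
                      z3 + l3*e*dt)
    return x, z3

x_final, d_hat = simulate_adrc()
```

With the observer bandwidth well above the disturbance frequency, the ESO's third state converges to the lumped disturbance and the tracking error stays small despite the plant model never seeing `d` directly.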
{"title":"A Bio-Inspired Scalable Parallel Actuation Approach for Modular Reconfigurable Supernumerary Limbs","authors":"Dawei Liang;Sikai Zhao;Tenglei Wang;Bohuan Lu;Jian Qi;Ning Zhao;Haotian Ju;Jie Zhao;Yanhe Zhu","doi":"10.1109/LRA.2026.3665407","DOIUrl":"https://doi.org/10.1109/LRA.2026.3665407","url":null,"abstract":"Supernumerary Robotic Limbs (SRLs) offer considerable promise for assisting wearers in complex tasks, yet their adaptability is often constrained by their inherent fixed morphology. While modular reconfigurable designs present a viable solution, applying them to wearable systems introduces critical size-payload trade-offs. To address these limitations, this letter, drawing inspiration from the human upper limb, introduces a modular reconfigurable supernumerary robotic limb (MRSRL) based on a tightly integrated hardware and control co-design. At the hardware level, we developed a novel joint module featuring a bio-inspired redundant actuation mechanism. The module's partitioned design also ensures functional extensibility. At the control level, we designed a unified control framework leveraging active disturbance rejection control to effectively suppress backlash and maintain robust motion tracking across heterogeneous actuator configurations. Experimental validation demonstrates the system's efficacy, achieving a 4.21-fold increase in torque-to-weight ratio compared to the single-motor module, and reductions of 7.20% and 17.34% in maximum absolute error and integral of absolute error, respectively, against baseline methods. Our work establishes a robust framework for the development of next-generation SRLs capable of adapting to a wide range of tasks.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 4","pages":"4849-4856"},"PeriodicalIF":5.3,"publicationDate":"2026-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147362229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-02-16 DOI: 10.1109/LRA.2026.3665066
Haebeom Jung;Namtae Kim;Jungwoo Kim;Jaesik Park
Accurate LiDAR-camera calibration is crucial for multi-sensor systems. However, traditional methods often rely on physical targets, which are impractical for real-world deployment. Moreover, even carefully calibrated extrinsics can degrade over time due to sensor drift or external disturbances, necessitating periodic recalibration. To address these challenges, we present Targetless LiDAR–Camera Calibration (TLC-Calib), a method that jointly optimizes sensor poses with a neural Gaussian–based scene representation. Reliable LiDAR points are frozen as anchor Gaussians to preserve global structure, while auxiliary Gaussians prevent local overfitting under noisy initialization. Our fully differentiable pipeline with photometric and geometric regularization achieves robust and generalizable calibration, consistently outperforming existing targetless methods on the KITTI-360, Waymo, and Fast-LIVO2 datasets. In addition, it yields more consistent Novel View Synthesis results, reflecting improved extrinsic alignment.
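TLC-Calib's Gaussian-splatting pipeline cannot be reconstructed from the abstract alone; the toy below only illustrates the underlying idea of refining an extrinsic by minimizing reprojection residuals. The intrinsics, the identity rotation, the translation-only parameterization, and the known correspondences are all simplifying assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])  # assumed intrinsics
pts_lidar = rng.uniform([-1., -1., 3.], [1., 1., 8.], size=(50, 3))
t_true = np.array([0.10, -0.05, 0.02])    # "unknown" LiDAR-to-camera translation

def project(pts, t):
    pc = pts + t                          # LiDAR -> camera (rotation taken as identity)
    uv = (K @ pc.T).T
    return uv[:, :2] / uv[:, 2:3]         # perspective division

pix_obs = project(pts_lidar, t_true)      # synthetic pixel observations

def residual(t):
    return (project(pts_lidar, t) - pix_obs).ravel()

t_est = least_squares(residual, x0=np.zeros(3)).x
```

The real method replaces these explicit correspondences with photometric and geometric losses against the rendered scene, but the optimization structure (pose parameters driven by image-space residuals) is the same.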
{"title":"Targetless LiDAR-Camera Calibration With Neural Gaussian Splatting","authors":"Haebeom Jung;Namtae Kim;Jungwoo Kim;Jaesik Park","doi":"10.1109/LRA.2026.3665066","DOIUrl":"https://doi.org/10.1109/LRA.2026.3665066","url":null,"abstract":"Accurate LiDAR-camera calibration is crucial for multi-sensor systems. However, traditional methods often rely on physical targets, which are impractical for real-world deployment. Moreover, even carefully calibrated extrinsics can degrade over time due to sensor drift or external disturbances, necessitating periodic recalibration. To address these challenges, we present a Targetless LiDAR–Camera Calibration (TLC-Calib) that jointly optimizes sensor poses with a neural Gaussian–based scene representation. Reliable LiDAR points are frozen as anchor Gaussians to preserve global structure, while auxiliary Gaussians prevent local overfitting under noisy initialization. Our fully differentiable pipeline with photometric and geometric regularization achieves robust and generalizable calibration, consistently outperforming existing targetless methods on the <sc>KITTI-360</sc>, <sc>Waymo</sc>, and <sc>Fast-LIVO2</sc> datasets. In addition, it yields more consistent Novel View Synthesis results, reflecting improved extrinsic alignment.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 4","pages":"4777-4784"},"PeriodicalIF":5.3,"publicationDate":"2026-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147362452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-02-16 DOI: 10.1109/LRA.2026.3665080
Doyeon Kim;Heoncheol Lee
Reliable place recognition is essential to SLAM, as it enables loop closure detection, re-localization, and map merging in long-term operation and multi-robot deployments. While semantic information enables a more human-like understanding of environments, only a few studies have integrated semantic graphs with background appearance cues. To address this gap, we propose FLASH (Fibonacci Lattice Spherical Harmonics), a novel LiDAR-based place recognition (LPR) approach that employs spherical harmonics (SH) to unify semantic, topological, and appearance information into a compact and discriminative descriptor. Specifically, FLASH introduces newly defined complementary spherical functions for the foreground and background, uniformly samples the spherical domain with a Fibonacci lattice, and expands these functions in the SH basis to obtain a rotation-invariant representation. Experimental results on KITTI, Ford Campus, Apollo, and CU-Multi demonstrate that FLASH consistently achieves higher place recognition performance across various scenarios.
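As a hedged sketch of the two ingredients named here, the snippet below samples the sphere with a Fibonacci lattice and computes a per-degree spherical-harmonic power spectrum, which is rotation-invariant. This is a toy (real spherical harmonics up to degree 2, equal-weight quadrature, a synthetic test function), not FLASH's actual foreground/background spherical functions.

```python
import numpy as np

def fibonacci_lattice(n):
    """Nearly uniform unit vectors on the sphere via the Fibonacci lattice."""
    i = np.arange(n)
    z = 1 - 2*(i + 0.5)/n                          # uniform strips in z
    theta = np.pi*(1 + 5**0.5)*i                   # golden-angle azimuth
    rho = np.sqrt(1 - z**2)
    return np.stack([rho*np.cos(theta), rho*np.sin(theta), z], axis=1)

def sh_power_descriptor(f_vals, p):
    """Per-degree power of real spherical-harmonic coefficients up to l=2,
    a rotation-invariant descriptor; coefficients are approximated by
    equal-weight quadrature with weight 4*pi/N per lattice point."""
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    basis = [  # real spherical harmonics, grouped by degree l = 0, 1, 2
        [0.5*np.sqrt(1/np.pi)*np.ones_like(x)],
        [np.sqrt(3/(4*np.pi))*y, np.sqrt(3/(4*np.pi))*z, np.sqrt(3/(4*np.pi))*x],
        [0.5*np.sqrt(15/np.pi)*x*y, 0.5*np.sqrt(15/np.pi)*y*z,
         0.25*np.sqrt(5/np.pi)*(3*z**2 - 1), 0.5*np.sqrt(15/np.pi)*x*z,
         0.25*np.sqrt(15/np.pi)*(x**2 - y**2)],
    ]
    w = 4*np.pi/len(f_vals)
    return np.array([sum((w*np.sum(f_vals*Y))**2 for Y in Ys) for Ys in basis])

p = fibonacci_lattice(5000)
f = 1 + p[:, 0] + 2*p[:, 2] + p[:, 0]*p[:, 1]      # smooth test function
d1 = sh_power_descriptor(f, p)

a = 0.7                                            # same function, rotated about z
Rz = np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])
q = p @ Rz.T
d2 = sh_power_descriptor(1 + q[:, 0] + 2*q[:, 2] + q[:, 0]*q[:, 1], p)
```

Rotating the function leaves the per-degree power (d1 vs. d2) essentially unchanged up to quadrature error, which is the property that makes such a descriptor usable for viewpoint-agnostic place recognition.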
{"title":"FLASH: Fibonacci Lattice Spherical Harmonics for Semantic Place Recognition Using LiDAR","authors":"Doyeon Kim;Heoncheol Lee","doi":"10.1109/LRA.2026.3665080","DOIUrl":"https://doi.org/10.1109/LRA.2026.3665080","url":null,"abstract":"Reliable place recognition is essential to SLAM, as it enables loop closure detection, re-localization, and map merging in long-term operation and multi-robot deployments. While semantic information enables a more human-like understanding of environments, only a few studies have integrated semantic graphs with background appearance cues. To address this gap, we propose FLASH (Fibonacci Lattice Spherical Harmonics), a novel LiDAR-based place recognition (LPR) approach that employs spherical harmonics (SH) to unify semantic, topological, and appearance information into a compact and discriminative descriptor. Specifically, FLASH introduces newly defined complementary spherical functions for the foreground and background, uniformly samples the spherical domain with a Fibonacci lattice, and expands these functions in the SH basis to obtain a rotation-invariant representation. Experimental results on KITTI, Ford Campus, Apollo, and CU-Multi demonstrate that FLASH consistently achieves higher place recognition performance across various scenarios.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 4","pages":"4521-4528"},"PeriodicalIF":5.3,"publicationDate":"2026-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147299657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-02-16 DOI: 10.1109/LRA.2026.3665313
Quoc Hung Hoang;Gon-Woo Kim
This letter presents a new tightly coupled LiDAR-inertial odometry (LIO) scheme for a quadruped robot operating under fluctuating conditions. We propose an external disturbance model in which the profile of unknown disturbances and noise acting on the IMU is characterized using the time delay estimation (TDE) technique. Simultaneously, the IMU orientation and the TDE-based uncertainty model are jointly updated through an Error-State Kalman Filter (ESKF) using a measurement model derived from LiDAR odometry (LO). Thereafter, the output of the ESKF is employed to mitigate vibration effects in the IMU preintegration factor, thereby enhancing the precision and stability of inertial motion estimation. Furthermore, the refined IMU preintegration pose is leveraged to correct LiDAR distortion and improve the accuracy of the LO. As a result, the proposed approach achieves optimal performance, smooth trajectories, and enhanced robustness against uncertainties. Finally, the effectiveness of the proposed LIO is evaluated through real-time experiments on a quadruped robot across different scenarios.
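The ESKF/LO coupling is not reconstructed here; the sketch below only illustrates the time-delay-estimation idea the scheme builds on: if the disturbance varies slowly over one sample, it can be estimated from the previous step's measured acceleration and applied input. The 1D plant, gains, and disturbance profile are assumptions for illustration.

```python
import numpy as np

def simulate(use_tde, T=4.0, dt=1e-3):
    """1D plant x'' = u + d under PD control; with TDE enabled, the
    disturbance is estimated from the previous step's measured acceleration
    and applied input: d_hat[k] = a[k-1] - u[k-1]."""
    kp, kd, r = 100.0, 20.0, 1.0
    x = v = d_hat = u_prev = a_prev = 0.0
    for k in range(int(T / dt)):
        t = k * dt
        u = kp*(r - x) - kd*v - (d_hat if use_tde else 0.0)
        d = 1.5 + 0.5*np.sin(4*np.pi*t)     # vibration-like disturbance
        a = u + d                            # 'measured' acceleration
        x, v = x + v*dt, v + a*dt
        d_hat = a_prev - u_prev              # one-sample-delayed estimate
        u_prev, a_prev = u, a
    return abs(r - x)

err_tde, err_plain = simulate(True), simulate(False)
```

Because the estimate lags by exactly one sample, the residual disturbance is only the per-step change of `d`, so the compensated tracking error is orders of magnitude smaller than the uncompensated one.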
{"title":"External Disturbances Compensation for LiDAR-Inertial Odometry Under Vibration Conditions on Quadruped Robot","authors":"Quoc Hung Hoang;Gon-Woo Kim","doi":"10.1109/LRA.2026.3665313","DOIUrl":"https://doi.org/10.1109/LRA.2026.3665313","url":null,"abstract":"This letter presents a new tightly coupled LiDAR-inertial odometry (LIO) scheme for a quadruped robot operating under fluctuating conditions. By proposing external disturbance modeling, the profile of unknown disturbances and noise on the IMU is effectively characterized using the time delay estimation (TDE) technique. Simultaneously, the IMU orientation and the TDE-based uncertainty model are jointly updated through an Error-State Kalman Filter (ESKF) using a measurement model derived from LiDAR odometry (LO). Thereafter, the output of the ESKF is employed to mitigate vibration effects in the IMU preintegration factor, thereby enhancing the precision and stability of inertial motion estimation. Furthermore, the refined IMU preintegration pose is leveraged to correct LiDAR distortion and improve the accuracy of LO. As a result, the proposed approach achieves optimal performance, smooth trajectories, and enhanced robustness against uncertainties. Finally, the effectiveness of the proposed LIO is evaluated through real-time experiments on a quadruped robot across different scenarios.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 4","pages":"4713-4720"},"PeriodicalIF":5.3,"publicationDate":"2026-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147362420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-02-16 DOI: 10.1109/LRA.2026.3665065
Hongyi Li;Jun Xu;Jinfeng Liu
The control of large buildings encounters computational-efficiency challenges due to their size and nonlinear components. To address these issues, this paper proposes a Piecewise Affine (PWA)-based distributed scheme for Model Predictive Control (MPC) that optimizes energy and comfort through PWA-based quadratic programming. We utilize the Alternating Direction Method of Multipliers (ADMM) for effective decomposition and apply the PWA technique to handle the nonlinear components. To solve the resulting large-scale nonconvex problems, the paper introduces a convex ADMM algorithm that transforms the nonconvex problem into a series of smaller convex problems, significantly enhancing computational efficiency. Furthermore, we demonstrate that the convex ADMM algorithm converges to a local optimum of the original problem. A case study involving 36 zones validates the effectiveness of the proposed method. Our proposed method reduces execution time by 86% compared to the centralized version.
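As a minimal illustration of the ADMM decomposition strategy (not the paper's PWA-MPC formulation), the snippet below solves a toy consensus problem in which each "zone" holds a local quadratic objective and a coordinator step averages the local solutions; all variable names and values are illustrative.

```python
import numpy as np

def consensus_admm(a, rho=1.0, iters=100):
    """Consensus ADMM for: min sum_i (x_i - a_i)^2  s.t.  x_i = z.
    Each x_i-update is a small local problem (one per 'zone'); the z-update
    averages, mimicking the coordinator; y holds scaled dual variables."""
    n = len(a)
    x, y, z = np.zeros(n), np.zeros(n), 0.0
    for _ in range(iters):
        # local updates: argmin (x_i - a_i)^2 + (rho/2)(x_i - z + y_i)^2
        x = (2*a + rho*(z - y)) / (2 + rho)
        z = np.mean(x + y)                    # coordination (averaging) step
        y = y + x - z                         # dual ascent
    return z, x

a = np.array([20.0, 22.0, 19.0, 23.0])        # e.g. per-zone setpoint preferences
z_opt, x_opt = consensus_admm(a)
```

For this separable quadratic, the consensus variable converges to the mean of the local targets; the paper's algorithm applies the same split-and-coordinate structure to much larger PWA-constrained subproblems.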
{"title":"Distributed Model Predictive Control for Energy and Comfort Optimization in Large Buildings Using Piecewise Affine Approximation","authors":"Hongyi Li;Jun Xu;Jinfeng Liu","doi":"10.1109/LRA.2026.3665065","DOIUrl":"https://doi.org/10.1109/LRA.2026.3665065","url":null,"abstract":"The control of large buildings encounters challenges in computational efficiency due to their size and nonlinear components. To address these issues, this paper proposes a Piecewise Affine (PWA)-based distributed scheme for Model Predictive Control (MPC) that optimizes energy and comfort through PWA-based quadratic programming. We utilize the Alternating Direction Method of Multipliers (ADMM) for effective decomposition and apply the PWA technique to handle the nonlinear components. To solve the resulting large-scale nonconvex problems, the paper introduces a convex ADMM algorithm that transforms the nonconvex problem into a series of smaller convex problems, significantly enhancing computational efficiency. Furthermore, we demonstrate that the convex ADMM algorithm converges to a local optimum of the original problem. A case study involving 36 zones validates the effectiveness of the proposed method. Our proposed method reduces execution time by 86% compared to the centralized version.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 4","pages":"4147-4154"},"PeriodicalIF":5.3,"publicationDate":"2026-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146223837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Twisted string actuators (TSAs) have emerged as promising actuators in robotics owing to their compliance, light weight, and high transmission ratios. However, practical utilization of TSAs remains limited due to their restricted stroke length and intrinsically nonlinear transmission ratio (TR). While variable-radius pulleys (VRPs) can mitigate these issues, they exhibit poor scalability, as their volume grows disproportionately with the required stroke or compensated TR range. This paper proposes the dual variable-radius drum TSA (DVRD-TSA) and mathematically proves that its architecture offers superior scalability and compactness compared to existing mechanisms as performance demands increase. For a baseline comparison to validate our model, we fabricated a prototype and compared it to a conventional TSA under constant-load conditions with identical initial length. The experiment confirmed that our DVRD-TSA delivers a substantially larger linear stroke (210.3 mm, 72.5%) than the conventional TSA (85.3 mm, 29.4%), while maintaining a comparable peak torque (25.82 Nmm vs. 24.21 Nmm), and successfully tracks its target near-constant transmission ratio (1.475 rad/mm) with low error. This work presents a compact, passive, and scalable solution that overcomes two major drawbacks of TSAs, nonlinear TR and limited stroke, thereby making them a more compelling option for robotic applications.
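The nonlinearity being compensated can be seen in the standard TSA kinematic model, x(θ) = L − √(L² − r²θ²), whose transmission ratio dθ/dx = √(L² − r²θ²)/(r²θ) falls steeply with twist angle. The sketch below evaluates this model with illustrative parameters (not the prototype's):

```python
import numpy as np

def tsa_stroke(theta, L=300.0, r=0.6):
    """Contraction x (mm) of a twisted string of length L (mm) and string
    radius r (mm) after twisting the motor by theta (rad)."""
    return L - np.sqrt(L**2 - (r*theta)**2)

def tsa_tr(theta, L=300.0, r=0.6):
    """Transmission ratio dtheta/dx (rad/mm): motor rotation per unit stroke.
    It drops sharply as theta grows -- the nonlinearity that a variable-radius
    drum profile can be shaped to cancel."""
    return np.sqrt(L**2 - (r*theta)**2) / (r**2 * theta)

thetas = np.linspace(10.0, 400.0, 50)
tr = tsa_tr(thetas)
```

A drum whose radius varies inversely with this curve would keep the output speed per motor turn roughly constant, which is, at a hand-waving level, the linearization idea the dual variable-radius drum realizes.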
{"title":"Dual Variable-Radius Drum for Transmission Ratio Linearization and Stroke Enhancement in Twisted String Actuators","authors":"Juyeong Seo;JaeHyung Jang;Seungjoon Baek;Jee-Hwan Ryu","doi":"10.1109/LRA.2026.3665071","DOIUrl":"https://doi.org/10.1109/LRA.2026.3665071","url":null,"abstract":"Twisted string actuators (TSAs) have emerged as promising actuators in robotics owing to their compliant, lightweight nature, and high transmission ratios. However, practical utilization of TSAs remains limited due to their restricted stroke length and intrinsically nonlinear transmission ratio (TR). While variable-radius pulleys (VRPs) can mitigate these issues, they exhibit poor scalability as their volume grows disproportionately with the required stroke or compensation TR range. This paper proposes the dual variable-radius drum TSA (DVRD-TSA) and mathematically proves that its architecture offers superior scalability and compactness compared to existing mechanisms as performance demands increase. For a baseline comparison to validate our model, we fabricated a prototype and compared it to a conventional TSA under constant load conditions with identical initial length. The experiment confirmed that our DVRD-TSA delivers a substantially larger linear stroke (210.3 mm, 72.5%) compared to conventional TSA (85.3 mm, 29.4%), while maintaining a comparable peak torque (25.82 Nmm vs. 24.21 Nmm), and successfully tracks its target near-constant transmission ratio (1.475 rad/mm) with low error. This work presents a compact, passive, and scalable solution that overcomes two major drawbacks of TSAs, nonlinear TR and limited stroke, thereby making them a more compelling option for robotic applications.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 4","pages":"4529-4536"},"PeriodicalIF":5.3,"publicationDate":"2026-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147299715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-02-16 DOI: 10.1109/LRA.2026.3665443
Chang Han Low;Ziyue Wang;Tianyi Zhang;Zhu Zhuo;Zhitao Zeng;Evangelos B. Mazomenos;Yueming Jin
Robotic-assisted surgery (RAS) is central to modern surgery, driving the need for intelligent systems with accurate scene understanding. Most existing surgical AI methods rely on isolated, task-specific models, leading to fragmented pipelines with limited interpretability and no unified understanding of the RAS scene. Vision-Language Models (VLMs) offer strong zero-shot reasoning but struggle with hallucinations, domain gaps, and weak task-interdependency modeling. To address the lack of unified data for RAS scene understanding, we introduce SurgCoTBench, the first reasoning-focused benchmark in RAS, covering 14256 QA pairs with frame-level annotations across five major surgical tasks. Building on SurgCoTBench, we propose SurgRAW, a clinically aligned Chain-of-Thought (CoT) driven agentic workflow for zero-shot multi-task reasoning in surgery. SurgRAW employs a hierarchical reasoning workflow in which an orchestrator divides surgical scene understanding into two reasoning streams and directs specialized agents to generate task-level reasoning, while higher-level agents capture workflow interdependencies or ground outputs clinically. Specifically, we propose a panel discussion mechanism to ensure that task-specific agents collaborate synergistically and leverage task interdependencies. Similarly, we incorporate a retrieval-augmented generation module to enrich agents with surgical knowledge and alleviate domain gaps in general VLMs. We design task-specific CoT prompts grounded in the surgical domain to ensure clinically aligned reasoning, reduce hallucinations, and enhance interpretability. Extensive experiments show that SurgRAW surpasses mainstream VLMs and agentic systems and outperforms a supervised model by 14.61% in accuracy.
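The orchestrator-and-agents structure described here can be caricatured as a dispatch pattern. In the sketch below, the stub functions stand in for VLM calls, and all names and routing keywords are entirely hypothetical, not SurgRAW's actual agents:

```python
# Minimal orchestrator/agent dispatch sketch. The stub agents below
# stand in for specialized VLM-backed reasoners; names are hypothetical.
def instrument_agent(query):
    return "instrument: grasper"

def action_agent(query):
    return "action: dissection"

class Orchestrator:
    """Routes a surgical-scene query to a specialized agent by keyword,
    mimicking the division of scene understanding into reasoning streams."""
    def __init__(self):
        self.routes = {"instrument": instrument_agent, "action": action_agent}

    def dispatch(self, query):
        for key, agent in self.routes.items():
            if key in query.lower():
                return agent(query)
        return "unhandled query"

orc = Orchestrator()
ans = orc.dispatch("Which instrument is visible?")
```

The real system adds panel discussion among agents, retrieval-augmented context, and CoT prompting on top of this basic routing skeleton.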
{"title":"SurgRAW: Multi-Agent Workflow With Chain of Thought Reasoning for Robotic Surgical Video Analysis","authors":"Chang Han Low;Ziyue Wang;Tianyi Zhang;Zhu Zhuo;Zhitao Zeng;Evangelos B. Mazomenos;Yueming Jin","doi":"10.1109/LRA.2026.3665443","DOIUrl":"https://doi.org/10.1109/LRA.2026.3665443","url":null,"abstract":"Robotic-assisted surgery (RAS) is central to modern surgery, driving the need for intelligent systems with accurate scene understanding. Most existing surgical AI methods rely on isolated, task-specific models, leading to fragmented pipelines with limited interpretability and no unified understanding of RAS scene. Vision-Language Models (VLMs) offer strong zero-shot reasoning, but struggle with hallucinations, domain gaps and weak task-interdependency modeling. To address the lack of unified data for RAS scene understanding, we introduce <b>SurgCoTBench</b>, the first reasoning-focused benchmark in RAS, covering 14256 QA pairs with frame-level annotations across five major surgical tasks. Building on SurgCoTBench, we propose <b>SurgRAW</b>, a clinically aligned Chain-of-Thought (CoT) driven agentic workflow for zero-shot multi-task reasoning in surgery. SurgRAW employs a hierarchical reasoning workflow where an orchestrator divides surgical scene understanding into two reasoning streams and directs specialized agents to generate task-level reasoning, while higher-level agents capture workflow interdependencies or ground output clinically. Specifically, we propose a panel discussion mechanism to ensure task-specific agents collaborate synergistically and leverage on task interdependencies. Similarly, we incorporate a retrieval-augmented generation module to enrich agents with surgical knowledge and alleviate domain gaps in general VLMs. We design task-specific CoT prompts grounded in surgical domain to ensure clinically aligned reasoning, reduce hallucinations and enhance interpretability. Extensive experiments show that SurgRAW surpasses mainstream VLMs and agentic systems and outperforms a supervised model by 14.61% accuracy.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 4","pages":"4857-4864"},"PeriodicalIF":5.3,"publicationDate":"2026-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147362334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Visuomotor policies trained on human expert demonstrations have recently shown strong performance across a wide range of robotic manipulation tasks. However, these policies remain highly sensitive to domain shifts stemming from background or robot embodiment changes, which limits their generalization capabilities. In this paper, we present ARRO, a novel visual representation that leverages zero-shot open-vocabulary segmentation and object detection models to efficiently mask out task-irrelevant regions of the scene in real time without requiring additional training, modeling of the setup, or camera calibration. By filtering visual distractors and overlaying virtual guides during both training and inference, ARRO improves robustness to scene variations and reduces the need for additional data collection. We extensively evaluate ARRO with Diffusion Policy on a range of tabletop manipulation tasks in both simulation and real-world environments, and further demonstrate its compatibility and effectiveness with generalist robot policies, such as Octo, OpenVLA, and $\pi _{0}$. Across all settings in our evaluation, ARRO yields consistent performance gains, allows for selective masking to choose between different objects, and shows robustness even to challenging segmentation conditions. Videos showcasing our results are available at: augmented-reality-for-robots.github.io
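ARRO's segmentation and detection models are not included here; the toy below only shows the downstream masking-and-overlay step on a synthetic image, with the binary mask standing in for an open-vocabulary segmentation output and the function name being a hypothetical label:

```python
import numpy as np

def apply_arro_style_mask(image, mask, guide_color=(0, 255, 0), guide_px=None):
    """Zero out task-irrelevant pixels (mask == 0) and paint a one-pixel
    'virtual guide' marker; a toy stand-in for segmentation-based filtering."""
    out = image * mask[..., None]              # keep only task-relevant regions
    if guide_px is not None:
        y, x = guide_px
        out[y, x] = guide_color                # overlay the virtual guide
    return out

img = np.full((64, 64, 3), 127, dtype=np.uint8)
mask = np.zeros((64, 64), dtype=np.uint8)
mask[16:48, 16:48] = 1                         # pretend segmentation output
out = apply_arro_style_mask(img, mask, guide_px=(32, 32))
```

Because the policy only ever sees the masked-and-annotated view, background and embodiment changes outside the mask cannot perturb its inputs, which is the intuition behind the reported robustness gains.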
{"title":"Augmented Reality for RObots (ARRO): Pointing Visuomotor Policies Towards Visual Robustness","authors":"Reihaneh Mirjalili;Tobias Jülg;Florian Walter;Wolfram Burgard","doi":"10.1109/LRA.2026.3665444","DOIUrl":"https://doi.org/10.1109/LRA.2026.3665444","url":null,"abstract":"Visuomotor policies trained on human expert demonstrations have recently shown strong performance across a wide range of robotic manipulation tasks. However, these policies remain highly sensitive to domain shifts stemming from background or robot embodiment changes, which limits their generalization capabilities. In this paper, we present ARRO, a novel visual representation that leverages zero-shot open-vocabulary segmentation and object detection models to efficiently mask out task-irrelevant regions of the scene in real time without requiring additional training, modeling of the setup, or camera calibration. By filtering visual distractors and overlaying virtual guides during both training and inference, ARRO improves robustness to scene variations and reduces the need for additional data collection. We extensively evaluate ARRO with Diffusion Policy on a range of tabletop manipulation tasks in both simulation and real-world environments, and further demonstrate its compatibility and effectiveness with generalist robot policies, such as Octo, OpenVLA and <inline-formula><tex-math>$\pi _{0}$</tex-math></inline-formula>. Across all settings in our evaluation, ARRO yields consistent performance gains, allows for selective masking to choose between different objects, and shows robustness even to challenging segmentation conditions. Videos showcasing our results are available at: augmented-reality-for-robots.github.io","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 4","pages":"4785-4792"},"PeriodicalIF":5.3,"publicationDate":"2026-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147362453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-02-16 DOI: 10.1109/LRA.2026.3665078
Zhongkai Gu;Chufan Zhang;Xin Jiang
The automation level in the assembly lines of leading automobile manufacturers can reach 80%–90%; however, the installation of automotive wire harnesses predominantly requires manual assembly. This is due to the challenges robots face when manipulating deformable parts. To mitigate this problem, we design a parallel gripper with embedded in-hand manipulation functions. The design is optimized for tackling situations involving hanging cables, which are common in assembly lines. It enables stable grasping of the cable connector, making precise plugging with traditional force control possible. The proposed method is validated by both simulation and experiments.
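"Traditional force control" for plugging is commonly realized as an admittance law; the 1D sketch below (spring-contact environment, illustrative gains, not the paper's controller) drives the commanded velocity with the force error until the desired contact force is reached:

```python
def admittance_insertion(f_des=5.0, b=50.0, k_env=1000.0, dt=1e-3, T=2.0):
    """1D admittance law for a plugging motion: commanded velocity is the
    force error divided by a virtual damping b; contact is modeled as a
    stiff spring (stiffness k_env) past x = 0. All parameters illustrative."""
    x = -0.01                      # start 10 mm above contact
    f = 0.0                        # measured contact force
    for _ in range(int(T / dt)):
        v = (f_des - f) / b        # admittance: force error -> velocity
        x += v * dt
        f = k_env * x if x > 0 else 0.0
    return f

f_final = admittance_insertion()
```

In free space the law commands a constant approach speed; once in contact, the position settles at `f_des / k_env` and the contact force converges to the setpoint, which is the behavior that makes connector plugging tolerant of small pose errors.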
{"title":"In-Hand Manipulation of the Connector for Cable Installation","authors":"Zhongkai Gu;Chufan Zhang;Xin Jiang","doi":"10.1109/LRA.2026.3665078","DOIUrl":"https://doi.org/10.1109/LRA.2026.3665078","url":null,"abstract":"The automation level in the assembly lines of leading automobile manufacturers can reach 80% -90%; however, the installation of automotive wire harnesses predominantly requires manual assembly. This is due to the challenges robots face when manipulating deformable parts. To mitigate this problem, we design a parallel gripper with embedded in-hand manipulation functions. The design is optimized for tackling the situations involving hanging cables which are common in assembly lines. It enables stable grasping of the cable connector, making precise plugging with traditional force control possible. The proposed method is validated by both simulation and experiments.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 4","pages":"4593-4600"},"PeriodicalIF":5.3,"publicationDate":"2026-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147299725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-02-13 DOI: 10.1109/LRA.2026.3664620
Ruiyu Wang;Zheyu Zhuang;Danica Kragic;Florian T. Pokorny
Generalizing beyond the training domain in image-based behavior cloning remains challenging. Existing methods address individual axes of generalization (workspace shifts, viewpoint changes, and cross-embodiment transfer), yet they are typically developed in isolation and often rely on complex pipelines. We introduce PALM (Perception Alignment for Local Manipulation), which leverages the invariance of local action distributions between out-of-distribution (OOD) and demonstrated domains to address these OOD shifts concurrently, without additional input modalities, model changes, or data collection. PALM modularizes the manipulation policy into coarse global components and a local policy for fine-grained actions. We reduce the discrepancy between in-domain and OOD inputs at the local policy level by enforcing local visual focus and consistent proprioceptive representation, allowing the policy to retrieve invariant local actions under OOD conditions. Experiments show that PALM limits OOD performance drops to 8% in simulation and 24% in the real world, compared to 45% and 77% for baselines.
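PALM's "local visual focus" is not specified in code in this abstract; one common minimal realization is cropping a fixed-size window around the end-effector pixel so that background shifts fall outside the local policy's input. The helper below is a hypothetical sketch of that idea, not PALM's implementation:

```python
import numpy as np

def local_crop(image, center, size=32):
    """Crop a size x size window around an end-effector pixel, clipping the
    window at image borders so the output shape is always (size, size)."""
    h, w = image.shape[:2]
    cy, cx = center
    y0 = min(max(cy - size // 2, 0), h - size)
    x0 = min(max(cx - size // 2, 0), w - size)
    return image[y0:y0 + size, x0:x0 + size]

img = np.arange(128 * 128).reshape(128, 128)
patch = local_crop(img, (5, 120))              # center near the top-right corner
```

Feeding only such local patches (plus a consistent proprioceptive encoding) to the fine-grained policy is one way the discrepancy between in-domain and OOD observations can be reduced.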
{"title":"PALM: Enhanced Generalizability for Local Visuomotor Policies via Perception Alignment","authors":"Ruiyu Wang;Zheyu Zhuang;Danica Kragic;Florian T. Pokorny","doi":"10.1109/LRA.2026.3664620","DOIUrl":"https://doi.org/10.1109/LRA.2026.3664620","url":null,"abstract":"Generalizing beyond the training domain in image-based behavior cloning remains challenging. Existing methods address individual axes of generalization, workspace shifts, viewpoint changes, and cross-embodiment transfer, yet they are typically developed in isolation and often rely on complex pipelines. We introduce PALM (Perception Alignment for Local Manipulation), which leverages the invariance of local action distributions between out-of-distribution (OOD) and demonstrated domains to address these OOD shifts concurrently, without additional input modalities, model changes, or data collection. PALM modularizes the manipulation policy into coarse global components and a local policy for fine-grained actions. We reduce the discrepancy between in-domain and OOD inputs at the local policy level by enforcing local visual focus and consistent proprioceptive representation, allowing the policy to retrieve invariant local actions under OOD conditions. Experiments show that PALM limits OOD performance drops to 8% in simulation and 24% in the real world, compared to 45% and 77% for baselines.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 4","pages":"4865-4872"},"PeriodicalIF":5.3,"publicationDate":"2026-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11395611","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147362551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}