
IEEE Robotics and Automation Letters: Latest Articles

A Bio-Inspired Scalable Parallel Actuation Approach for Modular Reconfigurable Supernumerary Limbs
IF 5.3; Zone 2 (Computer Science); Q2 ROBOTICS; Pub Date: 2026-02-16; DOI: 10.1109/LRA.2026.3665407
Dawei Liang;Sikai Zhao;Tenglei Wang;Bohuan Lu;Jian Qi;Ning Zhao;Haotian Ju;Jie Zhao;Yanhe Zhu
Supernumerary Robotic Limbs (SRLs) offer considerable promise for assisting wearers in complex tasks, yet their adaptability is often constrained by their inherent fixed morphology. While modular reconfigurable designs present a viable solution, applying them to wearable systems introduces critical size-payload trade-offs. To address these limitations, this letter, drawing inspiration from the human upper limb, introduces a modular reconfigurable supernumerary robotic limb (MRSRL) based on a tightly integrated hardware and control co-design. At the hardware level, we developed a novel joint module featuring a bio-inspired redundant actuation mechanism. The module's partitioned design also ensures functional extensibility. At the control level, we designed a unified control framework leveraging active disturbance rejection control to effectively suppress backlash and maintain robust motion tracking across heterogeneous actuator configurations. Experimental validation demonstrates the system's efficacy, achieving a 4.21-fold increase in torque-to-weight ratio compared to the single-motor module, and reductions of 7.20% and 17.34% in maximum absolute error and integral of absolute error, respectively, against baseline methods. Our work establishes a robust framework for the development of next-generation SRLs capable of adapting to a wide range of tasks.
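The two tracking metrics quoted above, maximum absolute error and integral of absolute error (IAE), can be computed from sampled reference and measured trajectories. The following is a minimal Python sketch, assuming uniformly sampled signals and trapezoidal integration; the function names and sample values are illustrative and are not taken from the letter.

```python
def max_abs_error(reference, measured):
    """Largest absolute tracking error over all samples."""
    return max(abs(r - m) for r, m in zip(reference, measured))


def integral_abs_error(reference, measured, dt):
    """Integral of |error| over time, approximated with the trapezoidal rule."""
    errors = [abs(r - m) for r, m in zip(reference, measured)]
    return sum(0.5 * (a + b) * dt for a, b in zip(errors, errors[1:]))


def percent_reduction(baseline, proposed):
    """Relative improvement of `proposed` over `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    # Hypothetical samples: a constant reference and a brief overshoot.
    ref = [0.0, 0.0, 0.0]
    meas = [0.0, 1.0, 0.0]
    print(max_abs_error(ref, meas))
    print(integral_abs_error(ref, meas, dt=0.5))
```

A reduction figure such as the ones reported would then be `percent_reduction(baseline_metric, proposed_metric)` for each metric, evaluated on the same trajectory.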
IEEE Robotics and Automation Letters, vol. 11, no. 4, pp. 4849-4856.
Citations: 0
Targetless LiDAR-Camera Calibration With Neural Gaussian Splatting
IF 5.3; Zone 2 (Computer Science); Q2 ROBOTICS; Pub Date: 2026-02-16; DOI: 10.1109/LRA.2026.3665066
Haebeom Jung;Namtae Kim;Jungwoo Kim;Jaesik Park
Accurate LiDAR-camera calibration is crucial for multi-sensor systems. However, traditional methods often rely on physical targets, which are impractical for real-world deployment. Moreover, even carefully calibrated extrinsics can degrade over time due to sensor drift or external disturbances, necessitating periodic recalibration. To address these challenges, we present Targetless LiDAR–Camera Calibration (TLC-Calib), which jointly optimizes sensor poses with a neural Gaussian–based scene representation. Reliable LiDAR points are frozen as anchor Gaussians to preserve global structure, while auxiliary Gaussians prevent local overfitting under noisy initialization. Our fully differentiable pipeline with photometric and geometric regularization achieves robust and generalizable calibration, consistently outperforming existing targetless methods on the KITTI-360, Waymo, and Fast-LIVO2 datasets. In addition, it yields more consistent Novel View Synthesis results, reflecting improved extrinsic alignment.
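Extrinsic calibration estimates the rigid transform (rotation R, translation t) that maps LiDAR points into the camera frame. The sketch below shows only the standard pinhole projection that such extrinsics feed into; it is not the TLC-Calib pipeline, and the function name and all numeric values are illustrative assumptions.

```python
import numpy as np


def project_lidar_to_image(points_lidar, R, t, K):
    """Project Nx3 LiDAR points to pixel coordinates.

    R (3x3) and t (3,) are the LiDAR-to-camera extrinsics, K (3x3) the
    camera intrinsics. Points behind the camera are dropped.
    Returns (pixels, in_front), where in_front is a boolean mask over the input.
    """
    pts_cam = points_lidar @ R.T + t          # LiDAR frame -> camera frame
    in_front = pts_cam[:, 2] > 0              # keep points with positive depth
    pts_cam = pts_cam[in_front]
    uvw = pts_cam @ K.T                       # apply intrinsics
    pixels = uvw[:, :2] / uvw[:, 2:3]         # perspective division
    return pixels, in_front


if __name__ == "__main__":
    # Hypothetical values: identity extrinsics and a simple intrinsic matrix.
    R = np.eye(3)
    t = np.zeros(3)
    K = np.array([[100.0, 0.0, 50.0],
                  [0.0, 100.0, 40.0],
                  [0.0, 0.0, 1.0]])
    pts = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0], [0.0, 0.0, -1.0]])
    print(project_lidar_to_image(pts, R, t, K))
```

A small error in R or t shifts every projected point, which is why misaligned extrinsics show up as inconsistencies between LiDAR geometry and image content.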
IEEE Robotics and Automation Letters, vol. 11, no. 4, pp. 4777-4784.
Citations: 0
FLASH: Fibonacci Lattice Spherical Harmonics for Semantic Place Recognition Using LiDAR
IF 5.3; Zone 2 (Computer Science); Q2 ROBOTICS; Pub Date: 2026-02-16; DOI: 10.1109/LRA.2026.3665080
Doyeon Kim;Heoncheol Lee
Reliable place recognition is essential to SLAM, as it enables loop closure detection, re-localization, and map merging in long-term operation and multi-robot deployments. While semantic information enables a more human-like understanding of environments, only a few studies have integrated semantic graphs with background appearance cues. To address this gap, we propose FLASH (Fibonacci Lattice Spherical Harmonics), a novel LiDAR-based place recognition (LPR) approach that employs spherical harmonics (SH) to unify semantic, topological, and appearance information into a compact and discriminative descriptor. Specifically, FLASH introduces newly defined complementary spherical functions for the foreground and background, uniformly samples the spherical domain with a Fibonacci lattice, and expands these functions in the SH basis to obtain a rotation-invariant representation. Experimental results on KITTI, Ford Campus, Apollo, and CU-Multi demonstrate that FLASH consistently achieves higher place recognition performance across various scenarios.
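The uniform spherical sampling mentioned above can be illustrated with the common golden-angle construction of a Fibonacci lattice. This is a minimal sketch of that generic construction, not necessarily the exact variant used in FLASH; the function name is an assumption.

```python
import math


def fibonacci_sphere(n):
    """Return n approximately uniformly distributed unit vectors on the sphere."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        # z values are evenly spaced in (-1, 1), giving equal-area bands.
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        # Successive points are rotated by the golden angle around the z axis.
        phi = i * golden_angle
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


if __name__ == "__main__":
    for p in fibonacci_sphere(5):
        print(p)
```

Evaluating a spherical function at these sample directions gives a near-uniform quadrature grid, which is one reason such lattices are a convenient front end for a spherical harmonics expansion.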
IEEE Robotics and Automation Letters, vol. 11, no. 4, pp. 4521-4528.
Citations: 0
External Disturbances Compensation for LiDAR-Inertial Odometry Under Vibration Conditions on Quadruped Robot
IF 5.3; Zone 2 (Computer Science); Q2 ROBOTICS; Pub Date: 2026-02-16; DOI: 10.1109/LRA.2026.3665313
Quoc Hung Hoang;Gon-Woo Kim
This letter presents a new tightly coupled LiDAR-inertial odometry (LIO) scheme for a quadruped robot operating under fluctuating conditions. By proposing external disturbance modeling, the profile of unknown disturbances and noise on the IMU is effectively characterized using the time delay estimation (TDE) technique. Simultaneously, the IMU orientation and the TDE-based uncertainty model are jointly updated through an Error-State Kalman Filter (ESKF) using a measurement model derived from LiDAR odometry (LO). Thereafter, the output of the ESKF is employed to mitigate vibration effects in the IMU preintegration factor, thereby enhancing the precision and stability of inertial motion estimation. Furthermore, the refined IMU preintegration pose is leveraged to correct LiDAR distortion and improve the accuracy of LO. As a result, the proposed approach achieves optimal performance, smooth trajectories, and enhanced robustness against uncertainties. Finally, the effectiveness of the proposed LIO is evaluated through real-time experiments on a quadruped robot across different scenarios.
IEEE Robotics and Automation Letters, vol. 11, no. 4, pp. 4713-4720.
Citations: 0
Distributed Model Predictive Control for Energy and Comfort Optimization in Large Buildings Using Piecewise Affine Approximation
IF 5.3; Zone 2 (Computer Science); Q2 ROBOTICS; Pub Date: 2026-02-16; DOI: 10.1109/LRA.2026.3665065
Hongyi Li;Jun Xu;Jinfeng Liu
The control of large buildings encounters challenges in computational efficiency due to their size and nonlinear components. To address these issues, this paper proposes a Piecewise Affine (PWA)-based distributed scheme for Model Predictive Control (MPC) that optimizes energy and comfort through PWA-based quadratic programming. We utilize the Alternating Direction Method of Multipliers (ADMM) for effective decomposition and apply the PWA technique to handle the nonlinear components. To solve the resulting large-scale nonconvex problems, the paper introduces a convex ADMM algorithm that transforms the nonconvex problem into a series of smaller convex problems, significantly enhancing computational efficiency. Furthermore, we demonstrate that the convex ADMM algorithm converges to a local optimum of the original problem. A case study involving 36 zones validates the effectiveness of the proposed method. Our proposed method reduces execution time by 86% compared to the centralized version.
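The decomposition idea behind ADMM can be illustrated on a toy consensus problem: each zone i has a local quadratic cost 0.5 (x_i - a_i)^2 and all zones must agree on a shared value z. This is a minimal sketch of textbook scaled-form consensus ADMM, assuming hypothetical per-zone targets; it is not the paper's PWA-based convex ADMM algorithm.

```python
def consensus_admm(targets, rho=1.0, iterations=200):
    """Scaled-form consensus ADMM for min sum_i 0.5*(x_i - a_i)^2 s.t. x_i = z.

    Returns the shared value z and the list of local copies x_i.
    """
    n = len(targets)
    x = list(targets)      # local variables, one per zone
    z = 0.0                # shared (consensus) variable
    u = [0.0] * n          # scaled dual variables
    for _ in range(iterations):
        # Local step: each zone solves its own small problem independently.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(targets, u)]
        # Coordination step: average the local proposals.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: accumulate each zone's disagreement with the consensus.
        u = [ui + xi - z for xi, ui in zip(x, u)]
    return z, x


if __name__ == "__main__":
    # Hypothetical preferred setpoints for three zones.
    z, x = consensus_admm([20.0, 22.0, 24.0])
    print(z, x)
```

For this toy cost the consensus value converges to the mean of the targets. The local steps are independent of one another, which is the property that lets a distributed scheme spread the computation across zones.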
IEEE Robotics and Automation Letters, vol. 11, no. 4, pp. 4147-4154.
Citations: 0
Dual Variable-Radius Drum for Transmission Ratio Linearization and Stroke Enhancement in Twisted String Actuators
IF 5.3; Zone 2 (Computer Science); Q2 ROBOTICS; Pub Date: 2026-02-16; DOI: 10.1109/LRA.2026.3665071
Juyeong Seo;JaeHyung Jang;Seungjoon Baek;Jee-Hwan Ryu
Twisted string actuators (TSAs) have emerged as promising actuators in robotics owing to their compliant, lightweight nature, and high transmission ratios. However, practical utilization of TSAs remains limited due to their restricted stroke length and intrinsically nonlinear transmission ratio (TR). While variable-radius pulleys (VRPs) can mitigate these issues, they exhibit poor scalability as their volume grows disproportionately with the required stroke or compensation TR range. This paper proposes the dual variable-radius drum TSA (DVRD-TSA) and mathematically proves that its architecture offers superior scalability and compactness compared to existing mechanisms as performance demands increase. For a baseline comparison to validate our model, we fabricated a prototype and compared it to a conventional TSA under constant load conditions with identical initial length. The experiment confirmed that our DVRD-TSA delivers a substantially larger linear stroke (210.3 mm, 72.5%) compared to a conventional TSA (85.3 mm, 29.4%), while maintaining a comparable peak torque (25.82 N·mm vs. 24.21 N·mm), and successfully tracks its target near-constant transmission ratio (1.475 rad/mm) with low error. This work presents a compact, passive, and scalable solution that overcomes two major drawbacks of TSAs, nonlinear TR and limited stroke, thereby making them a more compelling option for robotic applications.
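The nonlinearity of a conventional TSA's transmission ratio can be seen in the commonly used ideal helix model, where a string of untwisted length L and effective radius r contracts by L - sqrt(L^2 - (theta r)^2) after a motor rotation theta. The sketch below implements that generic model only, not the DVRD mechanism. The radius is a hypothetical value, and the length of 290 mm is an inference from the quoted strokes and percentages (85.3 / 0.294 and 210.3 / 0.725 both give about 290), not a figure stated in the abstract.

```python
import math


def tsa_contraction(theta, length, radius):
    """Contraction (mm) of an ideal twisted string after motor angle theta (rad)."""
    return length - math.sqrt(length ** 2 - (theta * radius) ** 2)


def tsa_transmission_ratio(theta, length, radius):
    """Transmission ratio d(theta)/d(contraction) in rad/mm, valid for theta > 0."""
    return math.sqrt(length ** 2 - (theta * radius) ** 2) / (theta * radius ** 2)


if __name__ == "__main__":
    L0 = 290.0   # mm, inferred from the reported stroke percentages (assumption)
    r = 0.5      # mm, hypothetical effective string radius
    for theta in (50.0, 100.0, 200.0, 300.0):
        print(theta,
              tsa_contraction(theta, L0, r),
              tsa_transmission_ratio(theta, L0, r))
```

In this model the ratio falls steeply as the string twists, which is the behavior a linearizing mechanism has to compensate for in order to hold a near-constant value.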
IEEE Robotics and Automation Letters, vol. 11, no. 4, pp. 4529-4536.
Citations: 0
SurgRAW: Multi-Agent Workflow With Chain of Thought Reasoning for Robotic Surgical Video Analysis
IF 5.3; Zone 2 (Computer Science); Q2 ROBOTICS; Pub Date: 2026-02-16; DOI: 10.1109/LRA.2026.3665443
Chang Han Low;Ziyue Wang;Tianyi Zhang;Zhu Zhuo;Zhitao Zeng;Evangelos B. Mazomenos;Yueming Jin
Robotic-assisted surgery (RAS) is central to modern surgery, driving the need for intelligent systems with accurate scene understanding. Most existing surgical AI methods rely on isolated, task-specific models, leading to fragmented pipelines with limited interpretability and no unified understanding of the RAS scene. Vision-Language Models (VLMs) offer strong zero-shot reasoning, but struggle with hallucinations, domain gaps and weak task-interdependency modeling. To address the lack of unified data for RAS scene understanding, we introduce SurgCoTBench, the first reasoning-focused benchmark in RAS, covering 14,256 QA pairs with frame-level annotations across five major surgical tasks. Building on SurgCoTBench, we propose SurgRAW, a clinically aligned Chain-of-Thought (CoT) driven agentic workflow for zero-shot multi-task reasoning in surgery. SurgRAW employs a hierarchical reasoning workflow where an orchestrator divides surgical scene understanding into two reasoning streams and directs specialized agents to generate task-level reasoning, while higher-level agents capture workflow interdependencies or ground output clinically. Specifically, we propose a panel discussion mechanism to ensure task-specific agents collaborate synergistically and leverage task interdependencies. Similarly, we incorporate a retrieval-augmented generation module to enrich agents with surgical knowledge and alleviate domain gaps in general VLMs. We design task-specific CoT prompts grounded in the surgical domain to ensure clinically aligned reasoning, reduce hallucinations and enhance interpretability. Extensive experiments show that SurgRAW surpasses mainstream VLMs and agentic systems and outperforms a supervised model by 14.61% in accuracy.
IEEE Robotics and Automation Letters, vol. 11, no. 4, pp. 4857-4864.
Citations: 0
Augmented Reality for RObots (ARRO): Pointing Visuomotor Policies Towards Visual Robustness
IF 5.3; Zone 2 (Computer Science); Q2 ROBOTICS; Pub Date: 2026-02-16; DOI: 10.1109/LRA.2026.3665444
Reihaneh Mirjalili;Tobias Jülg;Florian Walter;Wolfram Burgard
Visuomotor policies trained on human expert demonstrations have recently shown strong performance across a wide range of robotic manipulation tasks. However, these policies remain highly sensitive to domain shifts stemming from background or robot embodiment changes, which limits their generalization capabilities. In this paper, we present ARRO, a novel visual representation that leverages zero-shot open-vocabulary segmentation and object detection models to efficiently mask out task-irrelevant regions of the scene in real time without requiring additional training, modeling of the setup, or camera calibration. By filtering visual distractors and overlaying virtual guides during both training and inference, ARRO improves robustness to scene variations and reduces the need for additional data collection. We extensively evaluate ARRO with Diffusion Policy on a range of tabletop manipulation tasks in both simulation and real-world environments, and further demonstrate its compatibility and effectiveness with generalist robot policies, such as Octo, OpenVLA and $\pi_{0}$. Across all settings in our evaluation, ARRO yields consistent performance gains, allows for selective masking to choose between different objects, and shows robustness even to challenging segmentation conditions. Videos showcasing our results are available at: augmented-reality-for-robots.github.io
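The masking step described above can be sketched with plain array operations: given a boolean mask marking task-relevant pixels, everything else is replaced by a constant fill value. This is a minimal illustration only. In ARRO the mask would come from open-vocabulary segmentation and detection models, whereas here it is hand-made, and the function name and fill choice are assumptions.

```python
import numpy as np


def mask_irrelevant(image, keep_mask, fill=0):
    """Keep pixels where keep_mask is True and replace all others with `fill`.

    image: HxWx3 array, keep_mask: HxW boolean array.
    """
    out = np.full_like(image, fill)
    out[keep_mask] = image[keep_mask]   # copy only the task-relevant pixels
    return out


if __name__ == "__main__":
    # Hypothetical 2x2 RGB image and a mask keeping the diagonal pixels.
    image = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
    keep = np.array([[True, False], [False, True]])
    print(mask_irrelevant(image, keep))
```

Applying the same masking at training and inference time means the policy sees a similar simplified image in both phases, regardless of background changes.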
IEEE Robotics and Automation Letters, vol. 11, no. 4, pp. 4785-4792. DOI: https://doi.org/10.1109/LRA.2026.3665444
Citations: 0
In-Hand Manipulation of the Connector for Cable Installation
IF 5.3 Zone 2 (Computer Science) Q2 ROBOTICS Pub Date: 2026-02-16 DOI: 10.1109/LRA.2026.3665078
Zhongkai Gu;Chufan Zhang;Xin Jiang
The automation level on the assembly lines of leading automobile manufacturers can reach 80%-90%; however, automotive wire harnesses are still installed predominantly by hand, because robots struggle to manipulate deformable parts. To mitigate this problem, we design a parallel gripper with embedded in-hand manipulation functions. The design is optimized for hanging cables, which are common on assembly lines. It enables stable grasping of the cable connector, making precise plugging possible with traditional force control. The proposed method is validated in both simulation and experiments.
IEEE Robotics and Automation Letters, vol. 11, no. 4, pp. 4593-4600. DOI: https://doi.org/10.1109/LRA.2026.3665078
Citations: 0
PALM: Enhanced Generalizability for Local Visuomotor Policies via Perception Alignment
IF 5.3 Zone 2 (Computer Science) Q2 ROBOTICS Pub Date: 2026-02-13 DOI: 10.1109/LRA.2026.3664620
Ruiyu Wang;Zheyu Zhuang;Danica Kragic;Florian T. Pokorny
Generalizing beyond the training domain in image-based behavior cloning remains challenging. Existing methods address individual axes of generalization (workspace shifts, viewpoint changes, and cross-embodiment transfer), yet they are typically developed in isolation and often rely on complex pipelines. We introduce PALM (Perception Alignment for Local Manipulation), which leverages the invariance of local action distributions between out-of-distribution (OOD) and demonstrated domains to address these OOD shifts concurrently, without additional input modalities, model changes, or data collection. PALM modularizes the manipulation policy into coarse global components and a local policy for fine-grained actions. We reduce the discrepancy between in-domain and OOD inputs at the local policy level by enforcing local visual focus and a consistent proprioceptive representation, allowing the policy to retrieve invariant local actions under OOD conditions. Experiments show that PALM limits OOD performance drops to 8% in simulation and 24% in the real world, compared to 45% and 77% for baselines.
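The abstract does not say how local visual focus is enforced. As a rough intuition only, the sketch below assumes one plausible mechanism: a fixed-size crop centred on a pixel of interest (for example, where the gripper appears in the image). The function `local_crop`, the toy scenes, and the zero padding are hypothetical and should not be read as PALM's actual pipeline.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def local_crop(image: Image, center: Tuple[int, int],
               half_size: int, pad: int = 0) -> Image:
    """Return a square window of side 2*half_size+1 centred on (row, col).

    Pixels that fall outside the image are filled with `pad`.
    """
    rows, cols = len(image), len(image[0])
    r0, c0 = center
    return [
        [
            image[r][c] if 0 <= r < rows and 0 <= c < cols else pad
            for c in range(c0 - half_size, c0 + half_size + 1)
        ]
        for r in range(r0 - half_size, r0 + half_size + 1)
    ]


def blank_scene(rows: int, cols: int) -> Image:
    """Create an all-zero image of the given size."""
    return [[0] * cols for _ in range(rows)]


# Two hypothetical scenes: the same object (value 9) at different workspace positions.
scene_a = blank_scene(5, 5)
scene_a[1][1] = 9
scene_b = blank_scene(5, 5)
scene_b[3][2] = 9

crop_a = local_crop(scene_a, (1, 1), half_size=1)
crop_b = local_crop(scene_b, (3, 2), half_size=1)
print(crop_a == crop_b)  # the local views coincide despite the workspace shift
```

Under this assumption, a policy that only sees the crop receives identical inputs in both scenes, so a shift in the global workspace does not change what the local policy has to handle.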
IEEE Robotics and Automation Letters, vol. 11, no. 4, pp. 4865-4872. DOI: https://doi.org/10.1109/LRA.2026.3664620. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11395611
Citations: 0