IEEE Robotics and Automation Letters: Latest Articles


Whole-Body Teleoperation for Mobile Manipulation at Zero Added Cost

IF 4.6, Zone 2 Computer Science, Q2 ROBOTICS

IEEE Robotics and Automation Letters

Pub Date : 2025-02-10 DOI: 10.1109/LRA.2025.3540582

Daniel Honerkamp;Harsh Mahesheka;Jan Ole von Hartz;Tim Welschehold;Abhinav Valada

Demonstration data plays a key role in learning complex behaviors and training robotic foundation models. While effective control interfaces exist for static manipulators, data collection remains cumbersome and time intensive for mobile manipulators due to their large number of degrees of freedom. While specialized hardware, avatars, or motion tracking can enable whole-body control, these approaches are either expensive, robot-specific, or suffer from the embodiment mismatch between robot and human demonstrator. In this work, we present MoMa-Teleop, a novel teleoperation method that infers end-effector motions from existing interfaces and delegates the base motions to a previously developed reinforcement learning agent, leaving the operator to focus fully on the task-relevant end-effector motions. This enables whole-body teleoperation of mobile manipulators with no additional hardware or setup costs via standard interfaces such as joysticks or hand guidance. Moreover, the operator is not bound to a tracked workspace and can move freely with the robot over spatially extended tasks. We demonstrate that our approach results in a significant reduction in task completion time across a variety of robots and tasks. As the generated data covers diverse whole-body motions without embodiment mismatch, it enables efficient imitation learning. By focusing on task-specific end-effector motions, our approach learns skills that transfer to unseen settings, such as new obstacles or changed object positions, from as little as five demonstrations.

Vol. 10, No. 4, pp. 3198-3205

Citations: 0

Hybrid Iterative Linear Quadratic Estimation: Optimal Estimation for Hybrid Systems

IF 4.6, Zone 2 Computer Science, Q2 ROBOTICS

IEEE Robotics and Automation Letters

Pub Date : 2025-02-10 DOI: 10.1109/LRA.2025.3540387

J. Joe Payne;James Zhu;Nathan J. Kong;Aaron M. Johnson

In this letter we present Hybrid iterative Linear Quadratic Estimation (HiLQE), an optimization based offline state estimation algorithm for hybrid dynamical systems. We utilize the saltation matrix, a first order approximation of the variational update through an event driven hybrid transition, to calculate gradient information through hybrid events in the backward pass of an iterative linear quadratic optimization over state estimates. This enables accurate computation of the value function approximation at each timestep. Additionally, the forward pass in the iterative algorithm is augmented with hybrid dynamics in the rollout. A reference extension method is used to account for varying impact times when comparing states for the feedback gain in noise calculation. The proposed method is demonstrated on an ASLIP hopper system with position measurements. In comparison to the Salted Kalman Filter (SKF), the algorithm presented here achieves a maximum of 63.55% reduction in estimation error magnitude over all state dimensions near impact events.
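The saltation matrix mentioned in the abstract can be illustrated on a much simpler system than the ASLIP hopper. The sketch below assumes a 1-D bouncing ball with state (z, v), guard z = 0, and reset v+ = -e * v-; the function name, the restitution coefficient `e`, and the ball model itself are illustrative assumptions, not taken from the letter. It uses the standard formula for a time-invariant guard and reset, Xi = DxR + (F2 - DxR F1) Dxg / (Dxg F1).

```python
def saltation_bouncing_ball(v_minus, e=0.8, g=9.81):
    """Saltation matrix of a 1-D bouncing ball at impact (illustrative model).

    State x = (z, v). Guard: z = 0. Reset: (z, v) -> (z, -e * v).
    v_minus is the pre-impact velocity and must be negative (moving down).
    """
    if v_minus >= 0:
        raise ValueError("pre-impact velocity must be negative")

    DxR = [[1.0, 0.0], [0.0, -e]]   # Jacobian of the reset map
    Dxg = [1.0, 0.0]                # Jacobian of the guard g(x) = z
    F1 = [v_minus, -g]              # flow just before impact
    F2 = [-e * v_minus, -g]         # flow just after impact

    # Correction term (F2 - DxR F1) Dxg / (Dxg F1)
    DxR_F1 = [sum(DxR[i][k] * F1[k] for k in range(2)) for i in range(2)]
    diff = [F2[i] - DxR_F1[i] for i in range(2)]
    denom = sum(Dxg[k] * F1[k] for k in range(2))

    return [[DxR[i][j] + diff[i] * Dxg[j] / denom for j in range(2)]
            for i in range(2)]


if __name__ == "__main__":
    for row in saltation_bouncing_ball(v_minus=-2.0):
        print(row)
```

In a backward pass, a matrix of this kind would take the place of the ordinary dynamics Jacobian at the timestep where the hybrid event occurs, which is how sensitivity information is carried across the impact.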

Citations: 0

Real-Time Excavation Trajectory Modulation for Slip and Rollover Prevention

IF 4.6, Zone 2 Computer Science, Q2 ROBOTICS

IEEE Robotics and Automation Letters

Pub Date : 2025-02-10 DOI: 10.1109/LRA.2025.3540389

ChangU Kim;Bukun Son;Minhyeong Lee;Hyelim Choi;Seokhyun Hong;Minsung Kang;Jihyun Moon;Dongmok Kim;Dongjun Lee

We propose a novel real-time excavation trajectory modulation framework on a slope for an autonomous excavator with a low-level digital kinematic control as common for hydraulic industrial excavators. Excavation on a slope is challenging because of a higher risk of slips and rollovers. To deal with this, we propose a real-time excavation trajectory modulation framework based on slope tangential/normal force ratio $\mu$ and zero moment point $\xi$. The slip and rollover prevention conditions are incorporated in a single linear inequality using the same fractional structure in $\mu$ and $\xi$ with the common denominator. However, due to the adoption of the low-level digital kinematic control, this prevention requires the prediction of the excavation force at the next timestamp, and, for this, we develop a data-driven excavation force difference prediction model utilizing a deep learning architecture, Transformer. The remaining error of this prediction is then addressed by using the technique of robust optimization with box uncertainty of the developed excavation force difference model. Our proposed framework is validated experimentally with our customized scaled-down excavator.
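One way to read the statement that both conditions share a fractional structure with a common denominator is sketched below. The numerators $n_\mu$, $n_\xi$, the shared denominator $d$, the decision variable $u$, and the bounds $\bar{\mu}$, $\bar{\xi}$ are hypothetical symbols introduced only for illustration; the abstract does not define them, and only one-sided upper bounds are shown for brevity.

```latex
\begin{align}
  \mu(u) &= \frac{n_\mu(u)}{d(u)}, &
  \xi(u) &= \frac{n_\xi(u)}{d(u)}, &
  d(u) &> 0, \\
  \mu(u) \le \bar{\mu}
    &\iff n_\mu(u) - \bar{\mu}\, d(u) \le 0, \\
  \xi(u) \le \bar{\xi}
    &\iff n_\xi(u) - \bar{\xi}\, d(u) \le 0 .
\end{align}
```

Under the further assumption that $n_\mu$, $n_\xi$, and $d$ are affine in $u$, each right-hand condition is linear in $u$, so the two can be stacked into one vector inequality of the form $A u \le b$.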

Vol. 10, No. 4, pp. 3094-3101. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10878475

Citations: 0

Predictive Energy Stability Margin: Prediction of Heavy Machine Overturning Considering Rotation and Translation

IF 4.6, Zone 2 Computer Science, Q2 ROBOTICS

IEEE Robotics and Automation Letters

Pub Date : 2025-02-10 DOI: 10.1109/LRA.2025.3540382

Mitsuhiro Kamezaki;Yuya Kokudo;Yusuke Uehara;Shunya Itano;Tatsuhito Iida;Shigeki Sugano

Fatal accidents caused by the overturning of heavy machines still happen, so the prediction and prevention of overturns are urgently needed. Indicators to evaluate overturn, such as the energy stability margin (ESM), have been proposed but are limited to a non-slip ground surface. Even if ESM is above zero, the machine may overturn due to additional manipulator operation or hitting an obstacle while sliding down a slope. This study thus proposes a predictive energy stability margin, $\bm{p}$-ESM, that focuses on kinetic energy in the translational and rotational directions for overturn prediction. Rotational kinetic energy $\bm{E}_{\bm{R}}$ accelerates overturning, and the translational kinetic energy $\bm{E}_{\bm{T}}$ in the slope direction is converted to $\bm{E}_{\bm{R}}$. Both are calculated from the mass and the position and acceleration of the center of gravity (COG) for each part of the machine. ESM $\bm{U}$ is defined as the difference between the height of COG just before overturn and the current height of the COG. Thus, $\bm{p}$-ESM is defined as $\bm{U}$ minus the sum of $\bm{E}_{\bm{R}}$ and $\bm{E}_{\bm{T}}$. We also developed an operation support system to limit the manipulator operation by using $\bm{p}$-ESM. The results of experiments using a hydraulically driven scale model (1/14) with different combinations of operations, loading weights, and ground surfaces confirmed that $\bm{p}$-ESM can predict overturns early and accurately, which conventional ESM cannot do. We also found that the support system using $\bm{p}$-ESM can prevent inappropriate operations and avoid overturns.
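The definition of p-ESM given in the abstract (U minus the sum of E_R and E_T) can be written as a minimal sketch. The function names, the zero threshold, and the assumption that U, E_R, and E_T are already expressed in the same units (for example, energies normalized by the machine's weight) are illustrative choices, not details confirmed by the abstract.

```python
def predictive_esm(u, e_rot, e_trans):
    """p-ESM as described in the abstract: U - (E_R + E_T).

    Assumes all three inputs are expressed in consistent units
    (an assumption; the abstract does not state the normalization).
    """
    return u - (e_rot + e_trans)


def overturn_predicted(u, e_rot, e_trans, threshold=0.0):
    """Flag a predicted overturn when p-ESM drops to the threshold or below.

    The zero threshold is a hypothetical choice for illustration.
    """
    return predictive_esm(u, e_rot, e_trans) <= threshold


if __name__ == "__main__":
    # Conventional ESM (U alone) is positive here, yet p-ESM is negative.
    print(predictive_esm(0.10, 0.08, 0.05))
    print(overturn_predicted(0.10, 0.08, 0.05))
```

The usage lines show the case the abstract highlights: a positive conventional margin U combined with enough kinetic energy to make the predictive margin negative.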

Vol. 10, No. 4, pp. 3286-3293

Citations: 0

Dynamics as Prompts: In-Context Learning for Sim-to-Real System Identifications

IF 4.6, Zone 2 Computer Science, Q2 ROBOTICS

IEEE Robotics and Automation Letters

Pub Date : 2025-02-10 DOI: 10.1109/LRA.2025.3540391

Xilun Zhang;Shiqi Liu;Peide Huang;William Jongwon Han;Yiqi Lyu;Mengdi Xu;Ding Zhao

Sim-to-real transfer remains a significant challenge in robotics due to the discrepancies between simulated and real-world dynamics. Traditional methods like Domain Randomization often fail to capture fine-grained dynamics, limiting their effectiveness for precise control tasks. In this work, we propose a novel approach that dynamically adjusts simulation environment parameters online using in-context learning. By leveraging past interaction histories as context, our method adapts the simulation environment dynamics to match real-world dynamics without requiring gradient updates, resulting in faster and more accurate alignment between simulated and real-world performance. We validate our approach across two tasks: object scooping and table air hockey. In the sim-to-sim evaluations, our method significantly outperforms the baselines on environment parameter estimation by 80% and 42% in the object scooping and table air hockey setups, respectively. Furthermore, our method achieves at least 70% success rate in sim-to-real transfer on object scooping across three different objects. By incorporating historical interaction data, our approach delivers efficient and smooth system identification, advancing the deployment of robots in dynamic real-world scenarios.


Vol. 10, No. 4, pp. 3190-3197

Citations: 0

A Portable Autonomous Underwater Vehicle With Multi-Thruster Propulsion: Design, Development, and Vision-Based Tracking Control

IF 4.6, Zone 2 Computer Science, Q2 ROBOTICS

IEEE Robotics and Automation Letters

Pub Date : 2025-02-10 DOI: 10.1109/LRA.2025.3540380

Zeyu Sha;Xiaorui Wang;Mingyang Yang;Hong Lei;Feitian Zhang

Autonomous underwater vehicles (AUVs) play a pivotal role in the exploration of marine resources. With the increasing complexity of underwater tasks, conventional torpedo-shaped AUVs exhibit significant limitations, particularly in complex and dynamic environments, due to restricted lateral translation and in-place rotation capabilities. To address these challenges, this letter introduces OpenAUV, a novel AUV design featuring a redundant multi-thruster configuration that enables full omnidirectional motion. The enhanced maneuverability significantly improves performance in underwater tracking tasks. Furthermore, OpenAUV's portability, cost-effectiveness, and open-source framework make it highly suitable for a wide range of scientific research applications. Comprehensive kinematics and dynamics models of OpenAUV are developed with hydrodynamic coefficients identified through computational fluid dynamics simulations. A vision-based tracking control system is designed to facilitate accurate maneuvers. Extensive experimental tests are conducted in a laboratory pool, the results of which confirm the effectiveness of the proposed design.

Vol. 10, No. 4, pp. 3046-3053

Citations: 0

BEVCon: Advancing Bird's Eye View Perception With Contrastive Learning

IF 4.6, Zone 2 Computer Science, Q2 ROBOTICS

IEEE Robotics and Automation Letters

Pub Date : 2025-02-10 DOI: 10.1109/LRA.2025.3540386

Ziyang Leng;Jiawei Yang;Zhicheng Ren;Bolei Zhou

We present BEVCon, a simple yet effective contrastive learning framework designed to improve Bird's Eye View (BEV) perception in autonomous driving. BEV perception offers a top-down-view representation of the surrounding environment, making it crucial for 3D object detection, segmentation, and trajectory prediction tasks. While prior work has primarily focused on enhancing BEV encoders and task-specific heads, we address the underexplored potential of representation learning in BEV models. BEVCon introduces two contrastive learning modules: an instance feature contrast module for refining BEV features and a perspective view contrast module that enhances the image backbone. The dense contrastive learning designed on top of detection losses leads to improved feature representations across both the BEV encoder and the backbone. Extensive experiments on the nuScenes dataset demonstrate that BEVCon achieves consistent performance gains, achieving up to +2.4% mAP improvement over state-of-the-art baselines. Our results highlight the critical role of representation learning in BEV perception and offer a complementary avenue to conventional task-specific optimizations.

Vol. 10, No. 4, pp. 3158-3165

Citations: 0

Signage-Aware Exploration in Open World Using Venue Maps

IF 4.6, Zone 2 Computer Science, Q2 ROBOTICS

IEEE Robotics and Automation Letters

Pub Date : 2025-02-10 DOI: 10.1109/LRA.2025.3540390

Chang Chen;Liang Lu;Lei Yang;Yinqiang Zhang;Yizhou Chen;Ruixing Jia;Jia Pan

Current exploration methods struggle to search for shops or restaurants in unknown open-world environments due to the lack of prior knowledge. Humans can leverage venue maps that offer valuable scene priors to aid exploration planning by correlating the signage in the scene with landmark names on the map. However, arbitrary shapes and styles of the texts on signage, along with multi-view inconsistencies, pose significant challenges for robots to recognize them accurately. Additionally, discrepancies between real-world environments and venue maps hinder the integration of text-level information into the planners. This paper introduces a novel signage-aware exploration system to address these challenges, enabling the robots to utilize venue maps effectively. We propose a signage understanding method that accurately detects and recognizes the texts on signage using a diffusion-based text instance retrieval method combined with a 2D-to-3D semantic fusion strategy. Furthermore, we design a venue map-guided exploration-exploitation planner that balances exploration in unknown regions using directional heuristics derived from venue maps and exploitation to get close and adjust orientation for better recognition. Experiments in large-scale shopping malls demonstrate our method's superior signage recognition performance and search efficiency, surpassing state-of-the-art text spotting methods and traditional exploration approaches.

Vol. 10, No. 4, pp. 3414-3421

Citations: 0

Gaze-Guided Robotic Vascular Ultrasound Leveraging Human Intention Estimation

IF 4.6, Zone 2 Computer Science, Q2 ROBOTICS

IEEE Robotics and Automation Letters

Pub Date : 2025-02-07 DOI: 10.1109/LRA.2025.3539546

Yuan Bi;Yang Su;Nassir Navab;Zhongliang Jiang

Medical ultrasound has been widely used to examine vascular structure in modern clinical practice. However, traditional ultrasound examination often faces challenges related to inter- and intra-operator variation. The robotic ultrasound system (RUSS) appears as a potential solution for such challenges because of its superiority in stability and reproducibility. Given the complex anatomy of human vasculature, multiple vessels often appear in ultrasound images, or a single vessel bifurcates into branches, complicating the examination process. To tackle this challenge, this work presents a gaze-guided RUSS for vascular applications. A gaze tracker captures the eye movements of the operator. The extracted gaze signal guides the RUSS to follow the correct vessel when it bifurcates. Additionally, a gaze-guided segmentation network is proposed to enhance segmentation robustness by exploiting gaze information. However, gaze signals are often noisy, requiring interpretation to accurately discern the operator's true intentions. To this end, this study proposes a stabilization module to process raw gaze data. The inferred attention heatmap is utilized as a region proposal to aid segmentation and serve as a trigger signal when the operator needs to adjust the scanning target, such as when a bifurcation appears. To ensure appropriate contact between the probe and surface during scanning, an automatic ultrasound confidence-based orientation correction method is developed. In experiments, we demonstrated the efficiency of the proposed gaze-guided segmentation pipeline by comparing it with other methods. Besides, the performance of the proposed gaze-guided RUSS was also validated as a whole on a realistic arm phantom with an uneven surface.

Volume: 10, Issue: 4, Pages: 3078-3085

Citations: 0

DeRi-IGP: Learning to Manipulate Rigid Objects Using Deformable Linear Objects via Iterative Grasp-Pull

IF 4.6 · Zone 2, Computer Science · Q2 ROBOTICS

IEEE Robotics and Automation Letters

Pub Date : 2025-02-07 DOI: 10.1109/LRA.2025.3539910

Zixing Wang;Ahmed H. Qureshi

Robotic manipulation of rigid objects via deformable linear objects (DLO) such as ropes is an emerging field of research with applications in various rigid object transportation tasks. A few methods that exist in this field suffer from limited robot action and operational space, poor generalization ability, and expensive model-based development. To address these challenges, we propose a universally applicable moving primitive called Iterative Grasp-Pull (IGP). We also introduce a novel vision-based neural policy that learns to parameterize the IGP primitive to manipulate DLO and transport their attached rigid objects to the desired goal locations. Additionally, our decentralized algorithm design allows collaboration among multiple agents to manipulate rigid objects using DLO. We evaluated the effectiveness of our approach in both simulated and real-world environments for a variety of soft-rigid body manipulation tasks. In the real world, we also demonstrate the effectiveness of our decentralized approach through human-robot collaborative transportation of rigid objects to given goal locations. We also showcase the large operational space of IGP primitive by solving distant object acquisition tasks. Lastly, we compared our approach with several model-based and learning-based baseline methods. The results indicate that our method surpasses other approaches by a significant margin.

Volume: 10, Issue: 4, Pages: 3166-3173

Citations: 0
