
IEEE Robotics and Automation Letters: Latest Publications

MonoTher-Depth: Enhancing Thermal Depth Estimation via Confidence-Aware Distillation
IF 4.6, CAS Zone 2 (Computer Science), Q2 ROBOTICS. Pub Date: 2025-01-30. DOI: 10.1109/LRA.2025.3536855
Xingxing Zuo;Nikhil Ranganathan;Connor Lee;Georgia Gkioxari;Soon-Jo Chung
Monocular depth estimation (MDE) from thermal images is a crucial technology for robotic systems operating in challenging conditions such as fog, smoke, and low light. The limited availability of labeled thermal data constrains the generalization capabilities of thermal MDE models compared to foundational RGB MDE models, which benefit from datasets of millions of images across diverse scenarios. To address this challenge, we introduce a novel pipeline that enhances thermal MDE through knowledge distillation from a versatile RGB MDE model. Our approach features a confidence-aware distillation method that utilizes the predicted confidence of the RGB MDE to selectively strengthen the thermal MDE model, capitalizing on the strengths of the RGB model while mitigating its weaknesses. Our method significantly improves the accuracy of the thermal MDE, independent of the availability of labeled depth supervision, and greatly expands its applicability to new scenarios. In our experiments on new scenarios without labeled depth, the proposed confidence-aware distillation method reduces the absolute relative error of thermal MDE by 22.88% compared to the baseline without distillation.
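The confidence-aware weighting can be pictured as a per-pixel distillation loss gated by the teacher's confidence. Below is a minimal PyTorch sketch of that idea; the function name, the smooth-L1 distance, and the threshold `tau` are illustrative assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def confidence_weighted_distillation(thermal_depth, rgb_depth, rgb_conf, tau=0.5):
    """Distill the RGB teacher into the thermal student, pixel by pixel,
    weighting the loss by teacher confidence and ignoring low-confidence
    pixels entirely (smooth-L1 distance and tau are assumptions)."""
    mask = (rgb_conf > tau).float()
    per_pixel = F.smooth_l1_loss(thermal_depth, rgb_depth.detach(), reduction="none")
    weighted = rgb_conf * mask * per_pixel
    return weighted.sum() / mask.sum().clamp(min=1.0)
```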
{"title":"MonoTher-Depth: Enhancing Thermal Depth Estimation via Confidence-Aware Distillation","authors":"Xingxing Zuo;Nikhil Ranganathan;Connor Lee;Georgia Gkioxari;Soon-Jo Chung","doi":"10.1109/LRA.2025.3536855","DOIUrl":"https://doi.org/10.1109/LRA.2025.3536855","url":null,"abstract":"Monocular depth estimation (MDE) from thermal images is a crucial technology for robotic systems operating in challenging conditions such as fog, smoke, and low light. The limited availability of labeled thermal data constrains the generalization capabilities of thermal MDE models compared to foundational RGB MDE models, which benefit from datasets of millions of images across diverse scenarios. To address this challenge, we introduce a novel pipeline that enhances thermal MDE through knowledge distillation from a versatile RGB MDE model. Our approach features a confidence-aware distillation method that utilizes the predicted confidence of the RGB MDE to selectively strengthen the thermal MDE model, capitalizing on the strengths of the RGB model while mitigating its weaknesses. Our method significantly improves the accuracy of the thermal MDE, independent of the availability of labeled depth supervision, and greatly expands its applicability to new scenarios. In our experiments on new scenarios without labeled depth, the proposed confidence-aware distillation method reduces the absolute relative error of thermal MDE by 22.88% compared to the baseline without distillation.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 3","pages":"2830-2837"},"PeriodicalIF":4.6,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143396363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SIMPNet: Spatial-Informed Motion Planning Network
IF 4.6, CAS Zone 2 (Computer Science), Q2 ROBOTICS. Pub Date: 2025-01-30. DOI: 10.1109/LRA.2025.3537317
Davood Soleymanzadeh;Xiao Liang;Minghui Zheng
Current robotic manipulators require fast and efficient motion-planning algorithms to operate in cluttered environments. State-of-the-art sampling-based motion planners struggle to scale to high-dimensional configuration spaces and are inefficient in complex environments. This inefficiency arises because these planners utilize either uniform or hand-crafted sampling heuristics within the configuration space. To address these challenges, we present the Spatial-informed Motion Planning Network (SIMPNet). SIMPNet consists of a stochastic graph neural network (GNN)-based sampling heuristic for informed sampling within the configuration space. The sampling heuristic of SIMPNet encodes the workspace embedding into the configuration space through a cross-attention mechanism. It encodes the manipulator's kinematic structure into a graph, which is used to generate informed samples within the framework of sampling-based motion planning algorithms. We have evaluated the performance of SIMPNet using a UR5e robotic manipulator operating within simple and complex workspaces, comparing it against baseline state-of-the-art motion planners. The evaluation results show the effectiveness and advantages of the proposed planner compared to the baseline planners.
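A common way such a learned heuristic plugs into a sampling-based planner is to mix samples from the stochastic GNN with uniform ones. The sketch below illustrates that pattern under an assumed mixing ratio and sampler interface; it is not SIMPNet's actual API.

```python
import numpy as np

def informed_sample(learned_sampler, lower, upper, p_learned=0.7, rng=None):
    """Draw a configuration from the learned (GNN-based) heuristic with
    probability p_learned, otherwise fall back to uniform sampling over
    the joint limits. p_learned and learned_sampler are assumptions."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < p_learned:
        return np.asarray(learned_sampler())  # stochastic forward pass of the heuristic
    return rng.uniform(lower, upper)          # uniform fallback over joint limits
```

Keeping the uniform fallback is the standard safeguard: even if the learned sampler is biased toward poor regions, the underlying sampling-based planner retains probabilistic completeness.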
{"title":"SIMPNet: Spatial-Informed Motion Planning Network","authors":"Davood Soleymanzadeh;Xiao Liang;Minghui Zheng","doi":"10.1109/LRA.2025.3537317","DOIUrl":"https://doi.org/10.1109/LRA.2025.3537317","url":null,"abstract":"Current robotic manipulators require fast and efficient motion-planning algorithms to operate in cluttered environments. State-of-the-art sampling-based motion planners struggle to scale to high-dimensional configuration spaces and are inefficient in complex environments. This inefficiency arises because these planners utilize either uniform or hand-crafted sampling heuristics within the configuration space. To address these challenges, we present the Spatial-informed Motion Planning Network (SIMPNet). SIMPNet consists of a stochastic graph neural network (GNN)-based sampling heuristic for informed sampling within the configuration space. The sampling heuristic of SIMPNet encodes the workspace embedding into the configuration space through a cross-attention mechanism. It encodes the manipulator's kinematic structure into a graph, which is used to generate informed samples within the framework of sampling-based motion planning algorithms. We have evaluated the performance of SIMPNet using a UR5e robotic manipulator operating within simple and complex workspaces, comparing it against baseline state-of-the-art motion planners. The evaluation results show the effectiveness and advantages of the proposed planner compared to the baseline planners.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 3","pages":"2870-2877"},"PeriodicalIF":4.6,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143404030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Adaptive Non-Linear Centroidal MPC With Stability Guarantees for Robust Locomotion of Legged Robots
IF 4.6, CAS Zone 2 (Computer Science), Q2 ROBOTICS. Pub Date: 2025-01-30. DOI: 10.1109/LRA.2025.3536296
Mohamed Elobaid;Giulio Turrisi;Lorenzo Rapetti;Giulio Romualdi;Stefano Dafarra;Tomohiro Kawakami;Tomohiro Chaki;Takahide Yoshiike;Claudio Semini;Daniele Pucci
Nonlinear model predictive locomotion controllers based on the reduced centroidal dynamics are nowadays ubiquitous in legged robots. Although these schemes rely on an inherent simplification of the robot's dynamics, they have been shown to endow robots with a step-adjustment capability in reaction to small pushes and, in the case of uncertain parameters such as unknown payloads, to provide some “practical”, albeit limited, robustness. In this work, we provide rigorous certificates of their closed-loop stability by reformulating the online centroidal MPC controller. This is achieved through a systematic procedure inspired by the machinery of adaptive control, together with ideas from Control Lyapunov Functions. Our reformulation additionally provides robustness to a class of unmeasured constant disturbances. To demonstrate the generality of our approach, we validated our formulation on a new generation of humanoid robots, the 56.7 kg ergoCub, as well as on the commercially available 21 kg quadruped robot Aliengo.
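The underlying model is the reduced centroidal dynamics: the rate of change of centroidal momentum equals gravity plus the contact wrenches plus, here, an unmeasured constant disturbance. The paper derives a certified adaptive law via Control Lyapunov Functions; the snippet below is only a generic gradient-style estimator of the same flavor, with an assumed gain, not the certified law.

```python
import numpy as np

def adapt_disturbance(d_hat, hdot_meas, hdot_model, gamma=0.05):
    """One gradient-style update of the constant-disturbance estimate:
    nudge d_hat toward the centroidal momentum-rate prediction error.
    The gain gamma is an assumed tuning parameter."""
    error = np.asarray(hdot_meas) - np.asarray(hdot_model)
    return np.asarray(d_hat) + gamma * error
```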
{"title":"Adaptive Non-Linear Centroidal MPC With Stability Guarantees for Robust Locomotion of Legged Robots","authors":"Mohamed Elobaid;Giulio Turrisi;Lorenzo Rapetti;Giulio Romualdi;Stefano Dafarra;Tomohiro Kawakami;Tomohiro Chaki;Takahide Yoshiike;Claudio Semini;Daniele Pucci","doi":"10.1109/LRA.2025.3536296","DOIUrl":"https://doi.org/10.1109/LRA.2025.3536296","url":null,"abstract":"Nonlinear model predictive locomotion controllers based on the reduced centroidal dynamics are nowadays ubiquitous in legged robots. These schemes, even if they assume an inherent simplification of the robot's dynamics, were shown to endow robots with a step-adjustment capability in reaction to small pushes, and in the case of uncertain parameters - as unknown payloads - they were shown to provide some “practical”, albeit limited, robustness. In this work, we provide rigorous certificates of their closed-loop stability via reformulating the online centroidal MPC controller. This is achieved thanks to a systematic procedure inspired by the machinery of adaptive control, together with ideas coming from Control Lyapunov Functions. Our reformulation, in addition, provides robustness for a class of unmeasured constant disturbances. To demonstrate the generality of our approach, we validated our formulation on a new generation of humanoid robots - the <inline-formula><tex-math>$text{56.7 kg}$</tex-math></inline-formula> ergoCub, as well as on the commercially available <inline-formula><tex-math>$text{21 kg}$</tex-math></inline-formula> quadruped robot Aliengo.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 3","pages":"2806-2813"},"PeriodicalIF":4.6,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143379509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Efficiently Kinematic-Constraint-Coupled State Estimation for Integrated Aerial Platforms in GPS-Denied Environments
IF 4.6, CAS Zone 2 (Computer Science), Q2 ROBOTICS. Pub Date: 2025-01-30. DOI: 10.1109/LRA.2025.3536292
Ganghua Lai;Yushu Yu;Fuchun Sun;Jing Qi;Vincenzo Lippiello
Small-scale autonomous aerial vehicles (AAVs) are widely used in various fields. However, their underactuated design limits their ability to perform complex tasks that require physical interaction with environments. The fully-actuated Integrated Aerial Platforms (IAPs), where multiple AAVs are connected to a central platform via passive joints, offer a promising solution. However, achieving accurate state estimation for IAPs in GPS-denied environments remains a significant hurdle. In this letter, we introduce a centralized state estimation framework for IAPs with a fusion of odometry and kinematics, using only onboard cameras and inertial measurement units (IMUs). We develop a forward-kinematic-based formulation to fully leverage localization information from kinematic constraints. An online calibration method for kinematic parameters is proposed to enhance state estimation accuracy with forward kinematics. Additionally, we perform an observability analysis, theoretically proving that these kinematic parameters are fully observable under conditions of fully excited motion. Dataset and real-world experiments on a three-agent IAP prototype confirm that our method improves localization accuracy and reduces drift compared to the baseline.
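One way to picture the kinematic coupling is as a factor-graph residual that ties each agent's estimated pose to the platform pose through the forward kinematics of its passive joint chain. The sketch below is a simplified illustration (translation-only error, assumed 4x4 homogeneous transforms), not the paper's full SE(3) formulation.

```python
import numpy as np

def kinematic_residual(T_platform, T_agent, T_fk):
    """Translation part of the pose error between the agent pose predicted
    through the passive-joint forward kinematics and its estimated pose.
    T_platform, T_agent: world-frame 4x4 transforms; T_fk: platform-to-agent
    transform computed from joint readings (all assumed inputs)."""
    T_pred = T_platform @ T_fk                 # agent pose predicted by kinematics
    T_err = np.linalg.inv(T_pred) @ T_agent    # relative error transform
    return T_err[:3, 3]                        # full method would use log(SE(3))
```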
{"title":"Efficiently Kinematic-Constraint-Coupled State Estimation for Integrated Aerial Platforms in GPS-Denied Environments","authors":"Ganghua Lai;Yushu Yu;Fuchun Sun;Jing Qi;Vincezo Lippiello","doi":"10.1109/LRA.2025.3536292","DOIUrl":"https://doi.org/10.1109/LRA.2025.3536292","url":null,"abstract":"Small-scale autonomous aerial vehicles (AAVs) are widely used in various fields. However, their underactuated design limits their ability to perform complex tasks that require physical interaction with environments. The fully-actuated Integrated Aerial Platforms (IAPs), where multiple AAVs are connected to a central platform via passive joints, offer a promising solution. However, achieving accurate state estimation for IAPs in GPS-denied environments remains a significant hurdle. In this letter, we introduce a centralized state estimation framework for IAPs with a fusion of odometry and kinematics, using only onboard cameras and inertial measurement units (IMUs). We develop a forward-kinematic-based formulation to fully leverage localization information from kinematic constraints. An online calibration method for kinematic parameters is proposed to enhance state estimation accuracy with forward kinematics. Additionally, we perform an observability analysis, theoretically proving that these kinematic parameters are fully observable under conditions of fully excited motion. Dataset and real-world experiments on a three-agent IAP prototype confirm that our method improves localization accuracy and reduces drift compared to the baseline.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 3","pages":"2838-2845"},"PeriodicalIF":4.6,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143396338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Human-Inspired Robotic Assembly for Multiple Peg-In/Out-Hole Tasks in On-Orbit Refueling
IF 4.6, CAS Zone 2 (Computer Science), Q2 ROBOTICS. Pub Date: 2025-01-30. DOI: 10.1109/LRA.2025.3536298
Rui Zhang;Qiang Zhang;Xiaodong Zhou
On-orbit refueling technology requires robots with multiple peg-in-hole and peg-out-hole capabilities. However, complex contact conditions can cause jamming, thus posing significant challenges to automated refueling. To address this challenge, this letter proposes a human-inspired multiple peg-in/out-hole assembly method. The proposed method integrates a variable admittance force controller based on a non-diagonal stiffness matrix with a strategy for handling multiple peg-in/out-hole operations. In addition, by coupling the position and orientation stiffness, the robot's adaptability in dynamic assembly environments is significantly enhanced. Moreover, the proposed method enables autonomous posture adjustment based on real-time force sensor data and allows a robot to retry operations in case of jamming, thus eliminating the need for complex motion trajectory planning. The results of ground refueling experiments show that the proposed method can successfully complete multiple peg-in/out-hole tasks and effectively resist external interference. The proposed method can serve as a valuable reference for on-orbit refueling tasks.
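The admittance layer can be summarized by the second-order relation M ẍ + D ẋ + K x = f_ext, where a non-diagonal K couples translational and rotational compliance. A minimal discrete-time sketch follows; the matrices, state layout, and Euler integration scheme are assumptions for illustration.

```python
import numpy as np

def admittance_step(x, xd, f_ext, M, D, K, dt):
    """One semi-implicit Euler step of M*xdd + D*xd + K*x = f_ext.
    x, xd: 6-vector pose deviation and its rate; K may be non-diagonal to
    couple position and orientation stiffness (all values assumed)."""
    xdd = np.linalg.solve(M, f_ext - D @ xd - K @ x)
    xd = xd + dt * xdd
    x = x + dt * xd
    return x, xd
```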
{"title":"Human-Inspired Robotic Assembly for Multiple Peg-In/Out-Hole Tasks in On-Orbit Refueling","authors":"Rui Zhang;Qiang Zhang;Xiaodong Zhou","doi":"10.1109/LRA.2025.3536298","DOIUrl":"https://doi.org/10.1109/LRA.2025.3536298","url":null,"abstract":"On-orbit refueling technology requires using robots with multiple peg-in-hole and peg-out-hole capabilities. However, complex contact conditions can cause jamming, thus posing significant challenges to automated refueling. To address this shortcoming, this letter proposes a human-inspired multiple peg-in/out-hole assembly method. The proposed method integrates a variable admittance force controller based on a non-diagonal stiffness matrix and a strategy for handling multiple peg-in/out-hole operations. In addition, by coupling the position and orientation stiffness, a robot's adaptability in dynamic assembly environments is significantly enhanced. Moreover, the proposed method enables autonomous posture adjustment based on real-time force sensor data and allows a robot to retry operations in case of jamming, thus eliminating the need for complex motion trajectory planning. The results of the ground refueling experiments show that the proposed method can successfully complete the multiple peg-in/out-hole tasks and effectively resist external interference. The proposed method could be of valuable reference significance for on-orbit refueling tasks.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 3","pages":"2670-2677"},"PeriodicalIF":4.6,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143361300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
TLS-SLAM: Gaussian Splatting SLAM Tailored for Large-Scale Scenes
IF 4.6, CAS Zone 2 (Computer Science), Q2 ROBOTICS. Pub Date: 2025-01-30. DOI: 10.1109/LRA.2025.3536876
Sicong Cheng;Songyang He;Fuqing Duan;Ning An
3D Gaussian splatting (3DGS) has shown promise for fast and high-quality mapping in simultaneous localization and mapping (SLAM), but it faces convergence challenges in large-scale scenes in three key aspects. First, the excessive number of Gaussian points in 3DGS models of large-scale scenes makes the search space of the model optimization process more complex, leading to local optima. Second, trajectory drift caused by long-term localization in large-scale scenes displaces Gaussian point cloud positions. Third, dynamic objects commonly found in large-scale scenes produce numerous noisy Gaussian points that disrupt gradient backpropagation. We propose TLS-SLAM to address these convergence challenges. To steer map optimization in large-scale scenes toward a global optimum, we use scene memory features to encode and adaptively build sub-maps, dividing the optimization space into subspaces and thereby reducing the optimization complexity. To reduce trajectory drift, we use a pose update method guided by semantic information, ensuring accurate Gaussian point cloud creation. To mitigate the impact of dynamic objects, we utilize 3D Gaussian distributions to extract, encode, and model dynamic objects in the scene, thereby avoiding the generation of noise points. Experiments on four datasets show that our method achieves strong tracking, mapping, and rendering accuracy.
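The adaptive sub-map routing can be illustrated as a similarity test on scene memory features: a frame joins the most similar sub-map, or opens a new one when nothing matches well enough. The sketch below assumes cosine similarity and a fixed threshold; the paper's encoder and criterion may differ.

```python
import numpy as np

def select_submap(frame_feat, submap_feats, sim_thresh=0.8):
    """Assign a frame to the most similar existing sub-map, or create a new
    sub-map when the scene memory feature is unfamiliar. Cosine similarity
    and sim_thresh are illustrative assumptions."""
    if submap_feats:
        sims = [float(frame_feat @ f) /
                (np.linalg.norm(frame_feat) * np.linalg.norm(f))
                for f in submap_feats]
        best = int(np.argmax(sims))
        if sims[best] >= sim_thresh:
            return best                     # optimize within an existing sub-map
    submap_feats.append(frame_feat)         # unfamiliar scene: open a new sub-map
    return len(submap_feats) - 1
```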
{"title":"TLS-SLAM: Gaussian Splatting SLAM Tailored for Large-Scale Scenes","authors":"Sicong Cheng;Songyang He;Fuqing Duan;Ning An","doi":"10.1109/LRA.2025.3536876","DOIUrl":"https://doi.org/10.1109/LRA.2025.3536876","url":null,"abstract":"3D Gaussian splatting (3DGS) has shown promise for fast and high-quality mapping in simultaneous localization and mapping (SLAM), but faces convergence challenges in large-scale scenes across three key aspects. Firstly, the excessive Gaussian points in 3DGS models for large-scale scenes make the search space of the model optimization process more complex, leading to local optima. Secondly, trajectory drift caused by long-term localization in large-scale scenes displaces Gaussian point cloud positions. Thirdly, dynamic objects commonly found in large-scale scenes produce numerous noise Gaussian points that disrupt gradient backpropagation. We propose TLS-SLAM to address these convergence challenges. To ensure large-scale scene map optimization attains the global optimal, we use scene memory features to encode and adaptively build sub-maps, dividing the optimization space into subspaces, which reduces the optimization complexity. To reduce trajectory drift, we use a pose update method guided by semantic information, ensuring accurate Gaussian point cloud creation. To mitigate the impact of dynamic objects, we utilize 3D Gaussian distributions to accurately extract, encode, and model dynamic objects from the scene, thereby avoiding the generation of noise points. Experiments on four datasets show that our method achieves strong performance in tracking, mapping, and rendering accuracy.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 3","pages":"2814-2821"},"PeriodicalIF":4.6,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143379534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
LXLv2: Enhanced LiDAR Excluded Lean 3D Object Detection with Fusion of 4D Radar and Camera
IF 4.6, CAS Zone 2 (Computer Science), Q2 ROBOTICS. Pub Date: 2025-01-30. DOI: 10.1109/LRA.2025.3536840
Weiyi Xiong;Zean Zou;Qiuchi Zhao;Fengchun He;Bing Zhu
LXL, the previous state-of-the-art 3D object detection method based on 4D radar-camera fusion, utilizes the predicted image depth distribution maps and radar 3D occupancy grids to assist the sampling-based image view transformation. However, the depth prediction lacks accuracy and consistency, and the concatenation-based fusion in LXL impedes model robustness. In this work, we propose LXLv2, which modifies LXL to overcome these limitations and improve performance. Specifically, considering the position error in radar measurements, we devise a one-to-many depth supervision strategy via radar points, where the radar cross section (RCS) value is further exploited to adjust the supervision area for object-level depth consistency. Additionally, a channel and spatial attention-based fusion module named CSAFusion is introduced to improve feature adaptiveness. Experimental results on the View-of-Delft and TJ4DRadSet datasets show that the proposed LXLv2 outperforms LXL in detection accuracy, inference speed and robustness, demonstrating the effectiveness of the model.
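The one-to-many supervision can be pictured as follows: each radar point, projected into the image, supervises a neighborhood of pixels whose size grows with its RCS value. The NumPy sketch below builds such a target/mask pair; the disk shape and the linear RCS scaling are assumptions, not the paper's exact rule.

```python
import numpy as np

def radar_depth_targets(shape, radar_uv, radar_depth, radar_rcs,
                        base_radius=2.0, rcs_scale=0.5):
    """Build a sparse depth-supervision target: each projected radar point
    (u = column, v = row) supervises a disk of pixels whose radius grows
    with RCS. base_radius and rcs_scale are assumed parameters."""
    H, W = shape
    target = np.zeros((H, W))
    mask = np.zeros((H, W), dtype=bool)
    ys, xs = np.mgrid[0:H, 0:W]
    for (u, v), d, rcs in zip(radar_uv, radar_depth, radar_rcs):
        r = base_radius + rcs_scale * max(rcs, 0.0)
        disk = (xs - u) ** 2 + (ys - v) ** 2 <= r ** 2
        target[disk] = d                    # one radar depth supervises many pixels
        mask |= disk
    return target, mask
```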
{"title":"LXLv2: Enhanced LiDAR Excluded Lean 3D Object Detection with Fusion of 4D Radar and Camera","authors":"Weiyi Xiong;Zean Zou;Qiuchi Zhao;Fengchun He;Bing Zhu","doi":"10.1109/LRA.2025.3536840","DOIUrl":"https://doi.org/10.1109/LRA.2025.3536840","url":null,"abstract":"As the previous state-of-the-art 4D radar-camera fusion-based 3D object detection method, LXL utilizes the predicted image depth distribution maps and radar 3D occupancy grids to assist the sampling-based image view transformation. However, the depth prediction lacks accuracy and consistency, and the concatenation-based fusion in LXL impedes the model robustness. In this work, we propose LXLv2, where modifications are made to overcome the limitations and improve the performance. Specifically, considering the position error in radar measurements, we devise a one-to-many depth supervision strategy via radar points, where the radar cross section (RCS) value is further exploited to adjust the supervision area for object-level depth consistency. Additionally, a channel and spatial attention-based fusion module named CSAFusion is introduced to improve feature adaptiveness. Experimental results on the View-of-Delft and TJ4DRadSet datasets show that the proposed LXLv2 can outperform LXL in detection accuracy, inference speed and robustness, demonstrating the effectiveness of the model.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 3","pages":"2862-2869"},"PeriodicalIF":4.6,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143404031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes
IF 4.6, CAS Zone 2 (Computer Science), Q2 ROBOTICS. Pub Date: 2025-01-30. DOI: 10.1109/LRA.2025.3536218
Tim Brödermann;Christos Sakaridis;Yuqian Fu;Luc Van Gool
Leveraging multiple sensors is crucial for robust semantic perception in autonomous driving, as each sensor type has complementary strengths and weaknesses. However, existing sensor fusion methods often treat sensors uniformly across all conditions, leading to suboptimal performance. By contrast, we propose a novel, condition-aware multimodal fusion approach for robust semantic perception of driving scenes. Our method, CAFuser, uses an RGB camera input to classify environmental conditions and generate a Condition Token that guides the fusion of multiple sensor modalities. We further introduce modality-specific feature adapters to align diverse sensor inputs into a shared latent space, enabling efficient integration with a single shared pre-trained backbone. By dynamically adapting sensor fusion based on the actual condition, our model significantly improves robustness and accuracy, especially in adverse-condition scenarios. CAFuser ranks first on the public MUSES benchmarks, achieving 59.7 PQ for multimodal panoptic segmentation and 78.2 mIoU for semantic segmentation, and also sets the new state of the art on DeLiVER.
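A minimal way to realize condition-aware gating is to map the Condition Token to per-modality weights that rescale the aligned features before they are merged. The PyTorch module below sketches this under assumed shapes and a softmax gate; CAFuser's actual fusion head may differ.

```python
import torch.nn as nn

class ConditionGatedFusion(nn.Module):
    """Fuse per-modality features with weights predicted from a condition
    token derived from the RGB stream. Shapes and the softmax gate are
    illustrative assumptions."""
    def __init__(self, dim, n_modalities):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, n_modalities),
                                  nn.Softmax(dim=-1))

    def forward(self, condition_token, feats):
        # condition_token: (B, dim); feats: list of (B, C, H, W), one per modality
        w = self.gate(condition_token)      # (B, n_modalities), rows sum to 1
        return sum(w[:, i, None, None, None] * f for i, f in enumerate(feats))
```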
{"title":"CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes","authors":"Tim Brödermann;Christos Sakaridis;Yuqian Fu;Luc Van Gool","doi":"10.1109/LRA.2025.3536218","DOIUrl":"https://doi.org/10.1109/LRA.2025.3536218","url":null,"abstract":"Leveraging multiple sensors is crucial for robust semantic perception in autonomous driving, as each sensor type has complementary strengths and weaknesses. However, existing sensor fusion methods often treat sensors uniformly across all conditions, leading to suboptimal performance. By contrast, we propose a novel, <italic>condition-aware multimodal</i> fusion approach for robust semantic perception of driving scenes. Our method, CAFuser, uses an RGB camera input to classify environmental conditions and generate a <italic>Condition Token</i> that guides the fusion of multiple sensor modalities. We further newly introduce modality-specific feature adapters to align diverse sensor inputs into a shared latent space, enabling efficient integration with a single and shared pre-trained backbone. By dynamically adapting sensor fusion based on the actual condition, our model significantly improves robustness and accuracy, especially in adverse-condition scenarios. CAFuser ranks first on the public MUSES benchmarks, achieving 59.7 PQ for multimodal panoptic and 78.2 mIoU for semantic segmentation, and also sets the new state of the art on DeLiVER.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 4","pages":"3134-3141"},"PeriodicalIF":4.6,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143455247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
IMU-Aided Geographic Pose Estimation Method for UAVs Using Satellite Imageries Matching
IF 4.6, CAS Zone 2 (Computer Science), Q2 ROBOTICS. Pub Date: 2025-01-29. DOI: 10.1109/LRA.2025.3536285
Yongfei Li
Estimating the geographic position of Unmanned Aerial Vehicles (UAVs) in the absence of Global Navigation Satellite Systems (GNSS) is crucial for enhancing flight safety. This paper presents a vision-based geolocalization method that matches images captured by onboard cameras with satellite imagery, utilizing attitude information from Inertial Measurement Units (IMUs). We introduce a two-point solution for the Perspective-n-Point (PnP) problem for the case in which the camera's pitch and roll angles are known. This approach is shown to be highly robust against image alignment errors and significantly improves position estimation accuracy. Experiments with both synthetic and real flight data confirm the effectiveness and reliability of the proposed method in practical applications.
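With roll and pitch fixed by the IMU, only the yaw angle and the translation remain unknown, which is why two correspondences suffice. The paper presents a closed-form two-point solver; the sketch below substitutes a brute-force yaw scan with a linear least-squares translation solve per candidate, purely to make the geometry concrete.

```python
import numpy as np

def two_point_pnp(pts_world, pts_img, R_rp, n_steps=3600):
    """Brute-force stand-in for a two-point PnP with known roll/pitch.
    pts_world: (2,3) 3D points; pts_img: (2,2) normalized image coords;
    R_rp: 3x3 gravity-alignment rotation from the IMU (assumed given)."""
    P = (R_rp @ np.asarray(pts_world).T).T     # pre-rotate by the known attitude
    best = (np.inf, None, None)
    for theta in np.linspace(-np.pi, np.pi, n_steps, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        A, b = [], []
        for (X, Y, Z), (x, y) in zip(P, pts_img):
            # projection: x*(Z + tz) = c*X - s*Y + tx,  y*(Z + tz) = s*X + c*Y + ty
            A.append([1.0, 0.0, -x]); b.append(x * Z - (c * X - s * Y))
            A.append([0.0, 1.0, -y]); b.append(y * Z - (s * X + c * Y))
        A, b = np.asarray(A), np.asarray(b)
        t, *_ = np.linalg.lstsq(A, b, rcond=None)
        r = np.linalg.norm(A @ t - b)          # reprojection-style residual
        if r < best[0]:
            best = (r, theta, t)
    return best[1], best[2]                    # yaw and translation with lowest residual
```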
{"title":"IMU-Aided Geographic Pose Estimation Method for UAVs Using Satellite Imageries Matching","authors":"Yongfei Li","doi":"10.1109/LRA.2025.3536285","DOIUrl":"https://doi.org/10.1109/LRA.2025.3536285","url":null,"abstract":"Estimating the geographic position of Unmanned Aerial Vehicles (UAVs) in the absence of Global Navigation Satellite Systems (GNSS) is crucial for enhancing flight safety. This paper presents a vision-based geolocalization method that matches images captured by onboard cameras with satellite imageries, utilizing attitude information from Inertial Measurement Units (IMUs). We introduce a two-point solution for the Perspective-n-Point (PnP) problem, specifically when the camera's pitch and roll angles are known. This approach is shown to be highly robust against image alignment errors and significantly improves position estimation accuracy. Experiments with both synthetic and real flight data confirm the effectiveness and reliability of the proposed method in practical applications.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 3","pages":"2902-2909"},"PeriodicalIF":4.6,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143430540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Prediction of Delay-Free Scene for Quadruped Robot Teleoperation: Integrating Delayed Data With User Commands
IF 4.6, CAS Zone 2 (Computer Science), Q2 ROBOTICS. Pub Date: 2025-01-29. DOI: 10.1109/LRA.2025.3536222
Seunghyeon Ha;Seongyong Kim;Soo-Chul Lim
Teleoperation systems are utilized in various controllable systems, including vehicles, manipulators, and quadruped robots. However, during teleoperation, communication delays can cause users to receive delayed feedback, which reduces controllability and increases the risk faced by the remote robot. To address this issue, we propose a delay-free video generation model based on user commands that allows users to receive real-time feedback despite communication delays. Our model predicts delay-free video by integrating delayed data (video, point cloud, and robot status) from the robot with the user's real-time commands. The LiDAR point cloud data, which is part of the delayed data, is used to predict the contents of areas outside the camera frame during robot rotation. We constructed our proposed model by modifying the transformer-based video prediction model VPTR-NAR to effectively integrate these data. For our experiments, we acquired a navigation dataset from a quadruped robot, and this dataset was used to train and test our proposed model. We evaluated the model's performance by comparing it with existing video prediction models and conducting an ablation study to verify the effectiveness of its utilization of command and point cloud data.
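The delay compensation can be pictured as command replay: buffer every command sent to the robot, and when a delayed observation arrives, roll a learned predictor forward over the commands issued since that observation's timestamp. The sketch below shows only this bookkeeping with an abstract single-step predictor; in the paper the predictor is the modified VPTR-NAR operating on video, point clouds, and robot status.

```python
from collections import deque

class DelayCompensator:
    """Replay in-flight user commands on top of the latest delayed
    observation to estimate the current, delay-free scene. The single-step
    predictor interface is an assumption for illustration."""
    def __init__(self, predictor):
        self.predictor = predictor          # maps (frame, command) -> next frame
        self.commands = deque()             # (timestamp, command) pairs

    def send(self, t, command):
        self.commands.append((t, command))  # record every outgoing command

    def predict_now(self, delayed_frame, frame_t):
        while self.commands and self.commands[0][0] < frame_t:
            self.commands.popleft()         # drop commands already reflected in the frame
        frame = delayed_frame
        for _, cmd in self.commands:        # replay the commands still "in flight"
            frame = self.predictor(frame, cmd)
        return frame
```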
{"title":"Prediction of Delay-Free Scene for Quadruped Robot Teleoperation: Integrating Delayed Data With User Commands","authors":"Seunghyeon Ha;Seongyong Kim;Soo-Chul Lim","doi":"10.1109/LRA.2025.3536222","DOIUrl":"https://doi.org/10.1109/LRA.2025.3536222","url":null,"abstract":"Teleoperation systems are utilized in various controllable systems, including vehicles, manipulators, and quadruped robots. However, during teleoperation, communication delays can cause users to receive delayed feedback, which reduces controllability and increases the risk faced by the remote robot. To address this issue, we propose a delay-free video generation model based on user commands that allows users to receive real-time feedback despite communication delays. Our model predicts delay-free video by integrating delayed data (video, point cloud, and robot status) from the robot with the user's real-time commands. The LiDAR point cloud data, which is part of the delayed data, is used to predict the contents of areas outside the camera frame during robot rotation. We constructed our proposed model by modifying the transformer-based video prediction model VPTR-NAR to effectively integrate these data. For our experiments, we acquired a navigation dataset from a quadruped robot, and this dataset was used to train and test our proposed model. We evaluated the model's performance by comparing it with existing video prediction models and conducting an ablation study to verify the effectiveness of its utilization of command and point cloud data.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 3","pages":"2846-2853"},"PeriodicalIF":4.6,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143396337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0