Pub Date : 2026-01-19 DOI: 10.1109/LRA.2026.3655311
Cong Li;Qin Rao;Zheng Tian;Jun Yang
The rubber-tired container gantry crane (RTG) is a type of heavy-duty lifting equipment commonly used in container yards; it is driven by rubber tires on both sides and steered via differential drive. While moving along the desired path, the RTG must remain centered in the lane with a restricted heading angle, as deviations may compromise the safety of subsequent yard operations. Due to its underactuated nature and the presence of external disturbances, achieving accurate lane-keeping poses a significant control challenge. To address this issue, a robust safety-critical steering control strategy is proposed that integrates a disturbance-rejection vector field (VF) with a new state-interlocked control barrier function (SICBF). The strategy first employs a VF path-following method as the nominal controller. By strategically shrinking the safe set, the SICBF overcomes limitations of traditional CBFs, such as state coupling in the inequality verification and infeasibility when the control coefficient tends to zero. Furthermore, by incorporating a disturbance observer (DOB) into the quadratic programming (QP) framework, the robustness and safety of the control system are significantly enhanced. Comprehensive simulations and experiments are conducted on a practical RTG with a 40-ton load capacity. To the best of our knowledge, the proposed method is one of very few that have been successfully applied to practical RTG systems.
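The control structure described above is a CBF-constrained quadratic program wrapped around a nominal vector-field controller, with a disturbance estimate entering the constraint. The abstract does not give the SICBF construction itself, so the following is only a minimal generic sketch of a scalar-input CBF-QP safety filter of that kind; the barrier function h, the toy lane dynamics, and the gain alpha are illustrative assumptions, and a closed-form projection replaces a QP solver because there is a single input.

```python
import numpy as np

def cbf_qp_filter(u_nom, h, Lf_h, Lg_h, d_hat, alpha=1.0):
    """Minimally modify u_nom so that h_dot + alpha*h >= 0 holds (scalar input).

    Constraint: Lf_h + Lg_h*u + d_hat + alpha*h >= 0. With one input, the QP
    min (u - u_nom)^2 subject to this constraint has a closed-form solution.
    """
    b = Lf_h + d_hat + alpha * h      # constraint offset (includes disturbance estimate)
    a = Lg_h                          # control coefficient
    if a * u_nom + b >= 0.0:          # nominal input already satisfies the constraint
        return u_nom
    if abs(a) < 1e-9:                 # vanishing coefficient: no input can restore safety here
        return u_nom                  # (the SICBF in the paper is built to avoid this case)
    return -b / a                     # project onto the constraint boundary

# Toy lane-keeping example: lateral offset e with dynamics e_dot = drift + u,
# barrier h = 0.25 - e**2 keeps |e| <= 0.5 m.
e, drift = 0.4, 0.3
h = 0.25 - e**2
u_nom = 0.1                           # nominal steering command from a vector-field controller
u_safe = cbf_qp_filter(u_nom, h, Lf_h=-2*e*drift, Lg_h=-2*e, d_hat=0.0)
print(u_safe)                         # about -0.19: reduces the outward drift enough to satisfy the barrier
```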
{"title":"Safety-Critical Steering Control for Rubber-Tired Container Gantry Cranes: A State-Interlocked CBF Approach","authors":"Cong Li;Qin Rao;Zheng Tian;Jun Yang","doi":"10.1109/LRA.2026.3655311","DOIUrl":"https://doi.org/10.1109/LRA.2026.3655311","url":null,"abstract":"The rubber-tired container gantry crane (RTG) is a type of heavy-duty lifting equipment commonly used in container yards, which is driven by two-side rubber tires and steered via differential drive. While moving along the desired path, the RTG must remain centered of the lane with restricted heading angle, as deviations may compromise the safety of subsequent yard operations. Due to its underactuated nature and the presence of external disturbances, achieving accurate lane-keeping poses a significant control challenge. To address this issue, a robust safety-critical steering control strategy integrating disturbance rejection vector field (VF) with a new state-interlocked control barrier function (SICBF) is proposed. The strategy initially employs a VF path-following method as the nominal controller. By strategically shrinking the safe set, the SICBF overcomes the limitations of traditional CBFs, such as state coupling in the inequality verification and infeasibility when the control coefficient tends to zero. Furthermore, by incorporating a disturbance observer (DOB) into the quadratic programming (QP) framework, the robustness and safety of the control system are significantly enhanced. Comprehensive simulation and experiment are conducted on a practical RTG with a 40-ton load capacity. To our best knowledge, the proposed method is one of the very few methods that have demonstrated successful application to the practical RTG systems.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"3238-3245"},"PeriodicalIF":5.3,"publicationDate":"2026-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146082249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Voxel-based LiDAR–inertial odometry (LIO) is accurate and efficient but can suffer from geometric inconsistencies when single-Gaussian voxel models indiscriminately merge observations from conflicting viewpoints. To address this limitation, we propose Azimuth-LIO, a robust voxel-based LIO framework that leverages azimuth-aware voxelization and probabilistic fusion. Instead of using a single distribution per voxel, we discretize each voxel into azimuth-sectorized substructures, each modeled by an anisotropic 3D Gaussian to preserve viewpoint-specific spatial features and uncertainties. We further introduce a direction-weighted distribution-to-distribution registration metric to adaptively quantify the contributions of different azimuth sectors, followed by a Bayesian fusion framework that exploits these confidence weights to ensure azimuth-consistent map updates. The performance and efficiency of the proposed method are evaluated on public benchmarks including the M2DGR, MCD, and SubT-MRS datasets, demonstrating superior accuracy and robustness compared to existing voxel-based algorithms.
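As a rough illustration of azimuth-sectorized voxel statistics, the sketch below keeps running sums per (voxel, azimuth sector) key and recovers an anisotropic 3D Gaussian (mean and covariance) per sector, so that observations from different viewpoints are not merged into a single distribution. The sector count, voxel size, and the way the viewing direction is discretized are assumptions for illustration, not the exact construction used by Azimuth-LIO.

```python
import numpy as np
from collections import defaultdict

VOXEL_SIZE = 1.0   # meters (illustrative)
N_SECTORS = 8      # azimuth sectors per voxel (illustrative)

def voxel_sector_key(point, sensor_origin):
    """Hash a point into (voxel index, azimuth sector of the viewing direction)."""
    vidx = tuple(np.floor(point / VOXEL_SIZE).astype(int))
    d = point - sensor_origin
    azimuth = np.arctan2(d[1], d[0]) % (2.0 * np.pi)            # in [0, 2*pi)
    sector = int(azimuth / (2.0 * np.pi / N_SECTORS))
    return vidx, sector

class SectorGaussianMap:
    def __init__(self):
        # per key: [point count, sum of points, sum of outer products]
        self.stats = defaultdict(lambda: [0, np.zeros(3), np.zeros((3, 3))])

    def insert(self, points, sensor_origin):
        for p in points:
            s = self.stats[voxel_sector_key(p, sensor_origin)]
            s[0] += 1
            s[1] += p
            s[2] += np.outer(p, p)

    def gaussian(self, key):
        n, s1, s2 = self.stats[key]
        mean = s1 / n
        cov = s2 / n - np.outer(mean, mean)                     # anisotropic 3D covariance
        return mean, cov

# Points on a thin wall observed from one side mostly land in one sector of one voxel.
rng = np.random.default_rng(0)
pts = rng.normal([0.5, 0.5, 0.5], [0.3, 0.02, 0.3], size=(200, 3))
m = SectorGaussianMap()
origin = np.array([5.0, 0.5, 0.5])
m.insert(pts, sensor_origin=origin)
print(m.gaussian(voxel_sector_key(pts[0], origin)))
```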
{"title":"Azimuth-LIO: Robust LiDAR-Inertial Odometry via Azimuth-Aware Voxelization and Probabilistic Fusion","authors":"Zhongguan Liu;Wei Li;Honglei Che;Lu Pan;Shuaidong Yuan","doi":"10.1109/LRA.2026.3655291","DOIUrl":"https://doi.org/10.1109/LRA.2026.3655291","url":null,"abstract":"Voxel-based LiDAR–inertial odometry (LIO) is accurate and efficient but can suffer from geometric inconsistencies when single-Gaussian voxel models indiscriminately merge observations from conflicting viewpoints. To address this limitation, we propose Azimuth-LIO, a robust voxel-based LIO framework that leverages azimuth-aware voxelization and probabilistic fusion. Instead of using a single distribution per voxel, we discretize each voxel into azimuth-sectorized substructures, each modeled by an anisotropic 3D Gaussian to preserve viewpoint-specific spatial features and uncertainties. We further introduce a direction-weighted distribution-to-distribution registration metric to adaptively quantify the contributions of different azimuth sectors, followed by a Bayesian fusion framework that exploits these confidence weights to ensure azimuth-consistent map updates. The performance and efficiency of the proposed method are evaluated on public benchmarks including the M2DGR, MCD, and SubT-MRS datasets, demonstrating superior accuracy and robustness compared to existing voxel-based algorithms.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"3158-3165"},"PeriodicalIF":5.3,"publicationDate":"2026-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146082067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-19 DOI: 10.1109/LRA.2026.3655302
Ziyu Wan;Lin Zhao
This paper proposes DiffPF, a differentiable particle filter that leverages diffusion models for state estimation in dynamic systems. Unlike conventional differentiable particle filters, which require importance weighting and typically rely on predefined or low-capacity proposal distributions, DiffPF learns a flexible posterior sampler by conditioning a diffusion model on predicted particles and the current observation. This enables accurate, equally-weighted sampling from complex, high-dimensional, and multimodal filtering distributions. We evaluate DiffPF across a range of scenarios, including both unimodal and highly multimodal distributions, and test it on simulated as well as real-world tasks, where it consistently outperforms existing filtering baselines. In particular, DiffPF achieves a 90.3% improvement in estimation accuracy on a highly multimodal global localization benchmark, and a nearly 50% improvement on the real-world robotic manipulation benchmark, compared to state-of-the-art differentiable filters. To the best of our knowledge, DiffPF is the first method to integrate conditional diffusion models into particle filtering, enabling high-quality posterior sampling that produces more informative particles and significantly improves state estimation. The code is available at https://github.com/ZiyuNUS/DiffPF.
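A minimal sketch of the filtering step described above: particles are propagated through a motion model and then replaced by equally weighted samples drawn from a sampler conditioned on the predicted particle set and the current observation. The trained conditional diffusion model is replaced here by a hand-written annealed placeholder, so every function, shape, and schedule below is an illustrative assumption rather than the DiffPF implementation.

```python
import numpy as np

def motion_model(particles, control, rng):
    # Simple additive motion model with process noise (illustrative).
    return particles + control + rng.normal(0.0, 0.05, particles.shape)

def placeholder_denoiser(x_t, t, particles_pred, observation):
    """Stand-in for the learned network: pull samples toward the observation and the
    predicted particle mean, more strongly at small noise levels t."""
    target = 0.5 * observation + 0.5 * particles_pred.mean(axis=0)
    return x_t - (1.0 - t) * (x_t - target)

def diffusion_posterior_sampler(particles_pred, observation, n_samples, n_steps, rng):
    # Reverse-diffusion-style anneal: start from noise, repeatedly denoise and re-noise.
    x = rng.normal(0.0, 1.0, (n_samples, particles_pred.shape[1]))
    for k in range(n_steps, 0, -1):
        t = k / n_steps
        x0_hat = placeholder_denoiser(x, t, particles_pred, observation)
        x = x0_hat + np.sqrt(max(t - 1.0 / n_steps, 0.0)) * rng.normal(0.0, 0.2, x.shape)
    return x                                                    # equally weighted particles

rng = np.random.default_rng(1)
particles = rng.normal([0.0, 0.0], 0.1, (128, 2))
particles = motion_model(particles, control=np.array([0.3, 0.1]), rng=rng)
particles = diffusion_posterior_sampler(particles, observation=np.array([0.35, 0.12]),
                                        n_samples=128, n_steps=20, rng=rng)
print(particles.mean(axis=0))
```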
{"title":"DiffPF: Differentiable Particle Filtering With Generative Sampling via Conditional Diffusion Models","authors":"Ziyu Wan;Lin Zhao","doi":"10.1109/LRA.2026.3655302","DOIUrl":"https://doi.org/10.1109/LRA.2026.3655302","url":null,"abstract":"This paper proposes DiffPF, a <italic>differentiable</i> particle filter that leverages <italic>diffusion</i> models for state estimation in dynamic systems. Unlike conventional differentiable particle filters, which require importance weighting and typically rely on predefined or low-capacity proposal distributions, DiffPF learns a flexible posterior sampler by conditioning a diffusion model on predicted particles and the current observation. This enables accurate, equally-weighted sampling from complex, high-dimensional, and multimodal filtering distributions. We evaluate DiffPF across a range of scenarios, including both unimodal and highly multimodal distributions, and test it on simulated as well as real-world tasks, where it consistently outperforms existing filtering baselines. In particular, DiffPF achieves a 90.3% improvement in estimation accuracy on a highly multimodal global localization benchmark, and a nearly 50% improvement on the real-world robotic manipulation benchmark, compared to state-of-the-art differentiable filters. To the best of our knowledge, DiffPF is the first method to integrate conditional diffusion models into particle filtering, enabling high-quality posterior sampling that produces more informative particles and significantly improves state estimation. The code is available at <uri>https://github.com/ZiyuNUS/DiffPF</uri>.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"3166-3173"},"PeriodicalIF":5.3,"publicationDate":"2026-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146082221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
For peg-in-hole tasks, humans rely on binocular visual perception to locate the peg above the hole surface and then proceed with insertion. This letter draws on this behavior to enable agents to learn efficient assembly strategies through visual reinforcement learning. Accordingly, we propose a Separate Primitive Policy (S2P) that learns to derive location and insertion actions simultaneously. S2P is compatible with model-free reinforcement learning algorithms. Ten insertion tasks featuring different polygons are developed as benchmarks for evaluation. Simulation experiments show that S2P can boost sample efficiency and success rate even with force constraints. Real-world experiments are also performed to verify the feasibility of S2P. Ablation studies are finally presented to discuss the generalizability of S2P and the factors that affect its performance.
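The core architectural idea, as stated above, is a policy that outputs a location action and an insertion action at the same time from one visual observation. The sketch below shows one way such a two-headed policy could be organized; the feature encoder, layer sizes, and action parameterization are assumptions for illustration, and the paper's S2P architecture and its RL training loop are not specified in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

class SeparatePrimitivePolicy:
    """Shared features feed two heads: a location primitive and an insertion primitive."""
    def __init__(self, feat_dim=32):
        self.w_loc = rng.normal(0, 0.1, (feat_dim, 2))   # head 1: delta-x, delta-y above the hole
        self.w_ins = rng.normal(0, 0.1, (feat_dim, 2))   # head 2: delta-z, compliance gain

    def act(self, features):
        loc = np.tanh(features @ self.w_loc)             # bounded alignment action
        ins = np.tanh(features @ self.w_ins)             # bounded insertion action
        return loc, ins

features = rng.normal(0, 1, 32)                          # stand-in for a binocular image encoding
policy = SeparatePrimitivePolicy()
loc_action, ins_action = policy.act(features)
print(loc_action, ins_action)
```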
{"title":"A Visual Reinforcement Learning-Based Separate Primitive Policy for Peg-in-Hole Tasks","authors":"Zichun Xu;Zhaomin Wang;Yuntao Li;Lei Zhuang;Zhiyuan Zhao;Guocai Yang;Jingdong Zhao","doi":"10.1109/LRA.2026.3655305","DOIUrl":"https://doi.org/10.1109/LRA.2026.3655305","url":null,"abstract":"For peg-in-hole tasks, humans rely on binocular visual perception to locate the peg above the hole surface and then proceed with insertion. This letter draws insights from this behavior to enable agents to learn efficient assembly strategies through visual reinforcement learning. Hence, we propose a <bold>S</b>eparate <bold>P</b>rimitive <bold>P</b>olicy (S2P) to learn how to derive location and insertion actions simultaneously. S2P is compatible with model-free reinforcement learning algorithms. Ten insertion tasks featuring different polygons are developed as benchmarks for evaluations. Simulation experiments show that S2P can boost the sample efficiency and success rate even with force constraints. Real-world experiments are also performed to verify the feasibility of S2P. Ablations are finally given to discuss the generalizability of S2P and some factors that affect its performance.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"3748-3755"},"PeriodicalIF":5.3,"publicationDate":"2026-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146223641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the field of robotic manipulation, traditional methods lack the flexibility required to meet the demands of diverse applications. Consequently, researchers have increasingly focused on developing more general techniques, particularly for long-horizon and gentle manipulation, to enhance the manipulation ability and adaptability of robots. In this study, we propose a framework called VLM-Driven Atomic Skills with Diffusion Policy Distillation (VASK-DP), which integrates tactile sensing to enable gentle control of robotic arms in long-horizon tasks. The framework trains atomic manipulation skills through reinforcement learning in simulated environments. The Visual Language Model (VLM) interprets RGB observations and natural language instructions to select and sequence atomic skills, guiding task decomposition, skill switching, and execution. It also generates expert demonstration datasets that serve as the basis for imitation learning. Subsequently, compliant long-horizon manipulation policies are distilled from these demonstrations using diffusion-based imitation learning. We evaluate multiple control modes, distillation strategies, and decision frameworks. Quantitative results across diverse simulation environments and long-horizon tasks validate the effectiveness of our approach. Furthermore, real robot deployment demonstrates successful task execution on physical hardware.
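The decision layer described above can be pictured as an outer loop in which a VLM maps the current RGB observation and the language instruction to the next atomic skill, and the chosen low-level policy is executed until the VLM switches skills or signals completion. The sketch below stubs out both the VLM call and the atomic skills, so the skill names, the "done" token, and the query format are all hypothetical.

```python
# Stubbed atomic skills; in the framework these would be RL-trained low-level policies.
ATOMIC_SKILLS = {
    "reach":  lambda obs: {"action": "move_toward_target", "obs": obs},
    "grasp":  lambda obs: {"action": "close_gripper_gently", "obs": obs},
    "place":  lambda obs: {"action": "lower_and_release", "obs": obs},
}

def query_vlm(rgb_obs, instruction, history):
    """Stand-in for the VLM call: a real system would send the image and instruction to a
    vision-language model and parse the next skill name from its reply."""
    plan = ["reach", "grasp", "place", "done"]
    return plan[min(len(history), len(plan) - 1)]

def run_task(instruction, get_rgb_obs, max_steps=10):
    history = []
    for _ in range(max_steps):
        skill = query_vlm(get_rgb_obs(), instruction, history)
        if skill == "done":
            break
        result = ATOMIC_SKILLS[skill](get_rgb_obs())      # execute the selected primitive
        history.append((skill, result))
    return history

trace = run_task("put the cup on the shelf", get_rgb_obs=lambda: "rgb_frame_placeholder")
print([skill for skill, _ in trace])
```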
{"title":"Gentle Manipulation of Long-Horizon Tasks Without Human Demonstrations","authors":"Jiayu Zhou;Qiwei Wu;Haitao Jiang;Xuanbao Qin;Yunjiang Lou;Xiaogang Xiong;Renjing Xu","doi":"10.1109/LRA.2026.3653406","DOIUrl":"https://doi.org/10.1109/LRA.2026.3653406","url":null,"abstract":"In the field of robotic manipulation, traditional methods lack the flexibility required to meet the demands of diverse applications. Consequently, researchers have increasingly focused on developing more general techniques, particularly for long-horizon and gentle manipulation, to enhance the manipulation ability and adaptability of robots. In this study, we propose a framework called VLM-Driven Atomic Skills with Diffusion Policy Distillation (VASK-DP), which integrates tactile sensing to enable gentle control of robotic arms in long-horizon tasks. The framework trains atomic manipulation skills through reinforcement learning in simulated environments. The Visual Language Model (VLM) interprets RGB observations and natural language instructions to select and sequence atomic skills, guiding task decomposition, skill switching, and execution. It also generates expert demonstration datasets that serve as the basis for imitation learning. Subsequently, compliant long-horizon manipulation policies are distilled from these demonstrations using diffusion-based imitation learning. We evaluate multiple control modes, distillation strategies, and decision frameworks. Quantitative results across diverse simulation environments and long-horizon tasks validate the effectiveness of our approach. Furthermore, real robot deployment demonstrates successful task execution on physical hardware.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"2538-2545"},"PeriodicalIF":5.3,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146001876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In range-based SLAM systems, localization accuracy depends on the quality of geometric maps. Sparse LiDAR scans and noisy depth from RGB-D sensors often yield incomplete or inaccurate reconstructions that degrade pose estimation. Appearance and semantic cues, readily available from onboard RGB and pretrained models, can serve as complementary signals to strengthen geometry. Nevertheless, variations in appearance due to illumination or texture and inconsistencies in semantic labels across frames can hinder geometric optimization if directly used as supervision. To address these challenges, we propose AING-SLAM, an Accurate Implicit Neural Geometry-aware SLAM framework that allows appearance and semantics to effectively strengthen geometry in both mapping and odometry. A unified neural point representation with a lightweight cross-modal decoder integrates geometry, appearance and semantics, enabling auxiliary cues to refine geometry even in sparse or ambiguous regions. For pose tracking, appearance-semantic-aided odometry jointly minimizes SDF, appearance, and semantic residuals with adaptive weighting, improving scan-to-map alignment and reducing drift. To safeguard stability, a history-guided gradient fusion strategy aligns instantaneous updates with long-term optimization trends, mitigating occasional inconsistencies between appearance/semantic cues and SDF-based supervision, thereby strengthening geometric optimization. Extensive experiments on indoor RGB-D and outdoor LiDAR benchmarks demonstrate real-time performance, state-of-the-art localization accuracy, and high-fidelity reconstruction across diverse environments.
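Two ingredients of the abstract lend themselves to a compact sketch: a tracking objective that adds weighted appearance and semantic residuals to the SDF residual, and a history-guided update that blends the instantaneous gradient with an exponential moving average of past gradients. The constant weights, EMA rate, and pose parameterization below are placeholders; the paper's adaptive weighting and fusion rule are not specified in the abstract.

```python
import numpy as np

def joint_residual(sdf_res, app_res, sem_res, w_app=0.3, w_sem=0.1):
    # Adaptive weighting in the paper is learned/derived; constants are used here.
    return np.sum(sdf_res**2) + w_app * np.sum(app_res**2) + w_sem * np.sum(sem_res**2)

class HistoryGuidedStep:
    """Blend the instantaneous gradient with a long-term trend before updating the pose."""
    def __init__(self, lr=1e-2, beta=0.9):
        self.lr, self.beta = lr, beta
        self.g_hist = None

    def step(self, pose, grad):
        self.g_hist = grad if self.g_hist is None else self.beta * self.g_hist + (1 - self.beta) * grad
        return pose - self.lr * self.g_hist

loss = joint_residual(sdf_res=np.array([0.02, -0.01]),
                      app_res=np.array([0.10]),
                      sem_res=np.array([0.0, 1.0]))

pose = np.zeros(6)                       # placeholder 6-DoF pose increment
opt = HistoryGuidedStep()
grad = np.array([0.1, -0.05, 0.0, 0.01, 0.0, 0.02])
pose = opt.step(pose, grad)
print(loss, pose)
```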
{"title":"AING-SLAM: Accurate Implicit Neural Geometry-Aware SLAM With Appearance and Semantics via History-Guided Optimization","authors":"Yanan Hao;Chenhui Shi;Pengju Zhang;Fulin Tang;Yihong Wu","doi":"10.1109/LRA.2026.3653380","DOIUrl":"https://doi.org/10.1109/LRA.2026.3653380","url":null,"abstract":"In range-based SLAM systems, localization accuracy depends on the quality of geometric maps. Sparse LiDAR scans and noisy depth from RGB-D sensors often yield incomplete or inaccurate reconstructions that degrade pose estimation. Appearance and semantic cues, readily available from onboard RGB and pretrained models, can serve as complementary signals to strengthen geometry. Nevertheless, variations in appearance due to illumination or texture and inconsistencies in semantic labels across frames can hinder geometric optimization if directly used as supervision. To address these challenges, we propose <bold>AING-SLAM</b>, an Accurate Implicit Neural Geometry-aware SLAM framework that allows appearance and semantics to effectively strengthen geometry in both mapping and odometry. A unified neural point representation with a lightweight cross-modal decoder integrates geometry, appearance and semantics, enabling auxiliary cues to refine geometry even in sparse or ambiguous regions. For pose tracking, appearance-semantic-aided odometry jointly minimizes SDF, appearance, and semantic residuals with adaptive weighting, improving scan-to-map alignment and reducing drift. To safeguard stability, a history-guided gradient fusion strategy aligns instantaneous updates with long-term optimization trends, mitigating occasional inconsistencies between appearance/semantic cues and SDF-based supervision, thereby strengthening geometric optimization. Extensive experiments on indoor RGB-D and outdoor LiDAR benchmarks demonstrate real-time performance, state-of-the-art localization accuracy, and high-fidelity reconstruction across diverse environments.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"2594-2601"},"PeriodicalIF":5.3,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146026627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-14 DOI: 10.1109/LRA.2026.3653326
Manuel Weiss;Alexander Pawluchin;Jan-Hendrik Ewering;Thomas Seel;Ivo Boblan
This letter introduces a control framework that leverages a Lagrangian neural network (LNN) for computed torque control (CTC) of robotic systems with unknown dynamics. Unlike prior LNN-based controllers that are placed outside the feedback-linearization framework (e.g., as feedforward terms), we embed an LNN inverse-dynamics model within a CTC loop, thereby shaping the closed-loop error dynamics. This strategy, referred to as LNN-CTC, ensures a physically consistent model and improves extrapolation, requiring neither prior model knowledge nor extensive training data. The approach is experimentally validated on a robotic arm with four degrees of freedom and compared with conventional model-based CTC, physics-informed neural network (PINN)-CTC, deep neural network (DNN)-CTC, an LNN-based feedforward controller, and a PID controller. Results demonstrate that LNN-CTC outperforms model-based baselines by up to 30% in tracking accuracy, achieving high performance with minimal training data. In addition, LNN-CTC outperforms all other evaluated baselines in both tracking accuracy and data efficiency, attaining lower joint-space RMSE for the same training data. The findings highlight the potential of physics-informed neural architectures to generalize robustly across various operating conditions and contribute to narrowing the performance gap between learned and classical control strategies.
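Computed torque control with a learned inverse-dynamics model follows the standard feedback-linearizing form tau = M_hat(q)(qdd_des + Kd*e_dot + Kp*e) + h_hat(q, q_dot), where M_hat and h_hat would be supplied by the trained Lagrangian neural network. The sketch below uses analytic stand-ins for the learned terms and arbitrary gains, so it shows only the control structure, not the paper's trained model.

```python
import numpy as np

N_JOINTS = 4
Kp = np.diag([80.0] * N_JOINTS)   # illustrative PD gains
Kd = np.diag([12.0] * N_JOINTS)

def M_hat(q):
    """Stand-in for the LNN mass matrix (symmetric positive definite)."""
    return np.eye(N_JOINTS) + 0.1 * np.diag(np.cos(q) ** 2)

def h_hat(q, q_dot):
    """Stand-in for the LNN Coriolis/centrifugal and gravity terms."""
    return 0.05 * q_dot + 2.0 * np.sin(q)

def lnn_ctc(q, q_dot, q_des, q_dot_des, qdd_des):
    e, e_dot = q_des - q, q_dot_des - q_dot
    v = qdd_des + Kd @ e_dot + Kp @ e          # desired acceleration with PD error shaping
    return M_hat(q) @ v + h_hat(q, q_dot)      # feedback-linearizing torque command

tau = lnn_ctc(q=np.zeros(4), q_dot=np.zeros(4),
              q_des=np.array([0.2, -0.1, 0.3, 0.0]),
              q_dot_des=np.zeros(4), qdd_des=np.zeros(4))
print(tau)
```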
{"title":"Lagrangian Neural Network-Based Control: Improving Robotic Trajectory Tracking via Linearized Feedback","authors":"Manuel Weiss;Alexander Pawluchin;Jan-Hendrik Ewering;Thomas Seel;Ivo Boblan","doi":"10.1109/LRA.2026.3653326","DOIUrl":"https://doi.org/10.1109/LRA.2026.3653326","url":null,"abstract":"This letter introduces a control framework that leverages Lagrangian neural network (LNN) for computed torque control (CTC) of robotic systems with unknown dynamics. Unlike prior LNN-based controllers that are placed outside the feedback-linearization framework (e.g., feedforward), we embed an LNN inverse-dynamics model within a CTC loop, thereby shaping the closed-loop error dynamics. This strategy, referred to as LNN-CTC, ensures a physically consistent model and improves extrapolation, requiring neither prior model knowledge nor extensive training data. The approach is experimentally validated on a robotic arm with four degrees of freedom and compared with conventional model-based CTC, physics-informed neural network (PINN)-CTC, deep neural network (DNN)-CTC, an LNN-based feedforward controller, and a PID controller. Results demonstrate that LNN-CTC significantly outperforms model-based baselines by up to <inline-formula><tex-math>$30 ,%$</tex-math></inline-formula> in tracking accuracy, achieving high performance with minimal training data. In addition, LNN-CTC outperforms all other evaluated baselines in both tracking accuracy and data efficiency, attaining lower joint-space RMSE for the same training data. The findings highlight the potential of physics-informed neural architectures to generalize robustly across various operating conditions and contribute to narrowing the performance gap between learned and classical control strategies.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"2546-2553"},"PeriodicalIF":5.3,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11352810","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146001868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-14 DOI: 10.1109/LRA.2026.3653405
Michael Bosello;Flavio Pinzarrone;Sara Kiade;Davide Aguiari;Yvo Keuter;Aaesha AlShehhi;Gyordan Caminati;Kei Long Wong;Ka Seng Chou;Junaid Halepota;Fares Alneyadi;Jacopo Panerati;Giovanni Pau
Drone technology is proliferating in many industries, including agriculture, logistics, defense, infrastructure, and environmental monitoring. Vision-based autonomy is one of its key enablers, particularly for real-world applications. This is essential for operating in novel, unstructured environments where traditional navigation methods may be unavailable. Autonomous drone racing has become the de facto benchmark for such systems. State-of-the-art research has shown that autonomous systems can surpass human-level performance in racing arenas. However, the direct applicability to commercial and field operations is still limited, as current systems are often trained and evaluated in highly controlled environments. In our contribution, the system's capabilities are analyzed within a controlled environment—where external tracking is available for ground-truth comparison—but also demonstrated in a challenging, uninstrumented environment—where ground-truth measurements were never available. We show that our approach can match the performance of professional human pilots in both scenarios.
{"title":"On Your Own: Pro-Level Autonomous Drone Racing in Uninstrumented Arenas","authors":"Michael Bosello;Flavio Pinzarrone;Sara Kiade;Davide Aguiari;Yvo Keuter;Aaesha AlShehhi;Gyordan Caminati;Kei Long Wong;Ka Seng Chou;Junaid Halepota;Fares Alneyadi;Jacopo Panerati;Giovanni Pau","doi":"10.1109/LRA.2026.3653405","DOIUrl":"https://doi.org/10.1109/LRA.2026.3653405","url":null,"abstract":"Drone technology is proliferating in many industries, including agriculture, logistics, defense, infrastructure, and environmental monitoring. Vision-based autonomy is one of its key enablers, particularly for real-world applications. This is essential for operating in novel, unstructured environments where traditional navigation methods may be unavailable. Autonomous drone racing has become the <italic>de facto</i> benchmark for such systems. State-of-the-art research has shown that autonomous systems can surpass human-level performance in racing arenas. However, the direct applicability to commercial and field operations is still limited, as current systems are often trained and evaluated in highly controlled environments. In our contribution, the system's capabilities are analyzed within a controlled environment—where external tracking is available for ground-truth comparison—but also demonstrated in a challenging, uninstrumented environment—where ground-truth measurements were never available. We show that our approach can match the performance of professional human pilots in both scenarios.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"2674-2681"},"PeriodicalIF":5.3,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11347474","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146026582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-14 DOI: 10.1109/LRA.2026.3653372
Liansheng Wang;Xinke Zhang;Chenhui Li;Dongjiao He;Yihan Pan;Jianjun Yi
LiDAR-Inertial Odometry (LIO) is a foundational technique for autonomous systems, yet its deployment on resource-constrained platforms remains challenging due to computational and memory limitations. We propose Super-LIO, a robust LIO system for applications that demand both high efficiency and accuracy, such as aerial robots and mobile autonomous systems. At the core of Super-LIO is a compact octo-voxel-based map structure, termed OctVox, that limits each voxel to eight subvoxel representatives, enabling strict point-density control and incremental denoising during map updates. This design yields a simple yet efficient and accurate map structure that can be easily integrated into existing LIO frameworks. Additionally, Super-LIO employs a heuristic-guided KNN strategy (HKNN) that accelerates the correspondence search by leveraging spatial locality, further reducing runtime overhead. We evaluated the proposed system on four publicly available datasets and several self-collected datasets, totaling more than 30 sequences. Extensive testing on both x86 and ARM platforms confirms that Super-LIO offers superior efficiency and robustness while maintaining competitive accuracy. Super-LIO processes each frame approximately 73% faster than state-of-the-art systems while consuming fewer CPU resources. The system is fully open-source and compatible with a wide range of LiDAR sensors and computing platforms. The implementation is available at: https://github.com/Liansheng-Wang/Super-LIO.git.
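A minimal sketch of an octo-voxel map cell in the spirit of OctVox: each voxel keeps at most one representative point per octant (eight slots), which bounds point density and incrementally denoises repeated observations. The running-average merge rule and the voxel size are assumptions; the exact OctVox update and the heuristic-guided KNN are not detailed in the abstract.

```python
import numpy as np

VOXEL_SIZE = 0.5   # meters (illustrative)

class OctoVoxel:
    def __init__(self):
        self.reps = [None] * 8            # one representative slot per octant
        self.counts = [0] * 8

    def insert(self, point, voxel_origin):
        local = (point - voxel_origin) / VOXEL_SIZE          # normalized position in [0, 1)^3
        octant = (int(local[0] >= 0.5)
                  | (int(local[1] >= 0.5) << 1)
                  | (int(local[2] >= 0.5) << 2))
        c = self.counts[octant]
        if c == 0:
            self.reps[octant] = point.copy()
        else:
            # incremental average keeps one denoised representative per octant
            self.reps[octant] += (point - self.reps[octant]) / (c + 1)
        self.counts[octant] = c + 1

    def points(self):
        return [p for p in self.reps if p is not None]

vox = OctoVoxel()
origin = np.zeros(3)
for p in np.random.default_rng(2).uniform(0.0, VOXEL_SIZE, (50, 3)):
    vox.insert(p, origin)
print(len(vox.points()))                  # at most 8 representatives, regardless of input density
```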
{"title":"Super-LIO: A Robust and Efficient LiDAR-Inertial Odometry System With a Compact Mapping Strategy","authors":"Liansheng Wang;Xinke Zhang;Chenhui Li;Dongjiao He;Yihan Pan;Jianjun Yi","doi":"10.1109/LRA.2026.3653372","DOIUrl":"https://doi.org/10.1109/LRA.2026.3653372","url":null,"abstract":"LiDAR-Inertial Odometry (LIO) is a foundational technique for autonomous systems, yet its deployment on resource-constrained platforms remains challenging due to computational and memory limitations. We propose Super-LIO, a robust LIO system that demands both high performance and accuracy, ideal for applications such as aerial robots and mobile autonomous systems. At the core of Super-LIO is a compact octo-voxel-based map structure, termed <bold>OctVox</b>, that limits each voxel to eight subvoxel representatives, enabling strict point density control and incremental denoising during map updates. This design enables a simple yet efficient and accurate map structure, which can be easily integrated into existing LIO frameworks. Additionally, Super-LIO designs a heuristic-guided KNN strategy (HKNN) that accelerates the correspondence search by leveraging spatial locality, further reducing runtime overhead. We evaluated the proposed system using four publicly available datasets and several self-collected datasets, totaling more than 30 sequences. Extensive testing on both X86 and ARM platforms confirms that Super-LIO offers superior efficiency and robustness, while maintaining competitive accuracy. Super-LIO processes each frame approximately 73% faster than SOTA, while consuming less CPU resources. The system is fully open-source and compatible with a wide range of LiDAR sensors and computing platforms. The implementation is available at: <uri>https://github.com/Liansheng-Wang/Super-LIO.git</uri>.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"2666-2673"},"PeriodicalIF":5.3,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146026581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-13 DOI: 10.1109/LRA.2026.3653368
Christian Geckeler;Niklas Neugebauer;Manasi Muglikar;Davide Scaramuzza;Stefano Mintchev
Uncrewed aerial vehicles (UAVs) are increasingly deployed in forest environments for tasks such as environmental monitoring and search and rescue, which require safe navigation through dense foliage and precise data collection. Traditional sensing approaches, including passive multispectral and RGB imaging, suffer from latency, poor depth resolution, and strong dependence on ambient light, especially under forest canopies. In this work, we present a novel event spectroscopy system that simultaneously enables high-resolution, low-latency depth reconstruction and integrated multispectral imaging using a single sensor. Depth is reconstructed using structured light, and by modulating the wavelength of the projected structured light, our system captures spectral information in controlled bands between 650 nm and 850 nm. We demonstrate up to 60% improvement in RMSE over commercial depth sensors and validate the spectral accuracy against a reference spectrometer and commercial multispectral cameras, demonstrating comparable performance. A portable version limited to RGB is used to collect real-world depth and spectral data from the Masoala Rainforest. We demonstrate color image reconstruction and material differentiation between leaves and branches using this spectral and depth data. Our results show that adding depth, available at no extra effort with our setup, to material differentiation improves accuracy by over 30% compared to a color-only method. Our system, tested in both lab and real-world rainforest environments, shows strong performance in depth estimation, RGB reconstruction, and material differentiation, paving the way for lightweight, integrated, and robust UAV perception and data collection in complex natural environments.
{"title":"Event Spectroscopy: Event-Based Multispectral and Depth Sensing Using Structured Light","authors":"Christian Geckeler;Niklas Neugebauer;Manasi Muglikar;Davide Scaramuzza;Stefano Mintchev","doi":"10.1109/LRA.2026.3653368","DOIUrl":"https://doi.org/10.1109/LRA.2026.3653368","url":null,"abstract":"Uncrewed aerial vehicles (UAVs) are increasingly deployed in forest environments for tasks such as environmental monitoring and search and rescue, which require safe navigation through dense foliage and precise data collection. Traditional sensing approaches, including passive multispectral and RGB imaging, suffer from latency, poor depth resolution, and strong dependence on ambient light—especially under forest canopies. In this work, we present a novel event spectroscopy system that simultaneously enables high-resolution, low-latency depth reconstruction with integrated multispectral imaging using a single sensor. Depth is reconstructed using structured light, and by modulating the wavelength of the projected structured light, our system captures spectral information in controlled bands between 650 nm and 850 nm. We demonstrate up to 60% improvement in RMSE over commercial depth sensors and validate the spectral accuracy against a reference spectrometer and commercial multispectral cameras, demonstrating comparable performance. A portable version limited to RGB is used to collect real-world depth and spectral data from a Masoala Rainforest. We demonstrate color image reconstruction and material differentiation between leaves and branches using this spectral and depth data. Our results show that adding depth (available at no extra effort with our setup) to material differentiation improves the accuracy by over 30% compared to color-only method. Our system, tested in both lab and real-world rainforest environments, shows strong performance in depth estimation, RGB reconstruction, and material differentiation—paving the way for lightweight, integrated, and robust UAV perception and data collection in complex natural environments.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"2658-2665"},"PeriodicalIF":5.3,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146026584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}