Voxel-based LiDAR–inertial odometry (LIO) is accurate and efficient but can suffer from geometric inconsistencies when single-Gaussian voxel models indiscriminately merge observations from conflicting viewpoints. To address this limitation, we propose Azimuth-LIO, a robust voxel-based LIO framework that leverages azimuth-aware voxelization and probabilistic fusion. Instead of using a single distribution per voxel, we discretize each voxel into azimuth-sectorized substructures, each modeled by an anisotropic 3D Gaussian to preserve viewpoint-specific spatial features and uncertainties. We further introduce a direction-weighted distribution-to-distribution registration metric to adaptively quantify the contributions of different azimuth sectors, followed by a Bayesian fusion framework that exploits these confidence weights to ensure azimuth-consistent map updates. The performance and efficiency of the proposed method are evaluated on public benchmarks including the M2DGR, MCD, and SubT-MRS datasets, demonstrating superior accuracy and robustness compared to existing voxel-based algorithms.
"Azimuth-LIO: Robust LiDAR-Inertial Odometry via Azimuth-Aware Voxelization and Probabilistic Fusion," Zhongguan Liu; Wei Li; Honglei Che; Lu Pan; Shuaidong Yuan. IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 3158-3165. Published 2026-01-19. DOI: 10.1109/LRA.2026.3655291
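To make the voxel representation concrete, the following is a minimal NumPy sketch of an azimuth-sectorized voxel: each voxel keeps a separate running Gaussian (mean, scatter, point count) per azimuth sector instead of a single distribution, and exposes a per-sector confidence that a direction-weighted registration term or Bayesian map update could use. The sector count, the Welford-style moment update, and the saturation-style confidence weight are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

N_SECTORS = 8  # assumed azimuth discretization; the paper's choice may differ

class AzimuthVoxel:
    """One voxel holding a separate anisotropic 3D Gaussian per azimuth sector."""
    def __init__(self):
        self.n = np.zeros(N_SECTORS)            # point count per sector
        self.mean = np.zeros((N_SECTORS, 3))    # per-sector means
        self.M2 = np.zeros((N_SECTORS, 3, 3))   # per-sector scatter (sum of outer products)

    def sector_of(self, point, sensor_origin):
        """Assign an observation to a sector by the azimuth of its viewing ray."""
        d = point - sensor_origin
        az = np.arctan2(d[1], d[0]) % (2 * np.pi)
        return int(az / (2 * np.pi / N_SECTORS))

    def insert(self, point, sensor_origin):
        """Welford-style running update of the sector's mean and scatter."""
        s = self.sector_of(point, sensor_origin)
        self.n[s] += 1
        delta = point - self.mean[s]
        self.mean[s] += delta / self.n[s]
        self.M2[s] += np.outer(delta, point - self.mean[s])

    def gaussian(self, s):
        """Return (mean, covariance, confidence) for sector s once it is well observed."""
        if self.n[s] < 5:
            return None
        cov = self.M2[s] / (self.n[s] - 1)
        conf = self.n[s] / (self.n[s] + 10.0)   # assumed saturation-style confidence weight
        return self.mean[s], cov, conf
```

A direction-weighted distribution-to-distribution residual would then compare a query scan's sector Gaussian against the corresponding map sector and scale each term by that sector's confidence.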
Pub Date: 2026-01-19 | DOI: 10.1109/LRA.2026.3655302
Ziyu Wan;Lin Zhao
This paper proposes DiffPF, a differentiable particle filter that leverages diffusion models for state estimation in dynamic systems. Unlike conventional differentiable particle filters, which require importance weighting and typically rely on predefined or low-capacity proposal distributions, DiffPF learns a flexible posterior sampler by conditioning a diffusion model on predicted particles and the current observation. This enables accurate, equally-weighted sampling from complex, high-dimensional, and multimodal filtering distributions. We evaluate DiffPF across a range of scenarios, including both unimodal and highly multimodal distributions, and test it on simulated as well as real-world tasks, where it consistently outperforms existing filtering baselines. In particular, DiffPF achieves a 90.3% improvement in estimation accuracy on a highly multimodal global localization benchmark, and a nearly 50% improvement on the real-world robotic manipulation benchmark, compared to state-of-the-art differentiable filters. To the best of our knowledge, DiffPF is the first method to integrate conditional diffusion models into particle filtering, enabling high-quality posterior sampling that produces more informative particles and significantly improves state estimation. The code is available at https://github.com/ZiyuNUS/DiffPF.
"DiffPF: Differentiable Particle Filtering With Generative Sampling via Conditional Diffusion Models," Ziyu Wan; Lin Zhao. IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 3166-3173. DOI: 10.1109/LRA.2026.3655302
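The core mechanism, drawing equally weighted particles from a conditional diffusion model instead of weighting samples from a hand-designed proposal, can be sketched as a standard DDPM reverse loop conditioned on the predicted particles and the current observation. The denoiser `eps_model`, the linear noise schedule, and the state dimensionality below are placeholders for illustration; the actual network and schedule are defined in the paper.

```python
import numpy as np

def ddpm_sample_particles(eps_model, predicted, observation, n_particles=256,
                          state_dim=3, T=50, rng=None):
    """Draw equally weighted particles from the filtering posterior by running a
    DDPM-style reverse process with a conditional denoiser.

    eps_model(x, t, predicted, observation) -> predicted noise, same shape as x.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    betas = np.linspace(1e-4, 0.05, T)                   # assumed linear schedule
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)

    x = rng.standard_normal((n_particles, state_dim))    # start from pure noise
    for t in reversed(range(T)):
        eps = eps_model(x, t, predicted, observation)    # conditional noise estimate
        # DDPM posterior mean for x_{t-1}
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x                                             # particles, each with weight 1/N
```

Because every returned particle carries equal weight, the resampling and weight-degeneracy handling of importance-sampling filters is not needed.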
In the field of robotic manipulation, traditional methods lack the flexibility required to meet the demands of diverse applications. Consequently, researchers have increasingly focused on developing more general techniques, particularly for long-horizon and gentle manipulation, to enhance the manipulation ability and adaptability of robots. In this study, we propose a framework called VLM-Driven Atomic Skills with Diffusion Policy Distillation (VASK-DP), which integrates tactile sensing to enable gentle control of robotic arms in long-horizon tasks. The framework trains atomic manipulation skills through reinforcement learning in simulated environments. The Visual Language Model (VLM) interprets RGB observations and natural language instructions to select and sequence atomic skills, guiding task decomposition, skill switching, and execution. It also generates expert demonstration datasets that serve as the basis for imitation learning. Subsequently, compliant long-horizon manipulation policies are distilled from these demonstrations using diffusion-based imitation learning. We evaluate multiple control modes, distillation strategies, and decision frameworks. Quantitative results across diverse simulation environments and long-horizon tasks validate the effectiveness of our approach. Furthermore, real robot deployment demonstrates successful task execution on physical hardware.
"Gentle Manipulation of Long-Horizon Tasks Without Human Demonstrations," Jiayu Zhou; Qiwei Wu; Haitao Jiang; Xuanbao Qin; Yunjiang Lou; Xiaogang Xiong; Renjing Xu. IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2538-2545. Published 2026-01-15. DOI: 10.1109/LRA.2026.3653406
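At the framework level, the interaction between the VLM planner and the RL-trained atomic skills reduces to a selection-and-execution loop whose rollouts double as demonstrations for diffusion-based distillation. The sketch below shows that loop only; the skill set, the `vlm.select_skill` interface, and the environment API are hypothetical placeholders, not the VASK-DP implementation.

```python
from dataclasses import dataclass, field

ATOMIC_SKILLS = ["reach", "grasp_gently", "lift", "place", "release"]  # assumed skill set

@dataclass
class Demonstration:
    steps: list = field(default_factory=list)   # (skill name, observation, action trace)

def run_long_horizon_task(vlm, skills, env, instruction, max_steps=20):
    """High-level loop: the VLM picks the next atomic skill from the current RGB
    observation and the language instruction; the RL-trained skill executes it.
    vlm, skills, and env are illustrative stand-ins for the framework's components."""
    demo = Demonstration()
    obs = env.reset()
    for _ in range(max_steps):
        skill_name = vlm.select_skill(obs["rgb"], instruction, ATOMIC_SKILLS)
        if skill_name == "done":
            break
        trace = skills[skill_name].execute(env, obs)    # tactile-aware low-level policy
        demo.steps.append((skill_name, obs, trace))     # logged for diffusion distillation
        obs = env.observe()
    return demo
```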
In range-based SLAM systems, localization accuracy depends on the quality of geometric maps. Sparse LiDAR scans and noisy depth from RGB-D sensors often yield incomplete or inaccurate reconstructions that degrade pose estimation. Appearance and semantic cues, readily available from onboard RGB and pretrained models, can serve as complementary signals to strengthen geometry. Nevertheless, variations in appearance due to illumination or texture and inconsistencies in semantic labels across frames can hinder geometric optimization if directly used as supervision. To address these challenges, we propose AING-SLAM, an Accurate Implicit Neural Geometry-aware SLAM framework that allows appearance and semantics to effectively strengthen geometry in both mapping and odometry. A unified neural point representation with a lightweight cross-modal decoder integrates geometry, appearance and semantics, enabling auxiliary cues to refine geometry even in sparse or ambiguous regions. For pose tracking, appearance-semantic-aided odometry jointly minimizes SDF, appearance, and semantic residuals with adaptive weighting, improving scan-to-map alignment and reducing drift. To safeguard stability, a history-guided gradient fusion strategy aligns instantaneous updates with long-term optimization trends, mitigating occasional inconsistencies between appearance/semantic cues and SDF-based supervision, thereby strengthening geometric optimization. Extensive experiments on indoor RGB-D and outdoor LiDAR benchmarks demonstrate real-time performance, state-of-the-art localization accuracy, and high-fidelity reconstruction across diverse environments.
"AING-SLAM: Accurate Implicit Neural Geometry-Aware SLAM With Appearance and Semantics via History-Guided Optimization," Yanan Hao; Chenhui Shi; Pengju Zhang; Fulin Tang; Yihong Wu. IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2594-2601. Published 2026-01-14. DOI: 10.1109/LRA.2026.3653380
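Of the components above, the history-guided gradient fusion is the most self-contained to illustrate: blend each instantaneous gradient with a long-term trend and attenuate steps that conflict with it. The exponential-moving-average trend and the cosine-based gate in the sketch below are assumed choices for illustration, not necessarily the paper's formulation.

```python
import numpy as np

class HistoryGuidedFusion:
    """Blend the instantaneous gradient with a long-term EMA trend, attenuating
    steps whose direction conflicts with the accumulated history.
    Expects flattened 1-D gradient vectors."""
    def __init__(self, decay=0.9):
        self.decay = decay
        self.trend = None

    def fuse(self, grad):
        if self.trend is None:
            self.trend = grad.copy()
            return grad
        # cosine agreement between the new gradient and the historical trend
        cos = float(grad @ self.trend) / (np.linalg.norm(grad) * np.linalg.norm(self.trend) + 1e-12)
        w = 0.5 * (1.0 + cos)                            # 1 when aligned, 0 when opposed
        fused = w * grad + (1.0 - w) * self.trend        # attenuated, history-consistent step
        self.trend = self.decay * self.trend + (1.0 - self.decay) * grad
        return fused
```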
Pub Date: 2026-01-14 | DOI: 10.1109/LRA.2026.3653326
Manuel Weiss;Alexander Pawluchin;Jan-Hendrik Ewering;Thomas Seel;Ivo Boblan
This letter introduces a control framework that leverages a Lagrangian neural network (LNN) for computed torque control (CTC) of robotic systems with unknown dynamics. Unlike prior LNN-based controllers that are placed outside the feedback-linearization framework (e.g., feedforward), we embed an LNN inverse-dynamics model within a CTC loop, thereby shaping the closed-loop error dynamics. This strategy, referred to as LNN-CTC, ensures a physically consistent model and improves extrapolation, requiring neither prior model knowledge nor extensive training data. The approach is experimentally validated on a robotic arm with four degrees of freedom and compared with conventional model-based CTC, physics-informed neural network (PINN)-CTC, deep neural network (DNN)-CTC, an LNN-based feedforward controller, and a PID controller. Results demonstrate that LNN-CTC significantly outperforms model-based baselines by up to 30% in tracking accuracy, achieving high performance with minimal training data. In addition, LNN-CTC outperforms all other evaluated baselines in both tracking accuracy and data efficiency, attaining lower joint-space RMSE for the same training data. The findings highlight the potential of physics-informed neural architectures to generalize robustly across various operating conditions and contribute to narrowing the performance gap between learned and classical control strategies.
"Lagrangian Neural Network-Based Control: Improving Robotic Trajectory Tracking via Linearized Feedback," Manuel Weiss; Alexander Pawluchin; Jan-Hendrik Ewering; Thomas Seel; Ivo Boblan. IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2546-2553. DOI: 10.1109/LRA.2026.3653326. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11352810
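The controller structure itself is classical: the learned inverse-dynamics model is evaluated on a PD-stabilized reference acceleration, which is what shapes the closed-loop error dynamics. A minimal sketch, assuming a trained `lnn_inverse_dynamics(q, dq, ddq) -> tau` callable and illustrative gains:

```python
import numpy as np

def ctc_torque(lnn_inverse_dynamics, q, dq, q_des, dq_des, ddq_des, Kp=None, Kd=None):
    """Computed torque control with a learned inverse-dynamics model:
        tau = ID(q, dq, ddq_des + Kd*(dq_des - dq) + Kp*(q_des - q))
    All joint quantities are NumPy arrays; lnn_inverse_dynamics stands in for the
    trained LNN (any physically consistent inverse-dynamics model fits here)."""
    q, dq = np.asarray(q, dtype=float), np.asarray(dq, dtype=float)
    n = len(q)
    Kp = np.eye(n) * 100.0 if Kp is None else Kp        # assumed gains for illustration
    Kd = np.eye(n) * 20.0 if Kd is None else Kd
    e, de = q_des - q, dq_des - dq
    v = ddq_des + Kd @ de + Kp @ e                      # stabilized reference acceleration
    return lnn_inverse_dynamics(q, dq, v)               # feedback-linearizing torque
```

With a perfect model the tracking error obeys a linear, exponentially stable equation set by Kp and Kd; the letter's point is that a Lagrangian parameterization keeps the learned model close enough to this ideal to sit inside the loop rather than outside it.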
Pub Date: 2026-01-14 | DOI: 10.1109/LRA.2026.3653405
Michael Bosello;Flavio Pinzarrone;Sara Kiade;Davide Aguiari;Yvo Keuter;Aaesha AlShehhi;Gyordan Caminati;Kei Long Wong;Ka Seng Chou;Junaid Halepota;Fares Alneyadi;Jacopo Panerati;Giovanni Pau
Drone technology is proliferating in many industries, including agriculture, logistics, defense, infrastructure, and environmental monitoring. Vision-based autonomy is one of its key enablers, particularly for real-world applications. This is essential for operating in novel, unstructured environments where traditional navigation methods may be unavailable. Autonomous drone racing has become the de facto benchmark for such systems. State-of-the-art research has shown that autonomous systems can surpass human-level performance in racing arenas. However, the direct applicability to commercial and field operations is still limited, as current systems are often trained and evaluated in highly controlled environments. In our contribution, the system's capabilities are analyzed within a controlled environment—where external tracking is available for ground-truth comparison—but also demonstrated in a challenging, uninstrumented environment—where ground-truth measurements were never available. We show that our approach can match the performance of professional human pilots in both scenarios.
"On Your Own: Pro-Level Autonomous Drone Racing in Uninstrumented Arenas," Michael Bosello; Flavio Pinzarrone; Sara Kiade; Davide Aguiari; Yvo Keuter; Aaesha AlShehhi; Gyordan Caminati; Kei Long Wong; Ka Seng Chou; Junaid Halepota; Fares Alneyadi; Jacopo Panerati; Giovanni Pau. IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2674-2681. DOI: 10.1109/LRA.2026.3653405. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11347474
Pub Date: 2026-01-14 | DOI: 10.1109/LRA.2026.3653372
Liansheng Wang;Xinke Zhang;Chenhui Li;Dongjiao He;Yihan Pan;Jianjun Yi
LiDAR-Inertial Odometry (LIO) is a foundational technique for autonomous systems, yet its deployment on resource-constrained platforms remains challenging due to computational and memory limitations. We propose Super-LIO, a robust LIO system built for applications that demand both high performance and accuracy, such as aerial robots and mobile autonomous systems. At the core of Super-LIO is a compact octo-voxel-based map structure, termed OctVox, that limits each voxel to eight subvoxel representatives, enabling strict point density control and incremental denoising during map updates. This yields a simple yet efficient and accurate map structure that can be easily integrated into existing LIO frameworks. Additionally, Super-LIO employs a heuristic-guided KNN strategy (HKNN) that accelerates the correspondence search by leveraging spatial locality, further reducing runtime overhead. We evaluated the proposed system using four publicly available datasets and several self-collected datasets, totaling more than 30 sequences. Extensive testing on both X86 and ARM platforms confirms that Super-LIO offers superior efficiency and robustness, while maintaining competitive accuracy. Super-LIO processes each frame approximately 73% faster than state-of-the-art systems while consuming fewer CPU resources. The system is fully open-source and compatible with a wide range of LiDAR sensors and computing platforms. The implementation is available at: https://github.com/Liansheng-Wang/Super-LIO.git.
"Super-LIO: A Robust and Efficient LiDAR-Inertial Odometry System With a Compact Mapping Strategy," Liansheng Wang; Xinke Zhang; Chenhui Li; Dongjiao He; Yihan Pan; Jianjun Yi. IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2666-2673. DOI: 10.1109/LRA.2026.3653372
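The OctVox idea, capping each voxel at eight representative points, one per octant relative to the voxel center, can be sketched directly as a small container. The averaging policy for points that land in an already-occupied octant is an assumption for illustration; the paper's incremental denoising rule may differ.

```python
import numpy as np

class OctVox:
    """A voxel that keeps at most eight representatives, one per octant of the
    voxel relative to its center."""
    def __init__(self, center):
        self.center = np.asarray(center, dtype=float)
        self.reps = {}                                   # octant id -> (point, count)

    def octant(self, p):
        d = p >= self.center                             # which side of the center, per axis
        return int(d[0]) | (int(d[1]) << 1) | (int(d[2]) << 2)

    def insert(self, p):
        p = np.asarray(p, dtype=float)
        o = self.octant(p)
        if o not in self.reps:
            self.reps[o] = (p, 1)                        # density capped at 8 points per voxel
        else:
            q, n = self.reps[o]
            self.reps[o] = ((q * n + p) / (n + 1), n + 1)  # incremental averaging as denoising

    def points(self):
        return [q for q, _ in self.reps.values()]        # at most 8 representatives
```

Capping density this way bounds both memory and the cost of the subsequent nearest-neighbor queries, which is where the heuristic-guided KNN then operates.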
Pub Date: 2026-01-13 | DOI: 10.1109/LRA.2026.3653368
Christian Geckeler;Niklas Neugebauer;Manasi Muglikar;Davide Scaramuzza;Stefano Mintchev
Uncrewed aerial vehicles (UAVs) are increasingly deployed in forest environments for tasks such as environmental monitoring and search and rescue, which require safe navigation through dense foliage and precise data collection. Traditional sensing approaches, including passive multispectral and RGB imaging, suffer from latency, poor depth resolution, and strong dependence on ambient light—especially under forest canopies. In this work, we present a novel event spectroscopy system that simultaneously enables high-resolution, low-latency depth reconstruction with integrated multispectral imaging using a single sensor. Depth is reconstructed using structured light, and by modulating the wavelength of the projected structured light, our system captures spectral information in controlled bands between 650 nm and 850 nm. We demonstrate up to 60% improvement in RMSE over commercial depth sensors and validate the spectral accuracy against a reference spectrometer and commercial multispectral cameras, demonstrating comparable performance. A portable version limited to RGB is used to collect real-world depth and spectral data from a Masoala Rainforest. We demonstrate color image reconstruction and material differentiation between leaves and branches using this spectral and depth data. Our results show that adding depth (available at no extra effort with our setup) to material differentiation improves the accuracy by over 30% compared to color-only method. Our system, tested in both lab and real-world rainforest environments, shows strong performance in depth estimation, RGB reconstruction, and material differentiation—paving the way for lightweight, integrated, and robust UAV perception and data collection in complex natural environments.
"Event Spectroscopy: Event-Based Multispectral and Depth Sensing Using Structured Light," Christian Geckeler; Niklas Neugebauer; Manasi Muglikar; Davide Scaramuzza; Stefano Mintchev. IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2658-2665. DOI: 10.1109/LRA.2026.3653368
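The reported gain from adding depth to material differentiation amounts, in the simplest reading, to appending a depth channel to each pixel's spectral feature vector before classification. The sketch below shows that feature construction together with a stand-in nearest-centroid classifier; the actual classifier, band set, and features used in the letter are not specified here and are assumptions.

```python
import numpy as np

def build_features(bands, depth):
    """Stack per-pixel spectral intensities (H, W, B) with depth (H, W) into
    per-pixel feature vectors of length B + 1."""
    H, W, B = bands.shape
    return np.concatenate([bands.reshape(H * W, B),
                           depth.reshape(H * W, 1)], axis=1)

def nearest_centroid_classify(train_feats, train_labels, test_feats):
    """Minimal stand-in classifier (not the paper's): assign each pixel to the
    closest class centroid in the joint spectral-depth feature space."""
    classes = np.unique(train_labels)
    centroids = np.stack([train_feats[train_labels == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(test_feats[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]
```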
Rapid urbanization has increased demand for customized urban mobility, making on-demand services and robo-taxis central to future transportation. The efficiency of these systems hinges on real-time fleet coordination algorithms. This work accelerates the state-of-the-art high-capacity ridepooling framework by identifying its computational bottlenecks and introducing two complementary strategies: (i) a data-driven feasibility predictor that filters low-potential trips, and (ii) a graph-partitioning scheme that enables parallelizable trip generation. Using real-world Manhattan demand data, we show that the acceleration algorithms reduce the optimality gap by up to 27% under real-time constraints and cut empty travel time by up to 5%. These improvements translate into tangible economic and environmental benefits, advancing the scalability of high-capacity robo-taxi operations in dense urban settings.
"Accelerating High-Capacity Ridepooling in Robo-Taxi Systems," Xinling Li; Daniele Gammelli; Alex Wallar; Jinhua Zhao; Gioele Zardini. IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2450-2457. Published 2026-01-13. DOI: 10.1109/LRA.2026.3653376
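The first acceleration strategy, a data-driven feasibility predictor that prunes low-potential trips before the expensive exact routing check, slots into the trip-generation loop as a cheap filter. The sketch below shows pairwise trips only; `feasibility_score`, the threshold, and the callback interfaces are hypothetical placeholders rather than the authors' code.

```python
from itertools import combinations

def generate_trips(requests, vehicles, pairwise_feasible, feasibility_score,
                   exact_check, threshold=0.3):
    """Enumerate candidate trips of size 2, but run the expensive exact routing
    check only when a learned score says the combination is promising."""
    trips = []
    for r1, r2 in combinations(requests, 2):
        if not pairwise_feasible(r1, r2):          # cheap shareability-graph test
            continue
        if feasibility_score(r1, r2) < threshold:  # learned filter prunes low-potential trips
            continue
        for v in vehicles:
            if exact_check(v, (r1, r2)):           # exact constrained-routing test for survivors
                trips.append((v, (r1, r2)))
    return trips
```

The second strategy, graph partitioning, would then split the pruned shareability graph into components so that this loop can run in parallel, one partition per worker.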
Pub Date: 2026-01-13 | DOI: 10.1109/LRA.2026.3653384
Xubo Luo;Zhaojin Li;Xue Wan;Wei Zhang;Leizheng Shu
Accurate and real-time 6-DoF localization is mission-critical for autonomous lunar landing, yet existing approaches remain limited: visual odometry (VO) drifts unboundedly, while map-based absolute localization fails in texture-sparse or low-light terrain. We introduce KANLoc, a monocular localization framework that tightly couples VO with a lightweight but robust absolute pose regressor. At its core is a Kolmogorov–Arnold Network (KAN) that learns the complex mapping from image features to map coordinates, producing sparse but highly reliable global pose anchors. These anchors are fused into a bundle adjustment framework, effectively canceling drift while retaining local motion precision. KANLoc delivers three key advances: (i) a KAN-based pose regressor that achieves high accuracy with remarkable parameter efficiency, (ii) a hybrid VO–absolute localization scheme that yields globally consistent real-time trajectories (≥15 FPS), and (iii) a tailored data augmentation strategy that improves robustness to sensor occlusion. On both realistic synthetic and real lunar landing datasets, KANLoc reduces average translation and rotation error by 32% and 45%, respectively, with per-trajectory gains of up to 45%/48%, outperforming strong baselines.
"Learning to Anchor Visual Odometry: KAN-Based Pose Regression for Planetary Landing," Xubo Luo; Zhaojin Li; Xue Wan; Wei Zhang; Leizheng Shu. IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 3574-3581. DOI: 10.1109/LRA.2026.3653384
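The fusion step, letting sparse absolute pose anchors from the learned regressor cancel VO drift inside a bundle-adjustment-style optimization, can be illustrated on translations only, where the problem reduces to linear least squares. The anchor weight and the translation-only simplification below are assumptions for clarity; the full system optimizes 6-DoF poses with visual reprojection terms.

```python
import numpy as np

def fuse_anchors(odom_deltas, anchors, anchor_weight=0.5):
    """Translation-only toy version of the anchor fusion: chain the VO deltas,
    then solve a weighted linear least-squares problem balancing relative-motion
    residuals against sparse absolute anchors from the pose regressor.

    odom_deltas: (N-1, D) relative translations; anchors: {pose index: (D,) position}.
    At least one anchor pins the global frame; lstsq handles the gauge otherwise."""
    odom_deltas = np.asarray(odom_deltas, dtype=float)
    N, D = len(odom_deltas) + 1, odom_deltas.shape[1]
    rows, rhs, w = [], [], []
    for i, d in enumerate(odom_deltas):                  # x_{i+1} - x_i = delta_i
        a = np.zeros(N); a[i], a[i + 1] = -1.0, 1.0
        rows.append(a); rhs.append(d); w.append(1.0)
    for i, p in anchors.items():                         # x_i = anchor_i (soft constraint)
        a = np.zeros(N); a[i] = 1.0
        rows.append(a); rhs.append(np.asarray(p, dtype=float)); w.append(anchor_weight)
    A = np.asarray(rows) * np.asarray(w)[:, None]
    b = np.asarray(rhs) * np.asarray(w)[:, None]
    x, *_ = np.linalg.lstsq(A, b, rcond=None)            # (N, D) fused positions
    return x
```

Sparse anchors suffice here: unanchored poses remain constrained through the odometry chain, so drift is pulled down globally without dense absolute measurements.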