
Latest Publications in IEEE Robotics and Automation Letters

ProbPer-LiLo: Probabilistic Persistency Modeling for Life-Long Mapping
IF 5.3 Computer Science (Tier 2) Q2 ROBOTICS Pub Date: 2026-01-12 DOI: 10.1109/LRA.2026.3653311
Waqas Ali;Yixi Cai;Patric Jensfelt;Thien-Minh Nguyen
3D mapping is vital for a broad range of applications that rely on a consistent and accurate representation of the environment. Change is an ever-persistent force in our world, and as a scene evolves its 3D map becomes outdated. Thus, a mapping framework that can adapt and refine 3D maps as the scene changes is necessary. In this letter, we propose a lifelong mapping framework where map maintenance serves two objectives: preserving static structures and refining the 3D map. To preserve only the static structures, we classify each object's state and remove dynamic and quasi-static objects, i.e., objects that only temporarily appear static. For classifying the state of objects, we propose a discrete probabilistic solution utilizing a factor graph. Using this classification, we generate static maps from multiple sessions, which are used for map refinement. The refinement is based on change detection and map updates, leveraging semantic and geometric information. For the evaluation, we collect a multi-campus lifelong dataset as an extension of the MCD datasets from the KTH and NTU campuses. The proposed approach is capable of accurately detecting quasi-static objects even in highly dynamic environments. Our system demonstrates state-of-the-art performance in large-scale environments. Furthermore, our approach can handle both SLAM-generated and survey-grade maps.
IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2530-2537. Citations: 0
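The per-object state classification could be prototyped, under strong simplifications, as a discrete Bayes filter rather than the full factor graph the letter proposes; the transition matrix, likelihoods, and all names below are hypothetical.

```python
import numpy as np

# States an observed object can take across mapping sessions.
STATES = ("dynamic", "quasi_static", "static")

# Hypothetical transition model: rows = previous state, columns = next state.
TRANSITION = np.array([
    [0.80, 0.15, 0.05],   # dynamic objects rarely settle down
    [0.30, 0.50, 0.20],   # quasi-static objects may move or persist
    [0.05, 0.15, 0.80],   # static structures tend to stay static
])

# Hypothetical observation likelihoods P(object re-observed | state).
P_PRESENT = np.array([0.20, 0.70, 0.95])

def update_belief(belief, observed_present):
    """One predict/update step of a discrete Bayes filter over STATES."""
    predicted = TRANSITION.T @ belief
    likelihood = P_PRESENT if observed_present else 1.0 - P_PRESENT
    posterior = likelihood * predicted
    return posterior / posterior.sum()

belief = np.full(3, 1.0 / 3.0)            # uniform prior over the three states
for present in (True, True, True, True):  # object re-observed in four sessions
    belief = update_belief(belief, present)

# Repeated re-observation pushes the belief toward "static".
print(STATES[int(np.argmax(belief))])  # -> static
```

An object that stops being re-observed would instead drift toward "dynamic" under this model and be dropped from the static map.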
VERM: Leveraging Foundation Models to Create a Virtual Eye for Efficient 3D Robotic Manipulation
IF 5.3 Computer Science (Tier 2) Q2 ROBOTICS Pub Date: 2026-01-12 DOI: 10.1109/LRA.2026.3652073
Yixiang Chen;Yan Huang;Keji He;Peiyan Li;Liang Wang
When performing 3D manipulation tasks, robots must plan actions based on perception from multiple fixed cameras. The multi-camera setup introduces substantial redundancy and irrelevant information, which increases computational costs and forces the model to spend extra training time extracting crucial task-relevant details. To filter out redundant information and accurately extract task-relevant features, we propose the VERM (Virtual Eye for Robotic Manipulation) method, leveraging the knowledge in foundation models to imagine a virtual task-adaptive view from the constructed 3D point cloud, which efficiently captures necessary information and mitigates occlusion. To facilitate 3D action planning and fine-grained manipulation, we further design a depth-aware module and a dynamic coarse-to-fine procedure. Extensive experimental results on both the simulation benchmark RLBench and real-world evaluations demonstrate the effectiveness of our method, surpassing previous state-of-the-art methods while achieving a 1.89x speedup in training time and a 1.54x speedup in inference speed.
IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2482-2489. Citations: 0
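The core idea of rendering a virtual view from a constructed point cloud can be illustrated with a plain pinhole projection; `render_virtual_view` and all parameters are hypothetical stand-ins, not the authors' foundation-model pipeline.

```python
import numpy as np

def render_virtual_view(points, K, T_cam_world, hw=(64, 64)):
    """Project world-frame 3D points into a hypothetical virtual pinhole
    camera and return a sparse depth image (0 where nothing projects)."""
    h, w = hw
    # Transform points into the virtual camera frame.
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    pts_cam = (T_cam_world @ pts_h.T).T[:, :3]
    pts_cam = pts_cam[pts_cam[:, 2] > 0]          # keep points in front
    # Pinhole projection to pixel coordinates.
    uv = (K @ (pts_cam / pts_cam[:, 2:3]).T).T[:, :2]
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    depth = np.zeros(hw)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    for ui, vi, zi in zip(u[valid], v[valid], pts_cam[valid][:, 2]):
        if depth[vi, ui] == 0 or zi < depth[vi, ui]:  # keep nearest point
            depth[vi, ui] = zi
    return depth

K = np.array([[50.0, 0, 32], [0, 50.0, 32], [0, 0, 1]])  # toy intrinsics
T = np.eye(4)  # virtual camera at the world origin, looking down +z
cloud = np.array([[0.0, 0.0, 2.0], [0.1, 0.0, 3.0]])
depth = render_virtual_view(cloud, K, T)
print(depth[32, 32])  # -> 2.0
```

A task-adaptive view, as in the paper, would additionally choose `T_cam_world` per task; here it is fixed for illustration.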
RUSH: Recursive and Scalable 3D Coarse to Fine Path Planning
IF 5.3 Computer Science (Tier 2) Q2 ROBOTICS Pub Date: 2026-01-12 DOI: 10.1109/LRA.2026.3653375
Hwajung Lee;Daegeol Ko;Jaehyuk Hur;Junwon Lee;Seongbo Ha;Jong Hwan Ko;Hyeonwoo Yu
Path planning in large-scale, complex 3D environments is fundamentally constrained by a trade-off between path quality and computational speed. This paper presents RUSH (Recursive and Scalable 3D Coarse To Fine Path Planning), a hierarchical framework that resolves this trade-off. RUSH decomposes the long-range planning task into a coarse plan followed by fine-grained, independent subproblems that can be solved in parallel. These subproblems are addressed by a unified, diffusion-based network that refines an initial path estimate by learning its residual to an optimal path. This approach allows RUSH to leverage rich geometric information directly from 3D voxel maps without being bottlenecked by the full map's complexity. We validate our method on large-scale outdoor (KITTI, MulRan) and indoor (HM3D) datasets, each spanning a 200 m x 200 m x 6 m map. Experimental results demonstrate that RUSH generates feasible, high-quality paths with remarkable efficiency, achieving up to a 12.59x speedup over a hierarchically accelerated A* baseline, while maintaining a path cost within 24% of the optimal solution. This performance gain positions RUSH as a powerful and practical solution for applications requiring rapid global path planning in large-scale 3D maps.
IEEE Robotics and Automation Letters, vol. 11, no. 2, pp. 2346-2353. Citations: 0
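The coarse-to-fine decomposition can be sketched without the diffusion network: plan on a downsampled occupancy grid, then refine each coarse leg as an independent (parallelizable) subproblem. BFS stands in here for the hierarchically accelerated A*, and all names are illustrative.

```python
from collections import deque

def bfs(grid, start, goal):
    """Shortest 4-connected path on a 0/1 occupancy grid (1 = blocked)."""
    h, w = len(grid), len(grid[0])
    prev, frontier = {start: None}, deque([start])
    while frontier:
        cur = frontier.popleft()
        if cur == goal:                      # reconstruct path back to start
            path = []
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if 0 <= nx < h and 0 <= ny < w and grid[nx][ny] == 0 and nxt not in prev:
                prev[nxt] = cur
                frontier.append(nxt)
    return None

def coarse_to_fine(fine_grid, start, goal, factor=2):
    """Plan on a downsampled grid, then refine each coarse leg on the fine
    grid. The per-leg refinements are independent, so they could run in
    parallel, mirroring RUSH's decomposition into subproblems."""
    h, w = len(fine_grid), len(fine_grid[0])
    # A coarse cell is blocked if any fine cell inside it is blocked.
    coarse = [[max(fine_grid[i * factor + di][j * factor + dj]
                   for di in range(factor) for dj in range(factor))
               for j in range(w // factor)] for i in range(h // factor)]
    coarse_path = bfs(coarse, (start[0] // factor, start[1] // factor),
                      (goal[0] // factor, goal[1] // factor))
    waypoints = ([start]
                 + [(x * factor, y * factor) for x, y in coarse_path[1:-1]]
                 + [goal])
    path = []
    for a, b in zip(waypoints, waypoints[1:]):   # independent subproblems
        leg = bfs(fine_grid, a, b)
        path.extend(leg if not path else leg[1:])
    return path

grid = [[0] * 4 for _ in range(4)]               # 4x4 free map
path = coarse_to_fine(grid, (0, 0), (3, 3))
print(len(path))  # -> 7 (Manhattan distance 6, so 7 cells)
```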
Reliable and Fast Humans Removed Visual Scene Representation
IF 5.3 Computer Science (Tier 2) Q2 ROBOTICS Pub Date: 2026-01-12 DOI: 10.1109/LRA.2026.3653306
Serhat İşcan;H. Iṣıl Bozma
This paper introduces a reliable and fast method for scene representation from a single RGB frame, even with human occlusion. Our goal is to enhance vision-based spatial reasoning in dynamic environments where human presence varies over time. Once humans are detected, the method addresses two key challenges: estimating the level of visual obstruction and generating a scene descriptor with humans removed. The first is handled via a novel visual obstruction measure that prevents descriptor generation under high occlusion. The second is addressed by adapting the previously presented bubble descriptor so that surface regions corresponding to detected humans are deformed using a modified spherical interpolation method—eliminating the need for inpainting or reconstruction and enabling rapid computation. We validate our approach through extensive comparisons across multiple datasets, including two new datasets collected using both stationary and mobile robots. Results show comparable representation quality with a 14–44x reduction in computation time.
IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2730-2737. Citations: 0
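The gating idea, compute an occlusion level and refuse to build a descriptor when it is too high, can be sketched as follows; the 0.5 threshold and function names are assumptions, not the paper's actual obstruction measure.

```python
import numpy as np

def obstruction_ratio(human_masks, image_shape):
    """Fraction of pixels covered by the union of detected human masks,
    a hypothetical proxy for the paper's visual obstruction measure."""
    union = np.zeros(image_shape, dtype=bool)
    for mask in human_masks:
        union |= mask
    return union.mean()

def should_generate_descriptor(human_masks, image_shape, threshold=0.5):
    """Skip descriptor generation when the scene is mostly occluded."""
    return obstruction_ratio(human_masks, image_shape) < threshold

shape = (4, 4)
mask = np.zeros(shape, dtype=bool)
mask[:2, :] = True  # a person covering the top half of the frame
print(obstruction_ratio([mask], shape))  # -> 0.5
```

With exactly half the frame occluded and a 0.5 threshold, descriptor generation is skipped.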
Accounting for the Interaction Between a Dummy Finger and Joint Modular Soft Actuators for Multi-Joint Support Using a Novel FEM-Based Approach
IF 5.3 Computer Science (Tier 2) Q2 ROBOTICS Pub Date: 2026-01-12 DOI: 10.1109/LRA.2026.3653334
Pablo E. Tortós-Vinocour;Shota Kokubu;Fuko Matsunaga;Yuxi Lu;Zhongchaou Zhou;Naoki Kamijo;María Cordero-Alvarado;Jose Gomez-Tames;Wenwei Yu
Soft actuators are safer than rigid robots for hand rehabilitation, yet their performance can be significantly affected by interactions that occur on multi-joint systems. Actuator–finger and actuator–actuator interactions can impact bending output and make actuator performance dependent on the actuation pattern. To address this, we developed and validated a finite element model of a three-joint modular actuator system attached to a dummy finger. The simulation revealed that the displacement between actuators, and the contact area between the fingers and the actuators are key factors influencing actuator performance. We proposed a novel attachment method to enhance contact area and reduce actuator displacement and compared it against five existing designs across two actuator types and three actuation patterns. Our results demonstrate improved bending and reduced dependence of actuator performance on actuation pattern. This study makes a dual contribution to the area of soft robotics for hand rehabilitation by proposing a novel FEM framework for modeling soft actuators attached to multi-joint systems as well as providing insights on attachment method design for soft actuators in hand rehabilitation, emphasizing the importance of actuator–actuator and actuator–finger interactions.
IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2578-2585. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11345949. Citations: 0
Virtual-Force Based Visual Servo for Multiple Peg-in-Hole Assembly With Tightly Coupled Multi-Manipulator
IF 5.3 Computer Science (Tier 2) Q2 ROBOTICS Pub Date: 2026-01-12 DOI: 10.1109/LRA.2026.3653374
Jiawei Zhang;Chengchao Bai;Wei Pan;Jifeng Guo
Multiple Peg-in-Hole (MPiH) assembly is one of the fundamental tasks in robotic assembly. In the MPiH tasks for large-size parts, it is challenging for a single manipulator to simultaneously align multiple distant pegs and holes, necessitating tightly coupled multi-manipulator systems. For such MPiH tasks using tightly coupled multiple manipulators, we propose a collaborative visual servo control framework that uses only the monocular in-hand cameras of each manipulator to reduce positioning errors. Initially, we train a state classification neural network and a positioning neural network. The former divides the states of the peg and hole in the image into three categories: obscured, separated, and overlapped, while the latter determines the position of the peg and hole in the image. Based on these findings, we propose a method to integrate the visual features of multiple manipulators using virtual forces, which can naturally combine with the cooperative controller of the multi-manipulator system. To generalize our approach to holes of different appearances, we varied the appearance of the holes during the dataset generation process. The results confirm that by considering the appearance of the holes, classification accuracy and positioning precision can be improved. Finally, the results show that our method achieves 100% success rate in dual-manipulator dual peg-in-hole tasks with a clearance of 0.2 mm, while robust to camera calibration errors.
IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2586-2593. Citations: 0
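The virtual-force idea, image-plane peg-to-hole offsets treated as spring-like forces and fused across the in-hand cameras, might look like this minimal sketch; the gain, detection format, and function names are hypothetical.

```python
import numpy as np

def virtual_force(peg_px, hole_px, gain=0.01):
    """Image-plane offset between a peg and its hole, scaled into a 2-D
    virtual force pulling the peg toward the hole (hypothetical gain)."""
    return gain * (np.asarray(hole_px, float) - np.asarray(peg_px, float))

def combined_correction(detections, gain=0.01):
    """Average the virtual forces from every in-hand camera into a single
    correction that a cooperative multi-manipulator controller could track."""
    forces = [virtual_force(peg, hole, gain) for peg, hole in detections]
    return np.mean(forces, axis=0)

# Two cameras, each seeing its own peg/hole pair (pixel coordinates).
detections = [((100, 120), (110, 120)),   # peg left of its hole
              ((210, 200), (200, 190))]   # peg right of and below its hole
corr = combined_correction(detections)
print(corr)  # mean of 0.01*[10, 0] and 0.01*[-10, -10] -> [0., -0.05]
```

In the paper the forces feed a cooperative controller of the tightly coupled arms; here they are simply averaged for illustration.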
RefDiffMap: Diffusion-Guided Progressive Refinement for Vectorized HD Map Construction
IF 5.3 Computer Science (Tier 2) Q2 ROBOTICS Pub Date: 2026-01-12 DOI: 10.1109/LRA.2026.3653402
Wenjie Gao;Entao Chang;Jiawei Fu;Ziyu Zhu;Shitao Chen;Nanning Zheng
High-definition (HD) map learning serves as an essential component of autonomous driving scene understanding, providing structured priors for planning and prediction. Recent transformer-based methods regress vectorized map elements via deformable attention over Bird’s-Eye View (BEV) features. They typically employ a single-pass paradigm, starting from a set of initial queries. However, these queries struggle to precisely localize map elements within the large-scale BEV space. This difficulty is severely amplified when using lightweight backbones that produce less distinctive features. To address this, we propose RefDiffMap, which recasts map construction as a progressive refinement process driven by a diffusion model. We introduce a novel denoising query generator that, at each step, leverages the intermediate noisy geometry to sample relevant features from adaptive BEV RoIs. These features are distilled into context-aware queries that guide the decoder’s next refinement. This creates a powerful geometry-feature co-evolution loop, allowing the model to iteratively correct localization errors. Comprehensive experiments show that RefDiffMap achieves competitive performance on the nuScenes and Argoverse 2 datasets. Notably, its robustness is highlighted with a ResNet-18 backbone, where it improves mAP by a significant 11.3% over our baseline MapTRv2. Further ablation studies validate the effectiveness of our approach.
IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2554-2561. Citations: 0
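The progressive refinement loop, repeatedly correcting a vectorized map element with a predicted residual, can be mimicked with a toy residual predictor standing in for the paper's diffusion-based network; everything below is illustrative.

```python
import numpy as np

def refine_polyline(init, predict_residual, steps=8):
    """Progressively refine a vectorized map element (an N x 2 polyline)
    by repeatedly adding a predicted residual toward the true geometry.
    `predict_residual` stands in for the diffusion-guided network."""
    pts = np.asarray(init, dtype=float)
    for _ in range(steps):
        pts = pts + predict_residual(pts)
    return pts

target = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 0.0]])  # ground-truth lane

def toy_residual(pts, alpha=0.5):
    # A hypothetical oracle that moves points halfway toward the target,
    # mimicking a network trained to regress residuals to optimal geometry.
    return alpha * (target - pts)

noisy = target + np.array([[0.4, -0.2], [-0.3, 0.1], [0.2, 0.3]])
refined = refine_polyline(noisy, toy_residual)
print(bool(np.abs(refined - target).max() < 1e-2))  # -> True
```

Each step halves the remaining error, so eight steps shrink the initial offset by a factor of 256, illustrating how iterative residual correction recovers localization errors the single-pass paradigm leaves behind.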
An Informative Planning Framework for Target Tracking and Active Mapping in Dynamic Environments With ASVs
IF 5.3 Computer Science (Tier 2) Q2 ROBOTICS Pub Date: 2026-01-12 DOI: 10.1109/LRA.2026.3653335
Sanjeev Ramkumar Sudha;Marija Popović;Erlend M. Coates
Mobile robot platforms are increasingly being used to automate information gathering tasks such as environmental monitoring. Efficient target tracking in dynamic environments is critical for applications such as search and rescue and pollutant cleanups. In this letter, we study active mapping of floating targets that drift due to environmental disturbances such as wind and currents. This is a challenging problem as it involves predicting both spatial and temporal variations in the map due to changing conditions. We introduce an integrated framework combining dynamic occupancy grid mapping and an informative planning approach to actively map and track freely drifting targets with an autonomous surface vehicle. A key component of our adaptive planning approach is a spatiotemporal prediction network that predicts target position distributions over time. We further propose a planning objective for target tracking that leverages these predictions. Simulation experiments show that this planning objective improves target tracking performance compared to existing methods that consider only entropy reduction as the planning objective. Finally, we validate our approach in field tests, showcasing its ability to track targets in real-world monitoring scenarios.
S. Ramkumar Sudha, M. Popović, E. M. Coates, "An Informative Planning Framework for Target Tracking and Active Mapping in Dynamic Environments With ASVs," IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2690–2697, 2026. DOI: 10.1109/LRA.2026.3653335
Citations: 0
HEAPGrasp: Hand-Eye Active Perception to Grasp Objects With Diverse Optical Properties
IF 5.3 | CAS Zone 2 (Computer Science) | Q2 ROBOTICS | Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653331
Ginga Kennis;Shogo Arai
Autonomous robotic handling requires accurate 3-D scene measurement followed by grasp planning. Conventional systems struggle with transparent or specular objects. Additionally, in hand–eye setups, moving through multiple viewpoints increases handling execution time. In this paper, we propose HEAPGrasp—Hand-Eye Active Perception to Grasp objects with diverse optical properties. To measure such objects, we focus on the ability to segment objects in RGB images regardless of their optical properties. We employ Shape from Silhouette on the segmented images for 3-D measurement. To shorten the time required for multi-view capture with a hand-eye camera, we plan its trajectory using a cost function that balances 3-D measurement accuracy against trajectory length. Real-robot experiments achieve a 96.0% grasp success rate on transparent, specular, and opaque objects, while reducing the hand-eye camera's trajectory length by 52% and handling execution time by 19% relative to a baseline that circles around the scene for 3-D measurement.
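The core of Shape from Silhouette is voxel carving: a voxel survives only if it projects inside the segmented silhouette in every view. A minimal sketch under assumed conventions (3×4 projection matrices, binary HxW masks; not the paper's implementation):

```python
import numpy as np

def carve(voxels, cameras, silhouettes):
    """Shape-from-Silhouette voxel carving: keep voxels that project
    inside the object silhouette in every view.
    voxels: (N, 3) candidate points; cameras: list of 3x4 projection
    matrices; silhouettes: list of binary HxW masks."""
    keep = np.ones(len(voxels), dtype=bool)
    homog = np.hstack([voxels, np.ones((len(voxels), 1))])
    for P, mask in zip(cameras, silhouettes):
        uvw = homog @ P.T                              # project to image plane
        uv = (uvw[:, :2] / uvw[:, 2:3]).round().astype(int)
        h, w = mask.shape
        inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
        hit = np.zeros(len(voxels), dtype=bool)        # off-image voxels are carved
        hit[inside] = mask[uv[inside, 1], uv[inside, 0]] > 0
        keep &= hit
    return voxels[keep]
```

Because only binary masks are consumed, the reconstruction is unaffected by whether the object is transparent, specular, or opaque — the property the abstract relies on.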
G. Kennis, S. Arai, "HEAPGrasp: Hand-Eye Active Perception to Grasp Objects With Diverse Optical Properties," IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 3206–3213, 2026. DOI: 10.1109/LRA.2026.3653331. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11345713
Citations: 0
SilRef: Joint Visual Silhouette and Tactile Pose Optimization for Transparent Object Manipulation
IF 5.3 | CAS Zone 2 (Computer Science) | Q2 ROBOTICS | Pub Date: 2026-01-12 | DOI: 10.1109/LRA.2026.3653340
Jean-Baptiste Weibel;Clemence Dubois;Negar Layegh Khavidaki;Saifeddine Aloui;Mathieu Grossard;Markus Vincze;Andreas Holzinger
Transparent objects are ubiquitous in laboratory automation settings, as liquids need to be visually controlled regularly. Automating laboratory processes would make the creation of small-batch medication feasible, thus making more personalized and better-targeted treatments more accessible. However, transparent objects present a major challenge for robust vision systems, in turn compromising their manipulation. Their appearance varies depending on the environment, and depth sensors fail to capture their measurements. These objects therefore break central assumptions made by depth-based as well as render-and-compare pose refinement strategies. To ensure reliable pose estimation, we propose Silhouette-based object pose Refinement (SilRef), a novel pose refinement approach leveraging object silhouette detection and geometric cues, circumventing the need for depth maps or realistic rendering and making it robust to environment change. Our proposed formulation directly optimizes the poses by gradient descent based on 3D model rendering and benefits from a large convergence basin. SilRef is evaluated on the Keypose dataset and the newly collected Tracebot In-Gripper dataset. Results show an improvement of 2.8x and 2.7x in Average Distance of Model Points-Symmetric (ADD-S@0.01 m) when the object is standing on a surface and when the object is already grasped, respectively, compared to Megapose6D and ICP (Iterative Closest Point).
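The ADD-S error underlying the reported ADD-S@0.01 m metric is the average distance from each ground-truth-posed model point to its nearest estimated-posed model point, which makes it tolerant of object symmetries. A minimal sketch of the metric itself (the thresholding into a recall score at 0.01 m is applied on top of this value):

```python
import numpy as np

def add_s(model_pts, R_est, t_est, R_gt, t_gt):
    """ADD-S pose error: mean distance from each ground-truth-transformed
    model point to the nearest estimated-transformed model point.
    model_pts: (N, 3); R_*: 3x3 rotations; t_*: length-3 translations."""
    est = model_pts @ R_est.T + t_est
    gt = model_pts @ R_gt.T + t_gt
    # pairwise distances (N, N), then nearest-neighbor match per gt point
    d = np.linalg.norm(gt[:, None, :] - est[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())
```

A pose counts as correct under ADD-S@0.01 m when this error falls below 1 cm; the paper's 2.8x/2.7x figures compare the fraction of poses passing that test.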
J.-B. Weibel, C. Dubois, N. Layegh Khavidaki, S. Aloui, M. Grossard, M. Vincze, A. Holzinger, "SilRef: Joint Visual Silhouette and Tactile Pose Optimization for Transparent Object Manipulation," IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2490–2497, 2026. DOI: 10.1109/LRA.2026.3653340. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11346999
Citations: 0