Pub Date : 2025-02-25  DOI: 10.1109/LRA.2025.3543709
"IEEE Robotics and Automation Society Information," IEEE Robotics and Automation Letters, vol. 10, no. 3, p. C3.
Pub Date : 2025-02-25  DOI: 10.1109/LRA.2025.3543711
"IEEE Robotics and Automation Letters Information for Authors," IEEE Robotics and Automation Letters, vol. 10, no. 3, p. C4.
Pub Date : 2025-02-25  DOI: 10.1109/LRA.2025.3543707
"IEEE Robotics and Automation Society Information," IEEE Robotics and Automation Letters, vol. 10, no. 3, p. C2.
Pub Date : 2025-02-18  DOI: 10.1109/LRA.2025.3543137
Zizhe Zhang;Yuan Yang;Wenqiang Zuo;Guangming Song;Aiguo Song;Yang Shi
Manipulating a target object without any fixtures requires the cooperation of a pair of robot manipulators. Conventional control methods coordinate the end-effector pose of each manipulator with that of the other using their kinematics and joint coordinate measurements. In practice, however, inaccurate kinematics and joint coordinate measurements can cause significant pose synchronization errors. This letter therefore proposes an image-based visual servoing approach for enhancing the cooperation of a dual-arm manipulation system. On top of the classical control, the visual servoing controller lets each manipulator use the camera it carries to measure the image features of the other manipulator's marker and align its end-effector pose with its counterpart's on the move. Because visual measurements are robust to kinematic errors, the proposed control reduces the end-effector pose synchronization errors and the fluctuations of the interaction forces between the two manipulators on the move. Theoretical analyses rigorously prove the stability of the closed-loop system, and comparative experiments on real robots substantiate the effectiveness of the proposed control.
"Image-Based Visual Servoing for Enhanced Cooperation of Dual-Arm Manipulation," IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 3374–3381.
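As a concrete reference point for the entry above, the sketch below implements the classical image-based visual servoing law (camera velocity proportional to the pseudo-inverse of the stacked interaction matrix times the feature error), not the letter's actual dual-arm controller. The normalized feature coordinates, depths, and gain are illustrative placeholders.

```python
import numpy as np

def interaction_matrix(x, y, Z):
    """Interaction (image Jacobian) matrix of one normalized point feature (x, y) at depth Z."""
    return np.array([
        [-1.0 / Z, 0.0,      x / Z, x * y,        -(1.0 + x**2),  y],
        [0.0,      -1.0 / Z, y / Z, 1.0 + y**2,   -x * y,        -x],
    ])

def ibvs_velocity(features, desired, depths, gain=0.5):
    """Camera velocity twist (vx, vy, vz, wx, wy, wz) driving the features toward the desired ones."""
    L = np.vstack([interaction_matrix(x, y, Z) for (x, y), Z in zip(features, depths)])
    error = (np.asarray(features) - np.asarray(desired)).reshape(-1)
    return -gain * np.linalg.pinv(L) @ error

# Example: four marker corners observed slightly off their desired image positions.
observed = [(0.11, 0.10), (-0.09, 0.10), (-0.10, -0.10), (0.10, -0.11)]
desired  = [(0.10, 0.10), (-0.10, 0.10), (-0.10, -0.10), (0.10, -0.10)]
v = ibvs_velocity(observed, desired, depths=[0.5] * 4)
print(v)  # small corrective twist of the camera (and hence the end-effector)
```

In a dual-arm setting, each arm would compute such a twist from the features of the other arm's marker and add it on top of the nominal coordination command, which is the general idea the abstract describes.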
Pub Date : 2025-02-17  DOI: 10.1109/LRA.2025.3542693
Chenguang Huang;Shengchao Yan;Wolfram Burgard
Dynamic scene understanding remains a persistent challenge in robotic applications. Early dynamic mapping methods focused on mitigating the negative influence of short-term dynamic objects on camera motion estimation by masking or tracking specific categories, an approach that often falls short in adapting to long-term scene changes. Recent efforts address object association in long-term dynamic environments using neural networks trained on synthetic datasets, but they still rely on predefined object shapes and categories. Other methods incorporate visual, geometric, or semantic heuristics for the association but often lack robustness. In this work, we introduce BYE, a class-agnostic, per-scene point cloud encoder that removes the need for predefined categories, shape priors, or extensive association datasets. Trained on only a single sequence of exploration data, BYE can efficiently perform object association in dynamically changing scenes. We further propose an ensembling scheme that combines the semantic strengths of Vision Language Models (VLMs) with the scene-specific expertise of BYE, achieving a 7% improvement and a 95% success rate in object association tasks.
"BYE: Build Your Encoder With One Sequence of Exploration Data for Long-Term Dynamic Scene Understanding," IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 3334–3341.
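The ensembling idea in the entry above can be pictured as fusing two similarity matrices (one from a VLM, one from the scene-specific encoder) before solving the assignment between observed segments and mapped objects. The sketch below is a minimal illustration under that assumption; the mixing weight, threshold, and use of the Hungarian solver are hypothetical choices, not BYE's actual scheme.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ensemble_associate(sim_vlm, sim_scene, alpha=0.5, min_score=0.3):
    """Associate observed segments (rows) with mapped objects (cols) by fusing two similarity matrices.

    sim_vlm:   open-vocabulary semantic similarity from a vision-language model
    sim_scene: similarity from the scene-specific point cloud encoder
    alpha:     illustrative mixing weight (hypothetical, not from the paper)
    """
    fused = alpha * sim_vlm + (1.0 - alpha) * sim_scene
    rows, cols = linear_sum_assignment(-fused)                 # maximize total similarity
    return [(r, c) for r, c in zip(rows, cols) if fused[r, c] >= min_score]

# Toy example: 3 observed segments vs. 3 mapped objects.
sim_vlm   = np.array([[0.9, 0.2, 0.1], [0.3, 0.8, 0.2], [0.1, 0.3, 0.4]])
sim_scene = np.array([[0.7, 0.1, 0.2], [0.2, 0.9, 0.1], [0.2, 0.2, 0.3]])
print(ensemble_associate(sim_vlm, sim_scene))  # [(0, 0), (1, 1), (2, 2)]
```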
Pub Date : 2025-02-17  DOI: 10.1109/LRA.2025.3542695
Shaocong Wang;Fengkui Cao;Ting Wang;Xieyuanli Chen;Shiliang Shao
Inspired by how humans perceive, remember, and understand the world, semantic graphs have become an efficient solution for place representation and localization. However, many current graph-based LiDAR loop closing methods describe the scene with adjacency matrices or semantic histograms and, for the sake of efficiency, discard much of the multifaceted topological information. In this letter, we propose a LiDAR loop closing method based on a semantic graph with triangular spatial topology (SGT-LLC), which fully considers both semantic and spatial topological information. To ensure that descriptors contain robust spatial information while maintaining good rotation invariance, we propose a local descriptor based on semantic topological encoding and triangular spatial topology, which can effectively correlate scenes and estimate 6-DoF poses. In addition, we aggregate the local descriptors of the graph's nodes using fuzzy classification to build a lightweight database and enable efficient global search. Extensive experiments on the KITTI, KITTI-360, Apollo, MulRan, and MCD datasets demonstrate the superiority of our approach over state-of-the-art methods.
"SGT-LLC: LiDAR Loop Closing Based on Semantic Graph With Triangular Spatial Topology," IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 3326–3333.
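To make the "triangular spatial topology" idea concrete: triangles formed by semantic instance centroids can be described by their sorted side lengths together with their sorted semantic labels, a combination that is invariant to rotation of the scene and to vertex ordering. The sketch below illustrates that generic construction; the neighbour count, quantization, and tuple format are assumptions, not the exact SGT-LLC encoding.

```python
import numpy as np
from itertools import combinations

def triangle_descriptors(centroids, labels, k=4):
    """Rotation-invariant triangle descriptors from semantic instance centroids.

    For each node, triangles are formed with pairs of its k nearest neighbours and
    described by sorted side lengths plus sorted semantic labels.
    (Illustrative encoding, not the exact SGT-LLC descriptor.)
    """
    centroids = np.asarray(centroids, dtype=float)
    descs = []
    for i, c in enumerate(centroids):
        dists = np.linalg.norm(centroids - c, axis=1)
        neighbours = np.argsort(dists)[1:k + 1]            # skip the node itself
        for j, m in combinations(neighbours, 2):
            sides = sorted([
                np.linalg.norm(centroids[i] - centroids[j]),
                np.linalg.norm(centroids[j] - centroids[m]),
                np.linalg.norm(centroids[m] - centroids[i]),
            ])
            descs.append((tuple(sorted([labels[i], labels[j], labels[m]])),
                          tuple(np.round(sides, 1))))       # coarse quantization for tolerant matching
    return descs

# Toy scene: a few poles, a car, and a trunk as semantic instances.
pts = [(0, 0, 0), (4, 0, 0), (0, 3, 0), (5, 4, 0), (2, 6, 0)]
lab = ["pole", "car", "pole", "trunk", "pole"]
print(triangle_descriptors(pts, lab, k=3)[:3])
```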
Pub Date : 2025-02-14  DOI: 10.1109/LRA.2025.3542317
Zhengtong Xu;Qiang Qiu;Yu She
In the era of generative AI, integrating video generation models into robotics opens new possibilities for general-purpose robot agents. This letter introduces imitation learning with latent video planning (VILP). We propose a latent video diffusion model that generates predictive robot videos with a good degree of temporal consistency. Our method generates highly time-aligned videos from multiple views, which is crucial for robot policy learning, and the video generation model is highly time-efficient: for example, it can generate videos from two distinct perspectives, each consisting of six frames at a resolution of 96 × 160 pixels, at a rate of 5 Hz. In the experiments, we demonstrate that VILP outperforms the existing video generation robot policy across several metrics: training cost, inference speed, temporal consistency of the generated videos, and policy performance. We also compare our method with other imitation learning methods. Our findings indicate that VILP relies less on extensive high-quality task-specific robot action data while still maintaining robust performance. In addition, VILP possesses robust capabilities in representing multi-modal action distributions. Our paper provides a practical example of how to effectively integrate video generation models into robot policies, potentially offering insights for related fields and directions.
"VILP: Imitation Learning With Latent Video Planning," IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 3350–3357.
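A video-planning policy of this kind typically runs a receding-horizon loop: encode the current observation, generate a short sequence of future latent frames, map consecutive frames to actions, execute the first few, and replan. The sketch below shows only that loop structure with stubbed models; generate_latent_plan and latents_to_actions are hypothetical stand-ins, not VILP's networks.

```python
import numpy as np

def generate_latent_plan(obs_latent, horizon=6):
    """Stub: predict a short sequence of future latent frames conditioned on the current latent."""
    return obs_latent + 0.01 * np.cumsum(np.random.randn(horizon, *obs_latent.shape), axis=0)

def latents_to_actions(latent_plan):
    """Stub: map consecutive latent frames to robot actions (e.g. via an inverse-dynamics head)."""
    return np.diff(latent_plan, axis=0).mean(axis=(1, 2))[:, None] * np.ones((1, 7))

def execute(actions):
    print(f"executing {len(actions)} actions, first = {np.round(actions[0], 3)}")

obs_latent = np.zeros((12, 20))              # encoded camera observation (toy size)
for step in range(3):                        # replan at a fixed rate (e.g. a few Hz)
    plan = generate_latent_plan(obs_latent)  # 6 predicted latent frames
    actions = latents_to_actions(plan)       # 5 actions between 6 frames
    execute(actions[:2])                     # execute only the first chunk, then replan
    obs_latent = plan[1]                     # toy "world update" for this sketch
```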
Pub Date : 2025-02-14  DOI: 10.1109/LRA.2025.3542205
Liyang Mao;Chenyao Tian;Peng Yang;Xianghe Meng;Hao Zhang;Hui Xie
Untethered miniature robots, after ultra-long-distance transportation to the lesion by continuum robots, can further deliver drugs to deep, fine tissues. Specifically, a magnetically steered continuum robot operating in a follow-the-leader manner enhances the safety of channel construction, while a magnetically driven swimming robot with high mobility and maneuverability ensures robust, precise drug delivery. However, this collaboration requires an external actuation system capable of covering a large working space and generating the diverse magnetic fields needed by both robots. Here, a magnetic actuation system excited by dual circuits is developed to generate linear combinations of constant (up to 13.8 mT) and alternating (up to 93.2 Hz, 11.5 mT) magnetic fields within a large Ø308 mm spherical workspace. This is achieved through the resonance effect of series capacitors and the suppression of induced currents by series choke coils. Minimizing conductive materials and optimizing the core and yoke structure reduce the impact of eddy-current losses. Through effective temperature management and error compensation, the system can stably generate the multiple required magnetic fields over extended periods, navigating the continuum robot through the aortic arch and delivering an untethered miniature swimming robot for precise exploration of the heart's microvasculature.
"A Dual-Circuit Magnetic Actuation System for Multi-Robot Collaboration in Large-Scale Medical Environments," IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 3382–3389.
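Two pieces of the system above lend themselves to a back-of-the-envelope check: the commanded field is a linear combination of a constant and an alternating component, and a series capacitor can be sized so the AC coil circuit resonates at the drive frequency, f = 1/(2π√(LC)). The sketch below uses the field limits quoted in the abstract; the coil inductance is a hypothetical value, not a parameter from the paper.

```python
import numpy as np

def series_resonance_capacitance(L_coil, f_drive):
    """Capacitance that puts a series L-C circuit at resonance for the given drive frequency."""
    return 1.0 / ((2.0 * np.pi * f_drive) ** 2 * L_coil)

B_dc, B_ac, f = 13.8e-3, 11.5e-3, 93.2          # T, T, Hz (limits quoted in the abstract)
t = np.linspace(0.0, 0.05, 500)
B = B_dc + B_ac * np.sin(2.0 * np.pi * f * t)   # superposed field along one axis

L_coil = 0.12                                    # H, hypothetical electromagnet inductance
print(f"C for resonance at {f} Hz: {series_resonance_capacitance(L_coil, f) * 1e6:.1f} uF")
print(f"peak combined field: {B.max() * 1e3:.1f} mT")  # roughly 13.8 + 11.5 = 25.3 mT
```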
Pub Date : 2025-02-14  DOI: 10.1109/LRA.2025.3542322
Chung Hee Kim;Abhisesh Silwal;George Kantor
Automating tasks in outdoor agricultural fields poses significant challenges due to environmental variability, unstructured terrain, and diverse crop characteristics. We present a robotic system for autonomous pepper harvesting that leverages imitation learning and is designed to operate in these complex settings. Using a custom handheld shear-gripper, we collected 300 demonstrations to train a visuomotor policy, enabling the system to adapt to varying field conditions and crop diversity. We achieved a success rate of 28.95% with a cycle time of 31.71 seconds, comparable to existing systems tested under more controlled conditions such as greenhouses. Our system demonstrates the potential feasibility and effectiveness of employing imitation learning for automated harvesting in unstructured agricultural environments. This work aims to advance scalable, automated robotic solutions for agriculture in natural settings.
"Autonomous Robotic Pepper Harvesting: Imitation Learning in Unstructured Agricultural Environments," IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 3406–3413.
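At its core, training a visuomotor policy from demonstrations is behaviour cloning: regress the demonstrated actions from the observations. The sketch below shows that generic recipe with toy tensors standing in for image features and shear-gripper commands; the network size, optimizer settings, and data are placeholders, not the authors' architecture.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 64, 7                       # toy feature / action dimensions (hypothetical)
policy = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                       nn.Linear(128, 128), nn.ReLU(),
                       nn.Linear(128, act_dim))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Toy "demonstrations": (observation feature, expert action) pairs, 300 of them
# to mirror the number of demos collected in the entry above.
obs = torch.randn(300, obs_dim)
act = torch.randn(300, act_dim)

for epoch in range(5):
    pred = policy(obs)
    loss = nn.functional.mse_loss(pred, act)   # behaviour-cloning regression loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```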
Pub Date : 2025-02-13  DOI: 10.1109/LRA.2025.3541454
Kirill Muravyev;Alexander Melekhin;Dmitry Yudin;Konstantin Yakovlev
Mapping is one of the crucial tasks enabling autonomous navigation of a mobile robot. Conventional mapping methods output a dense geometric map representation, e.g. an occupancy grid, which is not trivial to keep consistent over prolonged runs covering large environments. Meanwhile, capturing the topological structure of the workspace enables fast path planning, is typically less prone to odometry error accumulation, and does not consume much memory. Following this idea, this letter introduces PRISM-TopoMap – a topological mapping method that maintains a graph of locally aligned locations without relying on global metric coordinates. The proposed method pairs an original learnable multimodal place-recognition module with a scan-matching pipeline for localization and loop closure in the graph of locations. The graph is updated online, and the robot is localized in the proper node at each time step. We conduct a broad experimental evaluation of the suggested approach in a range of photo-realistic environments and on a real robot, and compare it to the state of the art. The results of the empirical evaluation confirm that PRISM-TopoMap consistently outperforms its competitors in computational efficiency, achieves high mapping quality, and performs well on a real robot.
"PRISM-TopoMap: Online Topological Mapping With Place Recognition and Scan Matching," IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 3126–3133.
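The mapping loop described above can be skeletonized as: maintain a graph of locations, retrieve candidate nodes with a place-recognition descriptor, verify and align them with scan matching, then either relocalize (possibly closing a loop) or add a new node. The sketch below implements that skeleton with stubbed descriptor and matcher functions; both stubs and the thresholds are hypothetical, not PRISM-TopoMap's learned components.

```python
import numpy as np

def place_descriptor(scan):
    """Stub: a global descriptor of the current observation (e.g. a learned embedding)."""
    return scan.mean(axis=0)

def scan_match(scan_a, scan_b):
    """Stub: return (success, relative_transform) of locally aligning two scans."""
    offset = scan_b.mean(axis=0) - scan_a.mean(axis=0)
    return np.linalg.norm(offset) < 1.0, offset

class TopoMap:
    def __init__(self, sim_threshold=0.9):
        self.nodes = []                      # per node: {"scan", "desc"}
        self.edges = set()                   # undirected (i, j) pairs, no global coordinates
        self.sim_threshold = sim_threshold

    def localize_or_add(self, scan, current_node=None):
        desc = place_descriptor(scan)
        # Candidate retrieval by descriptor similarity, then geometric verification.
        for i, node in enumerate(self.nodes):
            sim = 1.0 / (1.0 + np.linalg.norm(desc - node["desc"]))
            if sim > self.sim_threshold:
                ok, _ = scan_match(node["scan"], scan)
                if ok:
                    if current_node is not None and current_node != i:
                        self.edges.add(tuple(sorted((current_node, i))))  # loop closure
                    return i
        self.nodes.append({"scan": scan, "desc": desc})                   # new location
        new_id = len(self.nodes) - 1
        if current_node is not None:
            self.edges.add((current_node, new_id))
        return new_id

tm = TopoMap()
node = None
for step in range(3):
    scan = np.random.randn(100, 2) + step    # toy 2D "scans" drifting over time
    node = tm.localize_or_add(scan, node)
print(len(tm.nodes), "nodes,", len(tm.edges), "edges")
```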