In many modern astronomical facilities, multi-object telescopes are crucial instruments. Most of these telescopes have thousands of robotic fiber positioners (RFPs) installed on their focal plane, sharing an overlapping workspace. Collisions between RFPs during their movement can leave some targets unreachable and can cause structural damage. It is therefore necessary to assess the collision probability of the RFPs in a principled way. In this study, we propose a mathematical model of collision probability and validate its results using Monte Carlo simulations. In addition, we propose a new collision calculation method that is substantially faster, requiring only about 0.15% of the original computation time. Simulation experiments verify that our method can evaluate the collision probability between RFPs with both equal and unequal arm lengths. We also find that adopting a target distribution based on a Poisson distribution reduces the collision probability by approximately 2.6% on average.
{"title":"General Methods for Evaluating Collision Probability of Different Types of Theta-phi Positioners","authors":"Baolong Chen, Jianping Wang, Zhigang Liu, Zengxiang Zhou, Hongzhuan Hu, Feifan Zhang","doi":"arxiv-2409.07288","DOIUrl":"https://doi.org/arxiv-2409.07288","url":null,"abstract":"In many modern astronomical facilities, multi-object telescopes are crucial\u0000instruments. Most of these telescopes have thousands of robotic fiber\u0000positioners(RFPs) installed on their focal plane, sharing an overlapping\u0000workspace. Collisions between RFPs during their movement can result in some\u0000targets becoming unreachable and cause structural damage. Therefore, it is\u0000necessary to reasonably assess and evaluate the collision probability of the\u0000RFPs. In this study, we propose a mathematical models of collision probability\u0000and validate its results using Monte Carlo simulations. In addition, a new\u0000collision calculation method is proposed for faster calculation(nearly 0.15% of\u0000original time). Simulation experiments have verified that our method can\u0000evaluate the collision probability between RFPs with both equal and unequal arm\u0000lengths. Additionally, we found that adopting a target distribution based on a\u0000Poisson distribution can reduce the collision probability by approximately 2.6%\u0000on average.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lucas C. Hanson, William H. Reinhardt, Scott Shrager, Tarunyaa Sivakumar, Marc Z. Miskin
Robots too small to see with the naked eye have evolved rapidly in recent years thanks to the incorporation of on-board microelectronics. Semiconductor circuits have been used in microrobots capable of executing controlled wireless steering, prescribed legged gait patterns, and user-triggered transitions between digital states. Yet these promising new capabilities have come at the steep price of complicated fabrication. Even though circuit components can be reliably built by semiconductor foundries, currently available actuators for electronically integrated microrobots are built with intricate multi-step cleanroom protocols and use mechanisms like articulated legs or bubble generators that are hard to design and control. Here, we present a propulsion system for electronically integrated microrobots that can be built with a single step of lithographic processing, readily integrates with microelectronics thanks to low-current, low-voltage operation (1 V, 10 nA), and yields robots that swim at speeds over one body length per second. Inspired by work on micromotors, these robots generate electric fields in the surrounding fluid, and by extension propulsive electrokinetic flows. The underlying physics is captured by a model in which robot speed is proportional to applied current, making design and control straightforward. As proof, we build basic robots that use on-board circuits and a closed-loop optical control scheme to navigate waypoints and move in coordinated swarms. Broadly, solid-state propulsion clears the way for robust, easy-to-manufacture, electronically controlled microrobots that operate reliably over months to years.
{"title":"Electrokinetic Propulsion for Electronically Integrated Microscopic Robots","authors":"Lucas C. Hanson, William H. Reinhardt, Scott Shrager, Tarunyaa Sivakumar, Marc Z. Miskin","doi":"arxiv-2409.07293","DOIUrl":"https://doi.org/arxiv-2409.07293","url":null,"abstract":"Robots too small to see by eye have rapidly evolved in recent years thanks to\u0000the incorporation of on-board microelectronics. Semiconductor circuits have\u0000been used in microrobots capable of executing controlled wireless steering,\u0000prescribed legged gait patterns, and user-triggered transitions between digital\u0000states. Yet these promising new capabilities have come at the steep price of\u0000complicated fabrication. Even though circuit components can be reliably built\u0000by semiconductor foundries, currently available actuators for electronically\u0000integrated microrobots are built with intricate multi-step cleanroom protocols\u0000and use mechanisms like articulated legs or bubble generators that are hard to\u0000design and control. Here, we present a propulsion system for electronically\u0000integrated microrobots that can be built with a single step of lithographic\u0000processing, readily integrates with microelectronics thanks to low current/low\u0000voltage operation (1V, 10nA), and yields robots that swim at speeds over one\u0000body length per second. Inspired by work on micromotors, these robots generate\u0000electric fields in a surrounding fluid, and by extension propulsive\u0000electrokinetic flows. The underlying physics is captured by a model in which\u0000robot speed is proportional to applied current, making design and control\u0000straightforward. As proof, we build basic robots that use on-board circuits and\u0000a closed-loop optical control scheme to navigate waypoints and move in\u0000coordinated swarms. Broadly, solid-state propulsion clears the way for robust,\u0000easy to manufacture, electronically controlled microrobots that operate\u0000reliably over months to years.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interactive artificial intelligence in the motion control field is an interesting topic, especially when universal knowledge can adapt to multiple tasks and universal environments. Despite increasing efforts to apply transformers in Reinforcement Learning (RL), most existing approaches are limited by an offline training pipeline, which inhibits exploration and generalization. To address this limitation, we propose the Online Decision MetaMorphFormer (ODM) framework, which aims to achieve self-awareness, environment recognition, and action planning through a unified model architecture. Motivated by cognitive and behavioral psychology, an ODM agent is able to learn from others, recognize the world, and practice itself based on its own experience. ODM can be applied to an arbitrary agent with a multi-joint body, located in different environments, and trained with different types of tasks using large-scale pre-trained datasets. Through the use of pre-trained datasets, ODM can quickly warm up and learn the necessary knowledge to perform the desired task, while the target environment continues to reinforce the universal policy. Extensive online experiments as well as few-shot and zero-shot environmental tests are used to verify ODM's performance and generalization ability. Our results contribute to the study of general artificial intelligence in embodied and cognitive fields. Code, results, and video examples can be found at https://rlodm.github.io/odm/.
{"title":"Online Decision MetaMorphFormer: A Casual Transformer-Based Reinforcement Learning Framework of Universal Embodied Intelligence","authors":"Luo Ji, Runji Lin","doi":"arxiv-2409.07341","DOIUrl":"https://doi.org/arxiv-2409.07341","url":null,"abstract":"Interactive artificial intelligence in the motion control field is an\u0000interesting topic, especially when universal knowledge is adaptive to multiple\u0000tasks and universal environments. Despite there being increasing efforts in the\u0000field of Reinforcement Learning (RL) with the aid of transformers, most of them\u0000might be limited by the offline training pipeline, which prohibits exploration\u0000and generalization abilities. To address this limitation, we propose the\u0000framework of Online Decision MetaMorphFormer (ODM) which aims to achieve\u0000self-awareness, environment recognition, and action planning through a unified\u0000model architecture. Motivated by cognitive and behavioral psychology, an ODM\u0000agent is able to learn from others, recognize the world, and practice itself\u0000based on its own experience. ODM can also be applied to any arbitrary agent\u0000with a multi-joint body, located in different environments, and trained with\u0000different types of tasks using large-scale pre-trained datasets. Through the\u0000use of pre-trained datasets, ODM can quickly warm up and learn the necessary\u0000knowledge to perform the desired task, while the target environment continues\u0000to reinforce the universal policy. Extensive online experiments as well as\u0000few-shot and zero-shot environmental tests are used to verify ODM's performance\u0000and generalization ability. The results of our study contribute to the study of\u0000general artificial intelligence in embodied and cognitive fields. Code,\u0000results, and video examples can be found on the website\u0000url{https://rlodm.github.io/odm/}.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Specifying tasks for robotic systems traditionally requires coding expertise, deep domain knowledge, and significant time investment. While learning from demonstration offers a promising alternative, existing methods often struggle with longer-horizon tasks. To address this limitation, we introduce a computationally efficient approach for learning probabilistic deterministic finite automata (PDFA) that capture task structures and expert preferences directly from demonstrations. Our approach infers sub-goals and their temporal dependencies, producing an interpretable task specification that domain experts can easily understand and adjust. We validate the method through experiments involving object manipulation tasks, showcasing how it enables a robot arm to effectively replicate diverse expert strategies while adapting to changing conditions.
{"title":"Learning Task Specifications from Demonstrations as Probabilistic Automata","authors":"Mattijs Baert, Sam Leroux, Pieter Simoens","doi":"arxiv-2409.07091","DOIUrl":"https://doi.org/arxiv-2409.07091","url":null,"abstract":"Specifying tasks for robotic systems traditionally requires coding expertise,\u0000deep domain knowledge, and significant time investment. While learning from\u0000demonstration offers a promising alternative, existing methods often struggle\u0000with tasks of longer horizons. To address this limitation, we introduce a\u0000computationally efficient approach for learning probabilistic deterministic\u0000finite automata (PDFA) that capture task structures and expert preferences\u0000directly from demonstrations. Our approach infers sub-goals and their temporal\u0000dependencies, producing an interpretable task specification that domain experts\u0000can easily understand and adjust. We validate our method through experiments\u0000involving object manipulation tasks, showcasing how our method enables a robot\u0000arm to effectively replicate diverse expert strategies while adapting to\u0000changing conditions.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christian Löwens, Thorben Funke, André Wagner, Alexandru Paul Condurache
Rigid point cloud registration is a fundamental problem that is highly relevant in robotics and autonomous driving. Nowadays, deep learning methods can be trained to match a pair of point clouds, given the transformation between them. However, this training is often not scalable due to the high cost of collecting ground-truth poses. Therefore, we present a self-distillation approach to learn point cloud registration in an unsupervised fashion. Here, each sample is passed to a teacher network while an augmented view is passed to a student network. The teacher includes a trainable feature extractor and a learning-free robust solver such as RANSAC. The solver forces consistency among correspondences and optimizes for the unsupervised inlier ratio, eliminating the need for ground-truth labels. Our approach simplifies the training procedure by removing the need for initial hand-crafted features or consecutive point cloud frames, as required by related methods. We show that our method not only surpasses them on the RGB-D benchmark 3DMatch but also generalizes well to automotive radar, where classical features adopted by others fail. The code is available at https://github.com/boschresearch/direg.
{"title":"Unsupervised Point Cloud Registration with Self-Distillation","authors":"Christian Löwens, Thorben Funke, André Wagner, Alexandru Paul Condurache","doi":"arxiv-2409.07558","DOIUrl":"https://doi.org/arxiv-2409.07558","url":null,"abstract":"Rigid point cloud registration is a fundamental problem and highly relevant\u0000in robotics and autonomous driving. Nowadays deep learning methods can be\u0000trained to match a pair of point clouds, given the transformation between them.\u0000However, this training is often not scalable due to the high cost of collecting\u0000ground truth poses. Therefore, we present a self-distillation approach to learn\u0000point cloud registration in an unsupervised fashion. Here, each sample is\u0000passed to a teacher network and an augmented view is passed to a student\u0000network. The teacher includes a trainable feature extractor and a learning-free\u0000robust solver such as RANSAC. The solver forces consistency among\u0000correspondences and optimizes for the unsupervised inlier ratio, eliminating\u0000the need for ground truth labels. Our approach simplifies the training\u0000procedure by removing the need for initial hand-crafted features or consecutive\u0000point cloud frames as seen in related methods. We show that our method not only\u0000surpasses them on the RGB-D benchmark 3DMatch but also generalizes well to\u0000automotive radar, where classical features adopted by others fail. The code is\u0000available at https://github.com/boschresearch/direg .","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Collision avoidance and trajectory planning are crucial in multi-robot systems, particularly in environments with numerous obstacles. Although extensive research has been conducted in this field, the challenge of rapid traversal through such environments has not been fully addressed. This paper tackles that challenge by proposing a novel real-time scheduling scheme designed to optimize the passage of multi-robot systems through complex, obstacle-rich maps. Inspired by network flow optimization, our scheme decomposes the environment into a network structure, enabling the efficient allocation of robots to paths based on real-time congestion data. The proposed scheduling planner operates on top of existing collision avoidance algorithms, focusing on minimizing traversal time by balancing robot detours and waiting times. Our simulation results demonstrate the efficiency of the proposed scheme, and we validated its effectiveness through real-world flight tests using ten quadrotors. This work contributes a lightweight, effective scheduling planner capable of meeting the real-time demands of multi-robot systems in obstacle-rich environments.
{"title":"Flow-Inspired Lightweight Multi-Robot Real-Time Scheduling Planner","authors":"Han Liu, Yu Jin, Tianjiang Hu, Kai Huang","doi":"arxiv-2409.06952","DOIUrl":"https://doi.org/arxiv-2409.06952","url":null,"abstract":"Collision avoidance and trajectory planning are crucial in multi-robot\u0000systems, particularly in environments with numerous obstacles. Although\u0000extensive research has been conducted in this field, the challenge of rapid\u0000traversal through such environments has not been fully addressed. This paper\u0000addresses this problem by proposing a novel real-time scheduling scheme\u0000designed to optimize the passage of multi-robot systems through complex,\u0000obstacle-rich maps. Inspired from network flow optimization, our scheme\u0000decomposes the environment into a network structure, enabling the efficient\u0000allocation of robots to paths based on real-time congestion data. The proposed\u0000scheduling planner operates on top of existing collision avoidance algorithms,\u0000focusing on minimizing traversal time by balancing robot detours and waiting\u0000times. Our simulation results demonstrate the efficiency of the proposed\u0000scheme. Additionally, we validated its effectiveness through real world flight\u0000tests using ten quadrotors. This work contributes a lightweight, effective\u0000scheduling planner capable of meeting the real-time demands of multi-robot\u0000systems in obstacle-rich environments.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiaomin Lin, Vivek Mange, Arjun Suresh, Bernhard Neuberger, Aadi Palnitkar, Brendan Campbell, Alan Williams, Kleio Baxevani, Jeremy Mallette, Alhim Vera, Markus Vincze, Ioannis Rekleitis, Herbert G. Tanner, Yiannis Aloimonos
Oysters are a keystone species in coastal ecosystems, offering significant economic, environmental, and cultural benefits. However, current monitoring systems are often destructive, typically involving dredging to physically collect and count oysters. A nondestructive alternative is manual identification from video footage collected by divers, which is time-consuming, labor-intensive, and requires expert input. An alternative to human monitoring is the deployment of trained object detection models that perform real-time, on-edge oyster detection in the field. One such platform is the Aqua2 robot. Effective training of these models requires extensive high-quality data, which is difficult to obtain in marine settings. To address these complications, we introduce a novel method that leverages Stable Diffusion to generate high-quality synthetic data for the marine domain. We exploit diffusion models to create photorealistic marine imagery, using ControlNet inputs to ensure consistency with the segmentation ground-truth mask, the geometry of the scene, and the target domain of real underwater oyster images. The resulting dataset is used to train a YOLOv10-based vision model, achieving a state-of-the-art 0.657 mAP@50 for oyster detection on the Aqua2 platform. The system we introduce not only improves oyster habitat monitoring but also paves the way for autonomous surveillance across a range of marine tasks, improving aquaculture and conservation efforts.
{"title":"ODYSSEE: Oyster Detection Yielded by Sensor Systems on Edge Electronics","authors":"Xiaomin Lin, Vivek Mange, Arjun Suresh, Bernhard Neuberger, Aadi Palnitkar, Brendan Campbell, Alan Williams, Kleio Baxevani, Jeremy Mallette, Alhim Vera, Markus Vincze, Ioannis Rekleitis, Herbert G. Tanner, Yiannis Aloimonos","doi":"arxiv-2409.07003","DOIUrl":"https://doi.org/arxiv-2409.07003","url":null,"abstract":"Oysters are a keystone species in coastal ecosystems, offering significant\u0000economic, environmental, and cultural benefits. However, current monitoring\u0000systems are often destructive, typically involving dredging to physically\u0000collect and count oysters. A nondestructive alternative is manual\u0000identification from video footage collected by divers, which is time-consuming\u0000and labor-intensive with expert input. An alternative to human monitoring is the deployment of a system with trained\u0000object detection models that performs real-time, on edge oyster detection in\u0000the field. One such platform is the Aqua2 robot. Effective training of these\u0000models requires extensive high-quality data, which is difficult to obtain in\u0000marine settings. To address these complications, we introduce a novel method\u0000that leverages stable diffusion to generate high-quality synthetic data for the\u0000marine domain. We exploit diffusion models to create photorealistic marine\u0000imagery, using ControlNet inputs to ensure consistency with the segmentation\u0000ground-truth mask, the geometry of the scene, and the target domain of real\u0000underwater images for oysters. The resulting dataset is used to train a\u0000YOLOv10-based vision model, achieving a state-of-the-art 0.657 mAP@50 for\u0000oyster detection on the Aqua2 platform. The system we introduce not only\u0000improves oyster habitat monitoring, but also paves the way to autonomous\u0000surveillance for various tasks in marine contexts, improving aquaculture and\u0000conservation efforts.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a Human-Robot Blind Handover architecture within the context of Human-Robot Collaboration (HRC). The focus lies on a blind handover scenario in which the operator intentionally faces away, focused on a task, and requires an object from the robot. In this context, it is imperative for the robot to autonomously manage the entire handover process. Key considerations include ensuring safety while handing the object to the operator's hand and detecting the proper timing to release the object. The article explores strategies to navigate these challenges, emphasizing the need for a robot to operate safely and independently in facilitating blind handovers, thereby contributing to the advancement of HRC protocols and fostering natural and efficient collaboration between humans and robots.
{"title":"Compliant Blind Handover Control for Human-Robot Collaboration","authors":"Davide Ferrari, Andrea Pupa, Cristian Secchi","doi":"arxiv-2409.07155","DOIUrl":"https://doi.org/arxiv-2409.07155","url":null,"abstract":"This paper presents a Human-Robot Blind Handover architecture within the\u0000context of Human-Robot Collaboration (HRC). The focus lies on a blind handover\u0000scenario where the operator is intentionally faced away, focused in a task, and\u0000requires an object from the robot. In this context, it is imperative for the\u0000robot to autonomously manage the entire handover process. Key considerations\u0000include ensuring safety while handing the object to the operator's hand, and\u0000detect the proper timing to release the object. The article explores strategies\u0000to navigate these challenges, emphasizing the need for a robot to operate\u0000safely and independently in facilitating blind handovers, thereby contributing\u0000to the advancement of HRC protocols and fostering a natural and efficient\u0000collaboration between humans and robots.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Eugenio Chisari, Nick Heppert, Max Argus, Tim Welschehold, Thomas Brox, Abhinav Valada
Learning from expert demonstrations is a promising approach for training robotic manipulation policies from limited data. However, imitation learning algorithms require a number of design choices, ranging from the input modality and training objective to the 6-DoF end-effector pose representation. Diffusion-based methods have gained popularity as they enable predicting long-horizon trajectories and handle multimodal action distributions. Recently, Conditional Flow Matching (CFM), also known as Rectified Flow, has been proposed as a more flexible generalization of diffusion models. In this paper, we investigate the application of CFM in the context of robotic policy learning and specifically study its interplay with the other design choices required to build an imitation learning algorithm. We show that CFM gives the best performance when combined with point cloud input observations. Additionally, we study the feasibility of a CFM formulation on the SO(3) manifold and evaluate its suitability with a simplified example. We perform extensive experiments on RLBench which demonstrate that our proposed PointFlowMatch approach achieves a state-of-the-art average success rate of 67.8% over eight tasks, double the performance of the next-best method.
{"title":"Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching","authors":"Eugenio Chisari, Nick Heppert, Max Argus, Tim Welschehold, Thomas Brox, Abhinav Valada","doi":"arxiv-2409.07343","DOIUrl":"https://doi.org/arxiv-2409.07343","url":null,"abstract":"Learning from expert demonstrations is a promising approach for training\u0000robotic manipulation policies from limited data. However, imitation learning\u0000algorithms require a number of design choices ranging from the input modality,\u0000training objective, and 6-DoF end-effector pose representation. Diffusion-based\u0000methods have gained popularity as they enable predicting long-horizon\u0000trajectories and handle multimodal action distributions. Recently, Conditional\u0000Flow Matching (CFM) (or Rectified Flow) has been proposed as a more flexible\u0000generalization of diffusion models. In this paper, we investigate the\u0000application of CFM in the context of robotic policy learning and specifically\u0000study the interplay with the other design choices required to build an\u0000imitation learning algorithm. We show that CFM gives the best performance when\u0000combined with point cloud input observations. Additionally, we study the\u0000feasibility of a CFM formulation on the SO(3) manifold and evaluate its\u0000suitability with a simplified example. We perform extensive experiments on\u0000RLBench which demonstrate that our proposed PointFlowMatch approach achieves a\u0000state-of-the-art average success rate of 67.8% over eight tasks, double the\u0000performance of the next best method.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Diffusion models have been widely employed in the field of 3D manipulation due to their ability to learn distributions efficiently, allowing for precise prediction of action trajectories. However, diffusion models typically rely on UNet backbones with large parameter counts as policy networks, which can be challenging to deploy on resource-constrained devices. Recently, the Mamba model has emerged as a promising solution for efficient modeling, offering low computational complexity and strong performance in sequence modeling. In this work, we propose the Mamba Policy, a lighter but stronger policy that reduces the parameter count by over 80% compared to the original policy network while achieving superior performance. Specifically, we introduce the XMamba Block, which effectively integrates input information with conditional features and leverages a combination of Mamba and attention mechanisms for deep feature extraction. Extensive experiments demonstrate that the Mamba Policy excels on the Adroit, DexArt, and MetaWorld datasets while requiring significantly fewer computational resources. Additionally, we highlight the Mamba Policy's enhanced robustness in long-horizon scenarios compared to baseline methods and explore the performance of various Mamba variants within the Mamba Policy framework. Our project page is at https://andycao1125.github.io/mamba_policy/.
{"title":"Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models","authors":"Jiahang Cao, Qiang Zhang, Jingkai Sun, Jiaxu Wang, Hao Cheng, Yulin Li, Jun Ma, Yecheng Shao, Wen Zhao, Gang Han, Yijie Guo, Renjing Xu","doi":"arxiv-2409.07163","DOIUrl":"https://doi.org/arxiv-2409.07163","url":null,"abstract":"Diffusion models have been widely employed in the field of 3D manipulation\u0000due to their efficient capability to learn distributions, allowing for precise\u0000prediction of action trajectories. However, diffusion models typically rely on\u0000large parameter UNet backbones as policy networks, which can be challenging to\u0000deploy on resource-constrained devices. Recently, the Mamba model has emerged\u0000as a promising solution for efficient modeling, offering low computational\u0000complexity and strong performance in sequence modeling. In this work, we\u0000propose the Mamba Policy, a lighter but stronger policy that reduces the\u0000parameter count by over 80% compared to the original policy network while\u0000achieving superior performance. Specifically, we introduce the XMamba Block,\u0000which effectively integrates input information with conditional features and\u0000leverages a combination of Mamba and Attention mechanisms for deep feature\u0000extraction. Extensive experiments demonstrate that the Mamba Policy excels on\u0000the Adroit, Dexart, and MetaWorld datasets, requiring significantly fewer\u0000computational resources. Additionally, we highlight the Mamba Policy's enhanced\u0000robustness in long-horizon scenarios compared to baseline methods and explore\u0000the performance of various Mamba variants within the Mamba Policy framework.\u0000Our project page is in https://andycao1125.github.io/mamba_policy/.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}