Fast Monocular Visual-Inertial Initialization Leveraging Learned Single-View Depth
Pub Date: 2023-07-10 | DOI: 10.15607/RSS.2023.XIX.072
Nate Merrill, Patrick Geneva, Saimouli Katragadda, Chuchu Chen, G. Huang
In monocular visual-inertial navigation systems, it is ideal to initialize as quickly and robustly as possible. State-of-the-art initialization methods typically make linear approximations using the image features and inertial information in order to initialize in closed form, and then refine the states with a nonlinear optimization. While standard methods typically wait for a 2-second data window, a recent work has shown that it is possible to initialize faster (0.5 seconds) by adding constraints from a robust but only up-to-scale monocular depth network in the nonlinear optimization. To further expedite the initialization, in this work we instead leverage the scale-less depth measurements in the linear initialization step performed prior to the nonlinear one, which requires only a single depth image for the first frame. We show that the typical estimation of each feature state independently in the closed-form solution can be replaced by estimating just the scale and offset parameters of the learned depth map. Interestingly, our formulation makes it possible to construct small minimal problems in a RANSAC loop, whereas the typical linear system's minimal problem is quite large and includes every feature state. Experiments show that our method can improve the overall initialization performance on popular public datasets (EuRoC MAV and TUM-VI) over state-of-the-art methods. For the TUM-VI dataset, we show superior initialization performance with only a 0.3-second window of data, the smallest ever reported, and show that our method can initialize more often, more robustly, and more accurately in different challenging scenarios.
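The abstract does not spell out the closed-form step in code; the snippet below is a minimal sketch of the core idea, fitting the scale and offset of a learned depth map with a two-point minimal solver inside a RANSAC loop. The inputs d (network depths for the first frame's features) and z (metric depth hypotheses implied by the linear inertial constraints), as well as all function names, are hypothetical, not the authors' implementation.

```python
import numpy as np

def fit_scale_offset(d, z):
    """Least-squares fit of z ~ a*d + b for scale a and offset b."""
    A = np.stack([d, np.ones_like(d)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, z, rcond=None)
    return a, b

def ransac_scale_offset(d, z, iters=200, thresh=0.05, seed=None):
    """Two-point minimal solver in a RANSAC loop over depth correspondences.

    d: learned (up-to-scale) depths, z: metric depth hypotheses for the same
    features. Returns the scale/offset supported by the most inliers.
    """
    rng = np.random.default_rng(seed)
    n = len(d)
    best_inliers = np.ones(n, dtype=bool)   # fall back to using all points
    best_count = -1
    for _ in range(iters):
        i, j = rng.choice(n, size=2, replace=False)
        if abs(d[i] - d[j]) < 1e-9:
            continue
        a = (z[i] - z[j]) / (d[i] - d[j])    # two-point minimal solution
        b = z[i] - a * d[i]
        inliers = np.abs(a * d + b - z) < thresh
        if inliers.sum() > best_count:
            best_count, best_inliers = inliers.sum(), inliers
    # refine scale and offset on the inliers of the best minimal model
    return fit_scale_offset(d[best_inliers], z[best_inliers])
```

Because the minimal problem involves only two correspondences rather than every feature state, the RANSAC loop stays cheap, which is the property the abstract highlights.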
{"title":"Fast Monocular Visual-Inertial Initialization Leveraging Learned Single-View Depth","authors":"Nate Merrill, Patrick Geneva, Saimouli Katragadda, Chuchu Chen, G. Huang","doi":"10.15607/RSS.2023.XIX.072","DOIUrl":"https://doi.org/10.15607/RSS.2023.XIX.072","url":null,"abstract":"—In monocular visual-inertial navigation systems, it is ideal to initialize as quickly and robustly as possible. State-of-the-art initialization methods typically make linear approximations using the image features and inertial information in order to initialize in closed-form, and then refine the states with a nonlinear optimization. While the standard methods typically wait for a 2sec data window, a recent work has shown that it is possible to initialize faster (0.5sec) by adding constraints from a robust but only up-to-scale monocular depth network in the nonlinear optimization. To further expedite the initialization, in this work, we leverage the scale-less depth measurements instead in the linear initialization step that is performed prior to the nonlinear one, which only requires a single depth image for the first frame. We show that the typical estimation of each feature state independently in the closed-form solution can be replaced by just estimating the scale and offset parameters of the learned depth map. Interestingly, our formulation makes it possible to construct small minimal problems in a RANSAC loop, whereas the typical linear system’s minimal problem is quite large and includes every feature state. Experiments show that our method can improve the overall initialization performance on popular public datasets (EuRoC MAV and TUM-VI) over state-of-the-art methods. For the TUM-VI dataset, we show superior initialization performance with only a 0.3sec window of data, which is the smallest ever reported, and show that our method can initialize more often, robustly, and accurately in different challenging scenarios.","PeriodicalId":248720,"journal":{"name":"Robotics: Science and Systems XIX","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128766189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
G*: A New Approach to Bounding Curvature Constrained Shortest Paths through Dubins Gates
Pub Date: 2023-07-10 | DOI: 10.15607/RSS.2023.XIX.059
S. Manyam, Abhishek Nayak, S. Rathinam
We consider a Curvature-constrained Shortest Path (CSP) problem on a 2D plane for a robot with a minimum turning radius constraint in the presence of obstacles. We introduce a new bounding technique called Gate* (G*) that provides optimality guarantees for the CSP. Our approach relies on relaxing the obstacle avoidance constraints but allows a path to travel through restricted sets of configurations, called gates, which are informed by the obstacles. We also allow the path to be discontinuous when it reaches a gate. This approach allows us to pose the bounding problem as a least-cost path problem in a graph, where the cost of traveling an edge requires us to solve a new motion planning problem called the Dubins gate problem. In addition to the theoretical results, our numerical tests show that G* can significantly improve the lower bounds with respect to the baseline approaches, by more than 60% in some instances.
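A minimal sketch of the graph-search side of the bound, under stated assumptions: the gate graph (gates, edges) is hypothetical, and the edge cost below uses straight-line distance only because it is itself a lower bound on any curvature-constrained path length; the paper instead obtains edge costs by solving the Dubins gate problem.

```python
import heapq
import math

def euclidean_lower_bound(p, q):
    """Stand-in edge cost: straight-line distance never exceeds the length of a
    curvature-constrained path, so the overall bound stays valid."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def gate_lower_bound(start, goal, gates, edges, edge_cost=euclidean_lower_bound):
    """Least-cost path from start to goal through gate points (Dijkstra).

    gates: dict node_id -> (x, y); start and goal are node ids in gates.
    edges: dict node_id -> iterable of neighbour node ids.
    Returns a lower bound on the curvature-constrained shortest path length.
    """
    dist = {start: 0.0}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            return d
        if d > dist.get(u, math.inf):
            continue
        for v in edges.get(u, ()):
            nd = d + edge_cost(gates[u], gates[v])
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return math.inf
```

Swapping in a tighter per-edge cost (the Dubins gate solution in the paper) tightens the resulting lower bound without changing this search structure.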
{"title":"G*: A New Approach to Bounding Curvature Constrained Shortest Paths through Dubins Gates","authors":"S. Manyam, Abhishek Nayak, S. Rathinam","doi":"10.15607/RSS.2023.XIX.059","DOIUrl":"https://doi.org/10.15607/RSS.2023.XIX.059","url":null,"abstract":"—We consider a Curvature-constrained Shortest Path (CSP) problem on a 2D plane for a robot with minimum turning radius constraints in the presence of obstacles. We introduce a new bounding technique called Gate* (G*) that provides optimality guarantees to the CSP. Our approach relies on relaxing the obstacle avoidance constraints but allows a path to travel through some restricted sets of configurations called gates which are informed by the obstacles. We also let the path to be discontinuous when it reaches a gate. This approach allows us to pose the bounding problem as a least-cost problem in a graph where the cost of traveling an edge requires us to solve a new motion planning problem called the Dubins gate problem. In addition to the theoretical results, our numerical tests show that G* can significantly improve the lower bounds with respect to the baseline approaches, and by more than 60% in some instances.","PeriodicalId":248720,"journal":{"name":"Robotics: Science and Systems XIX","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132023769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Investigating the Impact of Experience on a User's Ability to Perform Hierarchical Abstraction
Pub Date: 2023-07-10 | DOI: 10.15607/RSS.2023.XIX.004
Nina Moorman, N. Gopalan, Aman Singh, Erin Botti, Mariah L. Schrum, Chuxuan Yang, Lakshmi Seelam, M. Gombolay
The field of Learning from Demonstration enables end-users, who are not robotics experts, to shape robot behavior. However, using human demonstrations to teach robots to solve long-horizon problems by leveraging the hierarchical structure of the task is still an unsolved problem. Prior work has yet to show that human users can provide sufficient demonstrations in novel domains without being shown explicit teaching strategies for each domain. In this work, we investigate whether non-expert demonstrators can generalize robot teaching strategies to provide necessary and sufficient demonstrations to robots zero-shot in novel domains. We find that increasing participant experience with providing demonstrations improves their demonstrations' degree of sub-task abstraction (p < .001), teaching efficiency (p < .001), and sub-task redundancy (p < .05) in novel domains, allowing generalization in robot teaching. Our findings demonstrate for the first time that non-expert demonstrators can transfer knowledge from a series of training experiences to novel domains without the need for explicit instruction, such that they can provide necessary and sufficient demonstrations when programming robots to complete task and motion planning problems.
{"title":"Investigating the Impact of Experience on a User's Ability to Perform Hierarchical Abstraction","authors":"Nina Moorman, N. Gopalan, Aman Singh, Erin Botti, Mariah L. Schrum, Chuxuan Yang, Lakshmi Seelam, M. Gombolay","doi":"10.15607/RSS.2023.XIX.004","DOIUrl":"https://doi.org/10.15607/RSS.2023.XIX.004","url":null,"abstract":"The field of Learning from Demonstration enables end-users, who are not robotics experts, to shape robot behavior. However, using human demonstrations to teach robots to solve long-horizon problems by leveraging the hierarchical structure of the task is still an unsolved problem. Prior work has yet to show that human users can provide sufficient demonstrations in novel domains without showing the demonstrators explicit teaching strategies for each domain. In this work, we investigate whether non-expert demonstrators can generalize robot teaching strategies to provide necessary and sufficient demonstrations to robots zero-shot in novel domains. We find that increasing participant experience with providing demonstrations improves their demonstration’s degree of sub-task abstraction (p < .001), teaching efficiency (p < .001), and sub-task redundancy (p < .05) in novel domains, allowing generalization in robot teaching. Our findings demonstrate for the first time that non-expert demonstrators can transfer knowledge from a series of training experiences to novel domains without the need for explicit instruction, such that they can provide necessary and sufficient demonstrations when programming robots to complete task and motion planning problems.","PeriodicalId":248720,"journal":{"name":"Robotics: Science and Systems XIX","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134087917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Follow my Advice: Assume-Guarantee Approach to Task Planning with Human in the Loop
Pub Date: 2023-07-10 | DOI: 10.15607/RSS.2023.XIX.001
Georg Friedrich Schuppe, Ilaria Torre, Iolanda Leite, Jana Tumova
We focus on correct-by-design robot task planning from finite Linear Temporal Logic (LTLf) specifications with a human in the loop. Since provable guarantees are difficult to obtain unconditionally, we take an assume-guarantee perspective. Along with guarantees on the robot's task satisfaction, we compute the weakest sufficient assumptions on the human's behavior. We approach the problem via a stochastic game and leverage algorithmic synthesis of the weakest sufficient assumptions. We turn the assumptions into runtime advice to be communicated to the human. We conducted an online user study and showed that the robot is perceived as safer, more intelligent, and more compliant with our approach than a robot giving more frequent advice corresponding to stronger assumptions. In addition, we show that our approach leads to fewer violations of the specification than not communicating with the participant at all.
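For readers unfamiliar with the setup, the following is a hypothetical illustration (not taken from the paper) of the kind of LTLf guarantee for the robot and sufficient assumption on the human that such a method would turn into runtime advice:

```latex
% Hypothetical illustration only: a robot guarantee and a sufficient
% assumption on the human, both expressed in LTLf.
\varphi_{\text{robot}} \;=\; \mathsf{F}\,\mathit{delivered} \;\wedge\; \mathsf{G}\,\neg\mathit{collision}
\qquad
\psi_{\text{human}} \;=\; \mathsf{G}\bigl(\mathit{path\_blocked} \rightarrow \mathsf{F}\,\mathit{path\_cleared}\bigr)
```

Here the robot's task is guaranteed only if the human honors the assumption, which is why communicating it as advice at the right moments matters.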
{"title":"Follow my Advice: Assume-Guarantee Approach to Task Planning with Human in the Loop","authors":"Georg Friedrich Schuppe, Ilaria Torre, Iolanda Leite, Jana Tumova","doi":"10.15607/RSS.2023.XIX.001","DOIUrl":"https://doi.org/10.15607/RSS.2023.XIX.001","url":null,"abstract":"—We focus on correct-by-design robot task planning from finite Linear Temporal Logic (LTL f ) specifications with a human in the loop. Since provable guarantees are difficult to obtain unconditionally, we take an assume-guarantee perspective. Along with guarantees on the robot’s task satisfaction, we com- pute the weakest sufficient assumptions on the human’s behavior. We approach the problem via a stochastic game and leverage algorithmic synthesis of the weakest sufficient assumptions. We turn the assumptions into runtime advice to be communicated to the human. We conducted an online user study and showed that the robot is perceived as safer, more intelligent and more compliant with our approach than a robot giving more frequent advice corresponding to stronger assumptions. In addition, we show that our approach leads to less violations of the specification than not communicating with the participant at all.","PeriodicalId":248720,"journal":{"name":"Robotics: Science and Systems XIX","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133244983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Demonstrating Mobile Manipulation in the Wild: A Metrics-Driven Approach
Pub Date: 2023-07-10 | DOI: 10.15607/RSS.2023.XIX.055
M. Bajracharya, James Borders, Richard Cheng, D. Helmick, Lukas Kaul, Daniel Kruse, John Leichty, Jeremy Ma, Carolyn Matl, Frank Michel, Chavdar Papazov, Josh Petersen, K. Shankar, Mark Tjersland
We present our general-purpose mobile manipulation system, consisting of a custom robot platform and key algorithms spanning perception and planning. To extensively test the system in the wild and benchmark its performance, we choose a grocery shopping scenario in an actual, unmodified grocery store. We derive key performance metrics from detailed robot log data collected during six week-long field tests, spread across 18 months. These objective metrics, gained from complex yet repeatable tests, drive the direction of our research efforts and let us continuously improve our system's performance. We find that thorough end-to-end, system-level testing of a complex mobile manipulation system can serve as a reality check for state-of-the-art methods in robotics. This effectively grounds robotics research efforts in real-world needs and challenges, which we deem highly useful for the advancement of the field. To this end, we share our key insights and takeaways to inspire and accelerate similar system-level research projects.
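As an illustration of the kind of metrics-driven log analysis described above, here is a minimal sketch assuming a hypothetical per-attempt log schema (columns task, success, duration_s, intervention); the paper's actual metrics and log format are not specified in the abstract.

```python
import pandas as pd

def summarize_field_test(log_csv):
    """Aggregate per-task performance metrics from a field-test log.

    Assumed (hypothetical) schema: one row per attempted task with boolean
    'success' and 'intervention' columns and a 'duration_s' column.
    """
    log = pd.read_csv(log_csv)
    return (
        log.groupby("task")
           .agg(attempts=("success", "size"),
                success_rate=("success", "mean"),
                mean_duration_s=("duration_s", "mean"),
                intervention_rate=("intervention", "mean"))
           .sort_values("success_rate")
    )

# Example usage (hypothetical file name):
# metrics = summarize_field_test("week3_grocery_run.csv")
```

Sorting by success rate surfaces the weakest subsystems first, which is the "metrics drive the research direction" loop the abstract emphasizes.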
{"title":"Demonstrating Mobile Manipulation in the Wild: A Metrics-Driven Approach","authors":"M. Bajracharya, James Borders, Richard Cheng, D. Helmick, Lukas Kaul, Daniel Kruse, John Leichty, Jeremy Ma, Carolyn Matl, Frank Michel, Chavdar Papazov, Josh Petersen, K. Shankar, Mark Tjersland","doi":"10.15607/RSS.2023.XIX.055","DOIUrl":"https://doi.org/10.15607/RSS.2023.XIX.055","url":null,"abstract":"—We present our general-purpose mobile manipu- lation system consisting of a custom robot platform and key algorithms spanning perception and planning. To extensively test the system in the wild and benchmark its performance, we choose a grocery shopping scenario in an actual, unmodified grocery store. We derive key performance metrics from detailed robot log data collected during six week-long field tests, spread across 18 months. These objective metrics, gained from complex yet repeatable tests, drive the direction of our research efforts and let us continuously improve our system’s performance. We find that thorough end-to-end system-level testing of a complex mobile manipulation system can serve as a reality-check for state-of-the-art methods in robotics. This effectively grounds robotics research efforts in real world needs and challenges, which we deem highly useful for the advancement of the field. To this end, we share our key insights and takeaways to inspire and accelerate similar system-level research projects.","PeriodicalId":248720,"journal":{"name":"Robotics: Science and Systems XIX","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133357596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Solving Stabilize-Avoid via Epigraph Form Optimal Control using Deep Reinforcement Learning
Pub Date: 2023-07-10 | DOI: 10.15607/RSS.2023.XIX.085
Oswin So, Chuchu Fan
{"title":"Solving Stabilize-Avoid via Epigraph Form Optimal Control using Deep Reinforcement Learning","authors":"Oswin So, Chuchu Fan","doi":"10.15607/RSS.2023.XIX.085","DOIUrl":"https://doi.org/10.15607/RSS.2023.XIX.085","url":null,"abstract":"","PeriodicalId":248720,"journal":{"name":"Robotics: Science and Systems XIX","volume":"170 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124140611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Autonomous Justification for Enabling Explainable Decision Support in Human-Robot Teaming
Pub Date: 2023-07-10 | DOI: 10.15607/RSS.2023.XIX.002
Matthew B. Luebbers, Aaquib Tabrez, K. Ruvane, Bradley Hayes
Justification is an important facet of policy explanation, a process for describing the behavior of an autonomous system. In human-robot collaboration, an autonomous agent can attempt to justify distinctly important decisions by offering explanations as to why those decisions are right or reasonable, leveraging a snapshot of its internal reasoning to do so. Without sufficient insight into a robot's decision-making process, it becomes challenging for users to trust or comply with those important decisions, especially when they are viewed as confusing or contrary to the user's expectations (e.g., when decisions change as new information is introduced to the agent's decision-making process). In this work, we characterize the benefits of justification within the context of decision support during human-robot teaming (i.e., agents giving recommendations to human teammates). We introduce a formal framework using value-of-information theory to strategically time justifications during periods of misaligned expectations for greater effect. We also characterize four different types of counterfactual justification derived from established explainable AI literature and evaluate them against each other in a human-subjects study involving a collaborative, partially observable search task. Based on our findings, we present takeaways on the effective use of different types of justifications in human-robot teaming scenarios to improve user compliance and decision-making by strategically influencing human teammate thinking patterns. Finally, we present an augmented reality system incorporating these findings into a real-world decision-support system for human-robot teaming.
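A minimal sketch of a value-of-information trigger for timing justifications, under simplifying assumptions (a one-shot decision, the justification fully revealing the relevant state, and a hypothetical interruption cost); the paper's formal framework is more general than this.

```python
import numpy as np

def value_of_information(p_states, utilities):
    """Expected value of revealing the true state before the human acts.

    p_states: shape (S,), probability of each state.
    utilities: shape (A, S), utility of human action a in state s.
    Returns E_s[max_a U(a, s)] - max_a E_s[U(a, s)].
    """
    p = np.asarray(p_states, dtype=float)
    U = np.asarray(utilities, dtype=float)
    informed = float(np.sum(p * U.max(axis=0)))   # best action chosen per state
    uninformed = float((U @ p).max())             # single best action overall
    return informed - uninformed

def should_justify(p_states, utilities, interruption_cost=0.1):
    """Hypothetical trigger: justify only when it is worth the interruption."""
    return value_of_information(p_states, utilities) > interruption_cost
```

Intuitively, the trigger fires exactly when the human's best action depends on information they do not currently have, which matches the "misaligned expectations" periods described above.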
{"title":"Autonomous Justification for Enabling Explainable Decision Support in Human-Robot Teaming","authors":"Matthew B. Luebbers, Aaquib Tabrez, K. Ruvane, Bradley Hayes","doi":"10.15607/RSS.2023.XIX.002","DOIUrl":"https://doi.org/10.15607/RSS.2023.XIX.002","url":null,"abstract":"—Justification is an important facet of policy expla- nation, a process for describing the behavior of an autonomous system. In human-robot collaboration, an autonomous agent can attempt to justify distinctly important decisions by offering explanations as to why those decisions are right or reasonable, leveraging a snapshot of its internal reasoning to do so. Without sufficient insight into a robot’s decision-making process, it becomes challenging for users to trust or comply with those important decisions, especially when they are viewed as confusing or contrary to the user’s expectations (e.g., when decisions change as new information is introduced to the agent’s decision-making process). In this work we characterize the benefits of justification within the context of decision-support during human- robot teaming (i.e., agents giving recommendations to human teammates). We introduce a formal framework using value of information theory to strategically time justifications during periods of misaligned expectations for greater effect. We also characterize four different types of counterfactual justification derived from established explainable AI literature and evaluate them against each other in a human-subjects study involving a collaborative, partially observable search task. Based on our findings, we present takeaways on the effective use of different types of justifications in human-robot teaming scenarios, to improve user compliance and decision-making by strategically influencing human teammate thinking patterns. Finally, we present an augmented reality system incorporating these findings into a real-world decision-support system for human-robot teaming.","PeriodicalId":248720,"journal":{"name":"Robotics: Science and Systems XIX","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133760604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Autonomous Navigation, Mapping and Exploration with Gaussian Processes
Pub Date: 2023-07-10 | DOI: 10.15607/RSS.2023.XIX.104
Mahmoud Ali, Hassan Jardali, N. Roy, Lantao Liu
Navigating and exploring an unknown environment is a challenging task for autonomous robots, especially in complex and unstructured environments. We propose a new framework that can simultaneously accomplish multiple objectives that are essential to robot autonomy, including identifying free space for navigation, building a metric-topological representation for mapping, and ensuring good spatial coverage for unknown-space exploration. Different from existing work that models these critical objectives separately, we show that navigation, mapping, and exploration can be derived from the same foundation, modeled with a sparse variant of a Gaussian process. Specifically, in our framework the robot navigates by following frontiers computed from a local Gaussian process perception model, and along the way builds a map in a metric-topological form where nodes are adaptively selected from important perception frontiers. The topology expands towards unexplored areas by assessing a low-cost global uncertainty map, also computed from a sparse Gaussian process. Through evaluations in various cluttered and unstructured environments, we validate that the proposed framework can explore unknown environments faster and with a shorter distance travelled than state-of-the-art frontier exploration approaches. Through field demonstrations, we have begun to lay the groundwork for field robots to explore challenging environments, such as forests, that humans have yet to set foot in.
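A toy sketch of uncertainty-driven frontier selection with a Gaussian process, assuming a dense scikit-learn GP and hypothetical inputs; the paper uses a sparse GP variant and a richer frontier definition, so treat this only as an illustration of the mechanism.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_frontiers(obs_xy, obs_occ, candidates_xy, k=5):
    """Fit a GP occupancy-style model to local observations and return the k
    candidate points with the highest predictive uncertainty, i.e. frontier-like
    exploration targets.

    obs_xy: (N, 2) observed positions, obs_occ: (N,) occupancy-style values,
    candidates_xy: (M, 2) points the robot could move towards.
    """
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
    gp.fit(obs_xy, obs_occ)
    _, std = gp.predict(candidates_xy, return_std=True)
    order = np.argsort(-std)           # most uncertain candidates first
    return np.asarray(candidates_xy)[order[:k]]
```

The same predictive-uncertainty field, evaluated globally, plays the role of the low-cost uncertainty map that guides topology expansion in the abstract.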
{"title":"Autonomous Navigation, Mapping and Exploration with Gaussian Processes","authors":"Mahmoud Ali, Hassan Jardali, N. Roy, Lantao Liu","doi":"10.15607/RSS.2023.XIX.104","DOIUrl":"https://doi.org/10.15607/RSS.2023.XIX.104","url":null,"abstract":"—Navigating and exploring an unknown environment is a challenging task for autonomous robots, especially in complex and unstructured environments. We propose a new framework that can simultaneously accomplish multiple objectives that are essential to robot autonomy including identifying free space for navigation, building a metric-topological representation for mapping, and ensuring good spatial coverage for unknown space exploration. Different from existing work that model these critical objectives separately, we show that navigation, mapping, and exploration can be derived with the same foundation modeled with a sparse variant of a Gaussian process. Specifically, in our framework the robot navigates by following frontiers computed from a local Gaussian process perception model, and along the way builds a map in a metric-topological form where nodes are adaptively selected from important perception frontiers. The topology expands towards unexplored areas by assessing a low-cost global uncertainty map also computed from a sparse Gaussian process. Through evaluations in various cluttered and unstructured environments, we validate that the proposed framework can explore unknown environments faster and with a shorter distance travelled than the state-of-the-art frontier explo- ration approaches. Through field demonstration, we have begun to lay the groundwork for field robots to explore challenging environments such as forests that humans have yet to set foot in 1 .","PeriodicalId":248720,"journal":{"name":"Robotics: Science and Systems XIX","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121413917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Demonstrating Large-Scale Package Manipulation via Learned Metrics of Pick Success
Pub Date: 2023-05-17 | DOI: 10.15607/RSS.2023.XIX.023
Shuai-Peng Li, Azarakhsh Keipour, Kevin G. Jamieson, Nicolas Hudson, Charles Swan, Kostas E. Bekris
Automating warehouse operations can reduce logistics overhead costs, ultimately driving down the final price for consumers, increasing the speed of delivery, and enhancing resiliency to workforce fluctuations. The past few years have seen increased interest in automating such repeated tasks, but mostly in controlled settings. Tasks such as picking objects from unstructured, cluttered piles have only recently become robust enough for large-scale deployment with minimal human intervention. This paper demonstrates large-scale package manipulation from unstructured piles in Amazon Robotics' Robot Induction (Robin) fleet, which utilizes a pick success predictor trained on real production data. Specifically, the system was trained on over 394K picks. It is used for singulating up to 5 million packages per day and has manipulated over 200 million packages during this paper's evaluation period. The developed learned pick quality measure ranks various pick alternatives in real time and prioritizes the most promising ones for execution. The pick success predictor aims to estimate, from prior experience, the success probability of a desired pick by the deployed industrial robotic arms in cluttered scenes containing deformable and rigid objects with partially known properties. It is a shallow machine learning model, which allows us to evaluate which features are most important for the prediction. An online pick ranker leverages the learned success predictor to prioritize the most promising picks for the robotic arm, which are then assessed for collision avoidance. This learned ranking process is demonstrated to overcome the limitations of, and outperform, manually engineered and heuristic alternatives. To the best of the authors' knowledge, this paper presents the first large-scale deployment of learned pick quality estimation methods in a real production system.
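The abstract does not name the model class beyond "shallow"; the sketch below uses a gradient-boosted classifier as a stand-in, with a hypothetical feature set, to illustrate training a pick success predictor and ranking candidate picks by predicted success probability.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical features (not Amazon's): e.g. suction contact area, surface
# normal alignment, package deformability score, local clutter density.
def train_pick_predictor(features, outcomes):
    """Shallow model: features shape (N, D), outcomes shape (N,) in {0, 1}."""
    model = GradientBoostingClassifier(max_depth=3, n_estimators=200)
    model.fit(features, outcomes)
    return model

def rank_picks(model, candidate_features):
    """Rank candidate picks by predicted success probability, best first."""
    p_success = model.predict_proba(candidate_features)[:, 1]
    return np.argsort(-p_success), p_success
```

A shallow model of this kind also exposes feature importances, which is what lets the authors inspect which pick features matter most for the prediction.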
{"title":"Demonstrating Large-Scale Package Manipulation via Learned Metrics of Pick Success","authors":"Shuai-Peng Li, Azarakhsh Keipour, Kevin G. Jamieson, Nicolas Hudson, Charles Swan, Kostas E. Bekris","doi":"10.15607/RSS.2023.XIX.023","DOIUrl":"https://doi.org/10.15607/RSS.2023.XIX.023","url":null,"abstract":"Automating warehouse operations can reduce logistics overhead costs, ultimately driving down the final price for consumers, increasing the speed of delivery, and enhancing the resiliency to workforce fluctuations. The past few years have seen increased interest in automating such repeated tasks but mostly in controlled settings. Tasks such as picking objects from unstructured, cluttered piles have only recently become robust enough for large-scale deployment with minimal human intervention. This paper demonstrates a large-scale package manipulation from unstructured piles in Amazon Robotics' Robot Induction (Robin) fleet, which utilizes a pick success predictor trained on real production data. Specifically, the system was trained on over 394K picks. It is used for singulating up to 5 million packages per day and has manipulated over 200 million packages during this paper's evaluation period. The developed learned pick quality measure ranks various pick alternatives in real-time and prioritizes the most promising ones for execution. The pick success predictor aims to estimate from prior experience the success probability of a desired pick by the deployed industrial robotic arms in cluttered scenes containing deformable and rigid objects with partially known properties. It is a shallow machine learning model, which allows us to evaluate which features are most important for the prediction. An online pick ranker leverages the learned success predictor to prioritize the most promising picks for the robotic arm, which are then assessed for collision avoidance. This learned ranking process is demonstrated to overcome the limitations and outperform the performance of manually engineered and heuristic alternatives. To the best of the authors' knowledge, this paper presents the first large-scale deployment of learned pick quality estimation methods in a real production system.","PeriodicalId":248720,"journal":{"name":"Robotics: Science and Systems XIX","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121024093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement
Pub Date: 2023-04-27 | DOI: 10.15607/RSS.2023.XIX.030
N. Gkanatsios, Ayush Jain, Zhou Xian, Yunchu Zhang, C. Atkeson, Katerina Fragkiadaki
Language is compositional; an instruction can express multiple relation constraints to hold among objects in a scene that a robot is tasked to rearrange. Our focus in this work is an instructable scene-rearranging framework that generalizes to longer instructions and to spatial concept compositions never seen at training time. We propose to represent language-instructed spatial concepts with energy functions over relative object arrangements. A language parser maps instructions to corresponding energy functions and an open-vocabulary visual-language model grounds their arguments to relevant objects in the scene. We generate goal scene configurations by gradient descent on the sum of energy functions, one per language predicate in the instruction. Local vision-based policies then relocate objects to the inferred goal locations. We test our model on established instruction-guided manipulation benchmarks, as well as benchmarks of compositional instructions we introduce. We show our model can execute highly compositional instructions zero-shot in simulation and in the real world. It outperforms language-to-action reactive policies and Large Language Model planners by a large margin, especially for long instructions that involve compositions of multiple spatial concepts. Simulation and real-world robot execution videos, as well as our code and datasets are publicly available on our website: https://ebmplanner.github.io.
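A toy sketch of the "gradient descent on a sum of energies" step with hand-written relational energies over 2D object positions; the paper learns these energies as energy-based models and grounds predicate arguments with a vision-language model, so every function and name below is an illustrative assumption.

```python
import torch

# Toy relational energies over 2D positions (illustrative only).
def left_of(a, b, margin=0.1):
    return torch.relu(a[0] - b[0] + margin) ** 2      # low when a is left of b

def near(a, b, dist=0.2):
    return (torch.norm(a - b) - dist) ** 2            # low when a is ~dist from b

def arrange(init_xy, energy_terms, steps=300, lr=0.05):
    """Gradient descent on the sum of per-predicate energies.

    init_xy: dict name -> (x, y) initial positions.
    energy_terms: callables taking the dict of position tensors.
    """
    xy = {k: torch.tensor(v, dtype=torch.float32, requires_grad=True)
          for k, v in init_xy.items()}
    opt = torch.optim.Adam(xy.values(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = sum(term(xy) for term in energy_terms)
        loss.backward()
        opt.step()
    return {k: v.detach().numpy() for k, v in xy.items()}

# "Put the mug left of the plate and near the bowl" (hypothetical instruction):
# goal = arrange({"mug": (0.5, 0.0), "plate": (0.0, 0.0), "bowl": (0.3, 0.3)},
#                [lambda o: left_of(o["mug"], o["plate"]),
#                 lambda o: near(o["mug"], o["bowl"])])
```

Because each predicate contributes one additive energy term, new compositions of constraints are handled by simply summing more terms, which is what enables the zero-shot compositional generalization described above.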
{"title":"Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement","authors":"N. Gkanatsios, Ayush Jain, Zhou Xian, Yunchu Zhang, C. Atkeson, Katerina Fragkiadaki","doi":"10.15607/RSS.2023.XIX.030","DOIUrl":"https://doi.org/10.15607/RSS.2023.XIX.030","url":null,"abstract":"Language is compositional; an instruction can express multiple relation constraints to hold among objects in a scene that a robot is tasked to rearrange. Our focus in this work is an instructable scene-rearranging framework that generalizes to longer instructions and to spatial concept compositions never seen at training time. We propose to represent language-instructed spatial concepts with energy functions over relative object arrangements. A language parser maps instructions to corresponding energy functions and an open-vocabulary visual-language model grounds their arguments to relevant objects in the scene. We generate goal scene configurations by gradient descent on the sum of energy functions, one per language predicate in the instruction. Local vision-based policies then re-locate objects to the inferred goal locations. We test our model on established instruction-guided manipulation benchmarks, as well as benchmarks of compositional instructions we introduce. We show our model can execute highly compositional instructions zero-shot in simulation and in the real world. It outperforms language-to-action reactive policies and Large Language Model planners by a large margin, especially for long instructions that involve compositions of multiple spatial concepts. Simulation and real-world robot execution videos, as well as our code and datasets are publicly available on our website: https://ebmplanner.github.io.","PeriodicalId":248720,"journal":{"name":"Robotics: Science and Systems XIX","volume":"534 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133389274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}