Make a Donut: Hierarchical EMD-Space Planning for Zero-Shot Deformable Manipulation With Tools

IF 5.3; Zone 2, Computer Science; JCR Q2, Robotics; IEEE Robotics and Automation Letters; Pub Date: 2025-02-03; DOI: 10.1109/LRA.2025.3537899

Yang You;Bokui Shen;Congyue Deng;Haoran Geng;Songlin Wei;He Wang;Leonidas Guibas

IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 3270-3277, 2025.

Citations: 0

Abstract

Deformable object manipulation is one of the most compelling yet difficult challenges in robotics. Previous techniques have predominantly relied on learning latent dynamics from demonstrations, typically represented as either particles or images. This creates a significant limitation: suitable demonstrations, especially for long-horizon tasks, can be hard to acquire. Moreover, learning entirely from demonstrations can limit the model's ability to generalize beyond the demonstrated tasks. In this work, we introduce a demonstration-free hierarchical planning approach that tackles complex long-horizon deformable manipulation tasks without any training. We use large language models (LLMs) to produce a high-level, stage-by-stage plan for a specified task. For each stage, the LLM provides both the name of the tool to use and Python code that generates an intermediate subgoal point cloud. Given the tool and subgoal for a stage, we apply a fine-grained, closed-loop model predictive control strategy that iteratively minimizes a Differentiable Physics with Point-to-Point correspondence (DiffPhysics-P2P) loss in Earth Mover's Distance (EMD) space. Experiments show that our technique outperforms multiple baselines on both short- and long-horizon dough manipulation tasks. Notably, our model generalizes robustly to novel, previously unseen complex tasks without any demonstrations. We further validate our approach with experiments on real-world robotic platforms.
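Two pieces of the pipeline above lend themselves to a short illustration: an LLM-written snippet that produces a subgoal point cloud, and a loss that compares point clouds through EMD correspondences. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. The torus parameters and the function names `make_donut_subgoal`, `emd_correspondence`, `p2p_loss_and_grad` and `plan_translation` are hypothetical. Exact EMD between equal-size, uniformly weighted point sets is computed here as an optimal one-to-one assignment with SciPy's `linear_sum_assignment`. The differentiable physics simulator is replaced by a toy model in which the only "action" is a rigid translation of the dough.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def make_donut_subgoal(n_points=256, major_r=0.06, minor_r=0.02,
                       center=(0.5, 0.1, 0.5), seed=0):
    """Hypothetical LLM-style subgoal: points sampled uniformly inside a torus."""
    rng = np.random.default_rng(seed)
    half = np.array([major_r + minor_r, minor_r, major_r + minor_r])
    accepted = np.empty((0, 3))
    while len(accepted) < n_points:
        # Rejection sampling inside the torus bounding box.
        cand = rng.uniform(-half, half, size=(4 * n_points, 3))
        # Torus with its hole along the y axis: (sqrt(x^2 + z^2) - R)^2 + y^2 <= r^2
        radial = np.hypot(cand[:, 0], cand[:, 2]) - major_r
        inside = radial ** 2 + cand[:, 1] ** 2 <= minor_r ** 2
        accepted = np.vstack([accepted, cand[inside]])
    return accepted[:n_points] + np.asarray(center)


def emd_correspondence(source, target):
    """EMD for equal-size uniform point sets = optimal one-to-one matching."""
    cost = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    # rows is 0..n-1 in order, so cols[i] is the target index matched to source[i].
    return cols, cost[rows, cols].mean()


def p2p_loss_and_grad(points, target, match):
    """Point-to-point loss with the EMD matching held fixed, plus its gradient."""
    diff = points - target[match]
    loss = np.mean(np.sum(diff ** 2, axis=1))
    grad = 2.0 * diff / len(points)  # d loss / d points
    return loss, grad


def plan_translation(dough, goal, steps=20, lr=0.5):
    """Toy closed loop (assumption): the action is a rigid translation of the
    dough, standing in for a differentiable simulator. The EMD matching is
    recomputed at every step from the current state."""
    offset = np.zeros(3)
    history = []
    for _ in range(steps):
        current = dough + offset  # toy "simulation" of applying the action
        match, emd = emd_correspondence(current, goal)
        _, grad_points = p2p_loss_and_grad(current, goal, match)
        # Chain rule: d current / d offset is the identity for every point.
        offset -= lr * grad_points.sum(axis=0)
        history.append(emd)
    return offset, history


if __name__ == "__main__":
    goal = make_donut_subgoal(seed=0)
    dough = make_donut_subgoal(seed=1) + np.array([0.1, 0.0, 0.0])  # displaced start
    offset, history = plan_translation(dough, goal)
    print("EMD before:", history[0], "EMD after:", history[-1])
```

In this sketch, holding the matching fixed while differentiating squared distances yields a per-point loss with a well-defined gradient, and recomputing the matching each step loosely mirrors the iterative, closed-loop use described in the abstract. A real system would backpropagate such a loss through a differentiable soft-body simulator into tool actions rather than into a single offset.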


Source journal

IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)

CiteScore

9.60

Self-citation rate

15.40%

Articles published

1428

Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.