StructDiffusion: Language-Guided Creation of Physically-Valid Structures using Unseen Objects

Weiyu Liu, Yilun Du, Tucker Hermans, S. Chernova, Chris Paxton
Published in: Robotics: Science and Systems XIX
DOI: 10.15607/RSS.2023.XIX.031 (https://doi.org/10.15607/RSS.2023.XIX.031)
Publication date: 2022-11-08
Citations: 13

Abstract

Robots operating in human environments must be able to rearrange objects into semantically meaningful configurations, even if these objects are previously unseen. In this work, we focus on the problem of building physically valid structures without step-by-step instructions. We propose StructDiffusion, which combines a diffusion model and an object-centric transformer to construct structures given partial-view point clouds and high-level language goals, such as "set the table". Our method can perform multiple challenging language-conditioned multi-step 3D planning tasks using one model. StructDiffusion improves the success rate of assembling physically valid structures out of unseen objects by 16% on average over an existing multi-modal transformer model trained on specific structures. We report experiments on held-out objects both in simulation and on real-world rearrangement tasks. Importantly, we show how integrating both a diffusion model and a collision-discriminator model allows for improved generalization over other methods when rearranging previously unseen objects. For videos and additional results, see our website: https://structdiffusion.github.io/.
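The abstract pairs a generative diffusion model with a collision discriminator. The toy sketch below illustrates only the general generate-then-filter pattern and is not the authors' implementation. Every name in it is hypothetical: `toy_denoise_step` is a hand-written stand-in for a learned denoiser, objects are 2D discs rather than partial-view point clouds, and `collision_count` is a geometric overlap check standing in for a learned discriminator.

```python
import math
import random


def toy_denoise_step(poses, targets, t, num_steps, rng):
    """Hypothetical stand-in for a learned denoiser (assumption, not the paper's model).

    Pulls each 2D pose toward its target and adds noise that shrinks as t decreases.
    """
    blend = 1.0 / (t + 1)                # fraction of the remaining gap closed this step
    sigma = 0.2 * (t + 1) / num_steps    # noise scale; small but nonzero at t = 0
    out = []
    for (x, y), (tx, ty) in zip(poses, targets):
        out.append((x + blend * (tx - x) + rng.gauss(0.0, sigma),
                    y + blend * (ty - y) + rng.gauss(0.0, sigma)))
    return out


def sample_layout(targets, num_steps, rng):
    """Start from Gaussian noise and iteratively denoise into one candidate layout."""
    poses = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in targets]
    for t in range(num_steps - 1, -1, -1):
        poses = toy_denoise_step(poses, targets, t, num_steps, rng)
    return poses


def collision_count(poses, radii):
    """Toy 'discriminator': number of object pairs whose discs overlap."""
    count = 0
    for i in range(len(poses)):
        for j in range(i + 1, len(poses)):
            if math.dist(poses[i], poses[j]) < radii[i] + radii[j]:
                count += 1
    return count


def generate_and_filter(targets, radii, num_samples=16, num_steps=20, seed=0):
    """Sample several candidate layouts and keep the one with the fewest collisions."""
    rng = random.Random(seed)
    candidates = [sample_layout(targets, num_steps, rng) for _ in range(num_samples)]
    scores = [collision_count(c, radii) for c in candidates]
    best = min(range(num_samples), key=scores.__getitem__)
    return candidates[best], scores


if __name__ == "__main__":
    # Three discs of radius 0.1 intended to sit in a row with a 0.01 gap between them.
    targets = [(0.0, 0.0), (0.21, 0.0), (0.42, 0.0)]
    radii = [0.1, 0.1, 0.1]
    layout, scores = generate_and_filter(targets, radii)
    print("collision counts per sample:", scores)
    print("selected layout:", layout)
```

Because each candidate is an independent stochastic sample, some layouts end up with overlapping objects and others do not. Scoring the candidates and keeping the best one is the role this sketch assigns to the discriminator.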