GR-MG: Leveraging Partially-Annotated Data via Multi-Modal Goal-Conditioned Policy

IF 4.6 | CAS Tier 2 (Computer Science) | JCR Q2 (Robotics) | IEEE Robotics and Automation Letters | Pub Date: 2025-01-06 | DOI: 10.1109/LRA.2025.3526436
Peiyan Li;Hongtao Wu;Yan Huang;Chilam Cheang;Liang Wang;Tao Kong
{"title":"GR-MG: Leveraging Partially-Annotated Data via Multi-Modal Goal-Conditioned Policy","authors":"Peiyan Li;Hongtao Wu;Yan Huang;Chilam Cheang;Liang Wang;Tao Kong","doi":"10.1109/LRA.2025.3526436","DOIUrl":null,"url":null,"abstract":"The robotics community has consistently aimed to achieve generalizable robot manipulation with flexible natural language instructions. One primary challenge is that obtaining robot trajectories fully annotated with both actions and texts is time-consuming and labor-intensive. However, partially-annotated data, such as human activity videos without action labels and robot trajectories without text labels, are much easier to collect. Can we leverage these data to enhance the generalization capabilities of robots? In this letter, we propose GR-MG, a novel method which supports conditioning on a text instruction and a goal image. During training, GR-MG samples goal images from trajectories and conditions on both the text and the goal image or solely on the image when text is not available. During inference, where only the text is provided, GR-MG generates the goal image via a diffusion-based image-editing model and conditions on both the text and the generated image. This approach enables GR-MG to leverage large amounts of partially-annotated data while still using languages to flexibly specify tasks. To generate accurate goal images, we propose a novel progress-guided goal image generation model which injects task progress information into the generation process. In simulation experiments, GR-MG improves the average number of tasks completed in a row of 5 from 3.35 to 4.04. In real-robot experiments, GR-MG is able to perform 58 different tasks and improves the success rate from 68.7% to 78.1% and 44.4% to 60.6% in simple and generalization settings, respectively. It also outperforms comparing baseline methods in few-shot learning of novel skills.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 2","pages":"1912-1919"},"PeriodicalIF":4.6000,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Robotics and Automation Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10829675/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
Citations: 0

Abstract

The robotics community has consistently aimed to achieve generalizable robot manipulation with flexible natural language instructions. One primary challenge is that obtaining robot trajectories fully annotated with both actions and texts is time-consuming and labor-intensive. However, partially-annotated data, such as human activity videos without action labels and robot trajectories without text labels, are much easier to collect. Can we leverage these data to enhance the generalization capabilities of robots? In this letter, we propose GR-MG, a novel method that supports conditioning on both a text instruction and a goal image. During training, GR-MG samples goal images from trajectories and conditions on both the text and the goal image, or solely on the image when text is not available. During inference, when only the text is provided, GR-MG generates the goal image via a diffusion-based image-editing model and conditions on both the text and the generated image. This approach enables GR-MG to leverage large amounts of partially-annotated data while still using language to flexibly specify tasks. To generate accurate goal images, we propose a novel progress-guided goal image generation model which injects task progress information into the generation process. In simulation experiments, GR-MG improves the average number of tasks completed in a row (out of 5) from 3.35 to 4.04. In real-robot experiments, GR-MG is able to perform 58 different tasks and improves the success rate from 68.7% to 78.1% and from 44.4% to 60.6% in the simple and generalization settings, respectively. It also outperforms comparable baseline methods in few-shot learning of novel skills.
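To make the conditioning scheme concrete, the sketch below illustrates, in PyTorch-style Python, how a policy of this kind can accept a goal image with or without a text instruction, substituting a learned null embedding when the text label is missing so that text-free robot trajectories still contribute training signal. This is a minimal sketch, not the authors' implementation; all module names, feature dimensions, and the null-embedding substitution are illustrative assumptions.

```python
# A minimal sketch (not the GR-MG codebase) of multi-modal goal conditioning.
# Encoders are stand-in linear layers; real systems would use vision/language backbones.
import torch
import torch.nn as nn

class MultiModalGoalPolicy(nn.Module):
    """Predicts an action from an observation, a goal image, and optional text."""
    def __init__(self, dim=256, action_dim=7):
        super().__init__()
        self.obs_enc = nn.Linear(512, dim)    # stand-in for an image backbone
        self.goal_enc = nn.Linear(512, dim)   # stand-in for a goal-image encoder
        self.text_enc = nn.Linear(768, dim)   # stand-in for a language encoder
        # Learned placeholder used when a trajectory has no text annotation.
        self.null_text = nn.Parameter(torch.zeros(dim))
        self.head = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(),
                                  nn.Linear(dim, action_dim))

    def forward(self, obs_feat, goal_feat, text_feat=None):
        o = self.obs_enc(obs_feat)
        g = self.goal_enc(goal_feat)
        if text_feat is None:
            # Partially-annotated sample: condition on the goal image alone.
            t = self.null_text.expand(o.shape[0], -1)
        else:
            t = self.text_enc(text_feat)
        return self.head(torch.cat([o, t, g], dim=-1))

# Training-time usage: the goal image is a future frame sampled from the same
# trajectory, so text-free data still yields an (obs, goal) -> action signal.
policy = MultiModalGoalPolicy()
obs, goal = torch.randn(4, 512), torch.randn(4, 512)
act_full = policy(obs, goal, torch.randn(4, 768))  # text + goal image
act_img_only = policy(obs, goal)                   # goal image only
print(act_full.shape, act_img_only.shape)          # torch.Size([4, 7]) each
```

At inference time, the goal image fed to such a policy would come from the paper's diffusion-based image-editing model, which edits the current observation toward the instructed goal under task-progress guidance; that generator is omitted from the sketch.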
Source Journal

IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)
CiteScore: 9.60
Self-citation rate: 15.40%
Annual publications: 1428

About the journal: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.
Latest Articles in This Journal

RA-RRTV*: Risk-Averse RRT* With Local Vine Expansion for Path Planning in Narrow Passages Under Localization Uncertainty
Controlling Pneumatic Bending Actuator With Gain-Scheduled Feedforward and Physical Reservoir Computing State Estimation
Funabot-Sleeve: A Wearable Device Employing McKibben Artificial Muscles for Haptic Sensation in the Forearm
3D Guidance Law for Flexible Target Enclosing With Inherent Safety
Learning Agile Swimming: An End-to-End Approach Without CPGs