Hand-Object Interaction Pretraining from Videos
Himanshu Gaurav Singh, Antonio Loquercio, Carmelo Sferrazza, Jane Wu, Haozhi Qi, Pieter Abbeel, Jitendra Malik
arXiv:2409.08273 [cs.RO], 12 September 2024
We present an approach to learn general robot manipulation priors from 3D hand-object interaction trajectories. We build a framework that uses in-the-wild videos to generate sensorimotor robot trajectories. We do so by lifting both the human hand and the manipulated object into a shared 3D space and retargeting human motions to robot actions. Generative modeling on this data gives us a task-agnostic base policy. This policy captures a general yet flexible manipulation prior. We empirically demonstrate that fine-tuning this policy, with both reinforcement learning (RL) and behavior cloning (BC), enables sample-efficient adaptation to downstream tasks and simultaneously improves robustness and generalizability compared to prior approaches. Qualitative experiments are available at: https://hgaurav2k.github.io/hop/.
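To make the two-stage recipe described in the abstract concrete, the sketch below shows a minimal pretrain-then-fine-tune loop in PyTorch: a policy is first regressed on a large pool of (pseudo) sensorimotor trajectories standing in for the retargeted video data, and the same network is then fine-tuned on a small task-specific dataset via behavior cloning. This is not the authors' implementation; the network, data shapes, and function names are illustrative assumptions, and the paper's generative trajectory model and RL fine-tuning are replaced here by plain supervised regression.

```python
# Conceptual sketch only (not the authors' code): pretrain a task-agnostic
# policy on trajectories retargeted from videos, then fine-tune it on a
# downstream task with behavior cloning. All names/shapes are hypothetical.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 64, 16  # hypothetical observation/action dimensions


class BasePolicy(nn.Module):
    """Small MLP standing in for the generative sequence model."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, ACT_DIM),
        )

    def forward(self, obs):
        return self.net(obs)


def train(policy, obs, actions, epochs, lr):
    """Supervised regression of actions from observations; used both for
    pretraining on video-derived trajectories and for BC fine-tuning."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(policy(obs), actions)
        loss.backward()
        opt.step()
    return loss.item()


# Stage 1: pretrain on a large pool of (dummy) retargeted video trajectories.
video_obs, video_act = torch.randn(1024, OBS_DIM), torch.randn(1024, ACT_DIM)
policy = BasePolicy()
train(policy, video_obs, video_act, epochs=50, lr=1e-3)

# Stage 2: fine-tune the same policy on a small task-specific dataset (BC).
task_obs, task_act = torch.randn(64, OBS_DIM), torch.randn(64, ACT_DIM)
train(policy, task_obs, task_act, epochs=20, lr=1e-4)
```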