View: visual imitation learning with waypoints

Autonomous Robots · IF 3.7 · CAS Zone 3 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence) · Published: 2025-01-18 · DOI: 10.1007/s10514-024-10188-y
Ananth Jonnavittula, Sagar Parekh, Dylan P. Losey
{"title":"视图:带路径点的视觉模仿学习","authors":"Ananth Jonnavittula,&nbsp;Sagar Parekh,&nbsp;Dylan P. Losey","doi":"10.1007/s10514-024-10188-y","DOIUrl":null,"url":null,"abstract":"<div><p>Robots can use visual imitation learning (VIL) to learn manipulation tasks from video demonstrations. However, translating visual observations into actionable robot policies is challenging due to the high-dimensional nature of video data. This challenge is further exacerbated by the morphological differences between humans and robots, especially when the video demonstrations feature humans performing tasks. To address these problems we introduce <b>V</b>isual <b>I</b>mitation l<b>E</b>arning with <b>W</b>aypoints (VIEW), an algorithm that significantly enhances the sample efficiency of human-to-robot VIL. VIEW achieves this efficiency using a multi-pronged approach: extracting a condensed prior trajectory that captures the demonstrator’s intent, employing an agent-agnostic reward function for feedback on the robot’s actions, and utilizing an exploration algorithm that efficiently samples around waypoints in the extracted trajectory. VIEW also segments the human trajectory into grasp and task phases to further accelerate learning efficiency. Through comprehensive simulations and real-world experiments, VIEW demonstrates improved performance compared to current state-of-the-art VIL methods. VIEW enables robots to learn manipulation tasks involving multiple objects from arbitrarily long video demonstrations. Additionally, it can learn standard manipulation tasks such as pushing or moving objects from a single video demonstration in under 30 min, with fewer than 20 real-world rollouts. Code and videos here: https://collab.me.vt.edu/view/</p></div>","PeriodicalId":55409,"journal":{"name":"Autonomous Robots","volume":"49 1","pages":""},"PeriodicalIF":3.7000,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10514-024-10188-y.pdf","citationCount":"0","resultStr":"{\"title\":\"View: visual imitation learning with waypoints\",\"authors\":\"Ananth Jonnavittula,&nbsp;Sagar Parekh,&nbsp;Dylan P. Losey\",\"doi\":\"10.1007/s10514-024-10188-y\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Robots can use visual imitation learning (VIL) to learn manipulation tasks from video demonstrations. However, translating visual observations into actionable robot policies is challenging due to the high-dimensional nature of video data. This challenge is further exacerbated by the morphological differences between humans and robots, especially when the video demonstrations feature humans performing tasks. To address these problems we introduce <b>V</b>isual <b>I</b>mitation l<b>E</b>arning with <b>W</b>aypoints (VIEW), an algorithm that significantly enhances the sample efficiency of human-to-robot VIL. VIEW achieves this efficiency using a multi-pronged approach: extracting a condensed prior trajectory that captures the demonstrator’s intent, employing an agent-agnostic reward function for feedback on the robot’s actions, and utilizing an exploration algorithm that efficiently samples around waypoints in the extracted trajectory. VIEW also segments the human trajectory into grasp and task phases to further accelerate learning efficiency. Through comprehensive simulations and real-world experiments, VIEW demonstrates improved performance compared to current state-of-the-art VIL methods. 
VIEW enables robots to learn manipulation tasks involving multiple objects from arbitrarily long video demonstrations. Additionally, it can learn standard manipulation tasks such as pushing or moving objects from a single video demonstration in under 30 min, with fewer than 20 real-world rollouts. Code and videos here: https://collab.me.vt.edu/view/</p></div>\",\"PeriodicalId\":55409,\"journal\":{\"name\":\"Autonomous Robots\",\"volume\":\"49 1\",\"pages\":\"\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2025-01-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://link.springer.com/content/pdf/10.1007/s10514-024-10188-y.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Autonomous Robots\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s10514-024-10188-y\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Autonomous Robots","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10514-024-10188-y","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Robots can use visual imitation learning (VIL) to learn manipulation tasks from video demonstrations. However, translating visual observations into actionable robot policies is challenging due to the high-dimensional nature of video data. This challenge is further exacerbated by the morphological differences between humans and robots, especially when the video demonstrations feature humans performing tasks. To address these problems, we introduce Visual Imitation lEarning with Waypoints (VIEW), an algorithm that significantly enhances the sample efficiency of human-to-robot VIL. VIEW achieves this efficiency using a multi-pronged approach: extracting a condensed prior trajectory that captures the demonstrator's intent, employing an agent-agnostic reward function for feedback on the robot's actions, and utilizing an exploration algorithm that efficiently samples around waypoints in the extracted trajectory. VIEW also segments the human trajectory into grasp and task phases to further accelerate learning efficiency. Through comprehensive simulations and real-world experiments, VIEW demonstrates improved performance compared to current state-of-the-art VIL methods. VIEW enables robots to learn manipulation tasks involving multiple objects from arbitrarily long video demonstrations. Additionally, it can learn standard manipulation tasks such as pushing or moving objects from a single video demonstration in under 30 min, with fewer than 20 real-world rollouts. Code and videos here: https://collab.me.vt.edu/view/
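The abstract names three ingredients: a condensed waypoint trajectory extracted from the demonstration, an agent-agnostic reward on the robot's rollouts, and exploration that samples around the extracted waypoints. The paper's own implementation lives at the linked repository; the sketch below is only a minimal illustration of that pipeline under assumptions of ours: waypoints are condensed from a dense 3-D demonstration trajectory with Ramer-Douglas-Peucker simplification, the reward depends only on where the manipulated object ends up (hence agent-agnostic), and exploration perturbs the best waypoints found so far with Gaussian noise of shrinking scale. The hook `rollout_on_robot` and all parameter values are hypothetical.

```python
import numpy as np

def rdp(points, epsilon):
    """Condense a dense 3-D trajectory into waypoints with Ramer-Douglas-Peucker
    simplification (an illustrative stand-in for VIEW's condensed prior trajectory)."""
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        return points
    start, end = points[0], points[-1]
    chord = end - start
    chord_norm = np.linalg.norm(chord) + 1e-12
    # Perpendicular distance of every point to the start-end chord.
    dists = np.linalg.norm(np.cross(points - start, chord), axis=-1) / chord_norm
    idx = int(np.argmax(dists))
    if dists[idx] > epsilon:
        # Keep the farthest point and simplify both halves recursively.
        left = rdp(points[: idx + 1], epsilon)
        right = rdp(points[idx:], epsilon)
        return np.vstack([left[:-1], right])
    return np.vstack([start, end])

def object_reward(object_final_pos, demo_final_pos):
    """Agent-agnostic reward: scores only where the object ends up,
    independent of the robot's embodiment."""
    diff = np.asarray(object_final_pos) - np.asarray(demo_final_pos)
    return -float(np.linalg.norm(diff))

def explore_around_waypoints(prior_waypoints, demo_final_pos, rollout_on_robot,
                             n_rollouts=20, init_sigma=0.05, decay=0.85, seed=0):
    """Sample candidate waypoint sequences around the prior trajectory,
    keep the best-scoring one, and shrink the search radius over time."""
    rng = np.random.default_rng(seed)
    best_wps = np.asarray(prior_waypoints, dtype=float).copy()
    best_r = -np.inf
    sigma = init_sigma
    for _ in range(n_rollouts):
        candidate = best_wps + rng.normal(0.0, sigma, size=best_wps.shape)
        # rollout_on_robot is a hypothetical hook: it executes the candidate
        # waypoints on the robot and returns the object's final position.
        r = object_reward(rollout_on_robot(candidate), demo_final_pos)
        if r > best_r:
            best_wps, best_r = candidate, r
        sigma *= decay  # tighten exploration as better candidates are found
    return best_wps, best_r
```

A typical call would condense an estimated hand trajectory with `rdp(hand_positions, epsilon=0.02)` and pass the result to `explore_around_waypoints` together with the object's final position from the demonstration; the default `n_rollouts=20` mirrors the fewer-than-20 real-world rollouts quoted in the abstract, though VIEW's actual budget, reward, and search strategy may differ.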

Source journal
Autonomous Robots (Engineering & Technology - Robotics)
CiteScore: 7.90
Self-citation rate: 5.70%
Publication volume: 46
Review time: 3 months
Journal description: Autonomous Robots reports on the theory and applications of robotic systems capable of some degree of self-sufficiency. It features papers that include performance data on actual robots in the real world. Coverage includes: control of autonomous robots · real-time vision · autonomous wheeled and tracked vehicles · legged vehicles · computational architectures for autonomous systems · distributed architectures for learning, control and adaptation · studies of autonomous robot systems · sensor fusion · theory of autonomous systems · terrain mapping and recognition · self-calibration and self-repair for robots · self-reproducing intelligent structures · genetic algorithms as models for robot development. The focus is on the ability to move and be self-sufficient, not on whether the system is an imitation of biology. Of course, biological models for robotic systems are of major interest to the journal since living systems are prototypes for autonomous behavior.
Latest articles in this journal
View: visual imitation learning with waypoints
Safe and stable teleoperation of quadrotor UAVs under haptic shared autonomy
Synthesizing compact behavior trees for probabilistic robotics domains
Integrative biomechanics of a human–robot carrying task: implications for future collaborative work
Mori-Zwanzig approach for belief abstraction with application to belief space planning