View: visual imitation learning with waypoints

Autonomous Robots · IF 3.7 · CAS Zone 3 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence) · Published: 2025-01-18 · DOI: 10.1007/s10514-024-10188-y
Ananth Jonnavittula, Sagar Parekh, Dylan P. Losey
{"title":"视图:带路径点的视觉模仿学习","authors":"Ananth Jonnavittula,&nbsp;Sagar Parekh,&nbsp;Dylan P. Losey","doi":"10.1007/s10514-024-10188-y","DOIUrl":null,"url":null,"abstract":"<div><p>Robots can use visual imitation learning (VIL) to learn manipulation tasks from video demonstrations. However, translating visual observations into actionable robot policies is challenging due to the high-dimensional nature of video data. This challenge is further exacerbated by the morphological differences between humans and robots, especially when the video demonstrations feature humans performing tasks. To address these problems we introduce <b>V</b>isual <b>I</b>mitation l<b>E</b>arning with <b>W</b>aypoints (VIEW), an algorithm that significantly enhances the sample efficiency of human-to-robot VIL. VIEW achieves this efficiency using a multi-pronged approach: extracting a condensed prior trajectory that captures the demonstrator’s intent, employing an agent-agnostic reward function for feedback on the robot’s actions, and utilizing an exploration algorithm that efficiently samples around waypoints in the extracted trajectory. VIEW also segments the human trajectory into grasp and task phases to further accelerate learning efficiency. Through comprehensive simulations and real-world experiments, VIEW demonstrates improved performance compared to current state-of-the-art VIL methods. VIEW enables robots to learn manipulation tasks involving multiple objects from arbitrarily long video demonstrations. Additionally, it can learn standard manipulation tasks such as pushing or moving objects from a single video demonstration in under 30 min, with fewer than 20 real-world rollouts. Code and videos here: https://collab.me.vt.edu/view/</p></div>","PeriodicalId":55409,"journal":{"name":"Autonomous Robots","volume":"49 1","pages":""},"PeriodicalIF":3.7000,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10514-024-10188-y.pdf","citationCount":"0","resultStr":"{\"title\":\"View: visual imitation learning with waypoints\",\"authors\":\"Ananth Jonnavittula,&nbsp;Sagar Parekh,&nbsp;Dylan P. Losey\",\"doi\":\"10.1007/s10514-024-10188-y\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Robots can use visual imitation learning (VIL) to learn manipulation tasks from video demonstrations. However, translating visual observations into actionable robot policies is challenging due to the high-dimensional nature of video data. This challenge is further exacerbated by the morphological differences between humans and robots, especially when the video demonstrations feature humans performing tasks. To address these problems we introduce <b>V</b>isual <b>I</b>mitation l<b>E</b>arning with <b>W</b>aypoints (VIEW), an algorithm that significantly enhances the sample efficiency of human-to-robot VIL. VIEW achieves this efficiency using a multi-pronged approach: extracting a condensed prior trajectory that captures the demonstrator’s intent, employing an agent-agnostic reward function for feedback on the robot’s actions, and utilizing an exploration algorithm that efficiently samples around waypoints in the extracted trajectory. VIEW also segments the human trajectory into grasp and task phases to further accelerate learning efficiency. Through comprehensive simulations and real-world experiments, VIEW demonstrates improved performance compared to current state-of-the-art VIL methods. 
VIEW enables robots to learn manipulation tasks involving multiple objects from arbitrarily long video demonstrations. Additionally, it can learn standard manipulation tasks such as pushing or moving objects from a single video demonstration in under 30 min, with fewer than 20 real-world rollouts. Code and videos here: https://collab.me.vt.edu/view/</p></div>\",\"PeriodicalId\":55409,\"journal\":{\"name\":\"Autonomous Robots\",\"volume\":\"49 1\",\"pages\":\"\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2025-01-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://link.springer.com/content/pdf/10.1007/s10514-024-10188-y.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Autonomous Robots\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s10514-024-10188-y\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Autonomous Robots","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10514-024-10188-y","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Robots can use visual imitation learning (VIL) to learn manipulation tasks from video demonstrations. However, translating visual observations into actionable robot policies is challenging due to the high-dimensional nature of video data. This challenge is further exacerbated by the morphological differences between humans and robots, especially when the video demonstrations feature humans performing tasks. To address these problems, we introduce Visual Imitation lEarning with Waypoints (VIEW), an algorithm that significantly enhances the sample efficiency of human-to-robot VIL. VIEW achieves this efficiency using a multi-pronged approach: extracting a condensed prior trajectory that captures the demonstrator's intent, employing an agent-agnostic reward function for feedback on the robot's actions, and utilizing an exploration algorithm that efficiently samples around waypoints in the extracted trajectory. VIEW also segments the human trajectory into grasp and task phases to further accelerate learning efficiency. Through comprehensive simulations and real-world experiments, VIEW demonstrates improved performance compared to current state-of-the-art VIL methods. VIEW enables robots to learn manipulation tasks involving multiple objects from arbitrarily long video demonstrations. Additionally, it can learn standard manipulation tasks such as pushing or moving objects from a single video demonstration in under 30 min, with fewer than 20 real-world rollouts. Code and videos here: https://collab.me.vt.edu/view/
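The abstract names three ingredients: a condensed waypoint trajectory extracted from the demonstration, an agent-agnostic reward on the robot's rollouts, and exploration that samples around the extracted waypoints. The paper's own implementation lives at the linked repository; the sketch below is only a minimal illustration of that pipeline under assumptions of ours: waypoints are condensed from a dense 3-D demonstration trajectory with Ramer-Douglas-Peucker simplification, the reward depends only on where the manipulated object ends up (hence agent-agnostic), and exploration perturbs the best waypoints found so far with Gaussian noise of shrinking scale. The hook `rollout_on_robot` and all parameter values are hypothetical.

```python
import numpy as np

def rdp(points, epsilon):
    """Condense a dense 3-D trajectory into waypoints with Ramer-Douglas-Peucker
    simplification (an illustrative stand-in for VIEW's condensed prior trajectory)."""
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        return points
    start, end = points[0], points[-1]
    chord = end - start
    chord_norm = np.linalg.norm(chord) + 1e-12
    # Perpendicular distance of every point to the start-end chord.
    dists = np.linalg.norm(np.cross(points - start, chord), axis=-1) / chord_norm
    idx = int(np.argmax(dists))
    if dists[idx] > epsilon:
        # Keep the farthest point and simplify both halves recursively.
        left = rdp(points[: idx + 1], epsilon)
        right = rdp(points[idx:], epsilon)
        return np.vstack([left[:-1], right])
    return np.vstack([start, end])

def object_reward(object_final_pos, demo_final_pos):
    """Agent-agnostic reward: scores only where the object ends up,
    independent of the robot's embodiment."""
    diff = np.asarray(object_final_pos) - np.asarray(demo_final_pos)
    return -float(np.linalg.norm(diff))

def explore_around_waypoints(prior_waypoints, demo_final_pos, rollout_on_robot,
                             n_rollouts=20, init_sigma=0.05, decay=0.85, seed=0):
    """Sample candidate waypoint sequences around the prior trajectory,
    keep the best-scoring one, and shrink the search radius over time."""
    rng = np.random.default_rng(seed)
    best_wps = np.asarray(prior_waypoints, dtype=float).copy()
    best_r = -np.inf
    sigma = init_sigma
    for _ in range(n_rollouts):
        candidate = best_wps + rng.normal(0.0, sigma, size=best_wps.shape)
        # rollout_on_robot is a hypothetical hook: it executes the candidate
        # waypoints on the robot and returns the object's final position.
        r = object_reward(rollout_on_robot(candidate), demo_final_pos)
        if r > best_r:
            best_wps, best_r = candidate, r
        sigma *= decay  # tighten exploration as better candidates are found
    return best_wps, best_r
```

A typical call would condense an estimated hand trajectory with `rdp(hand_positions, epsilon=0.02)` and pass the result to `explore_around_waypoints` together with the object's final position from the demonstration; the default `n_rollouts=20` mirrors the fewer-than-20 real-world rollouts quoted in the abstract, though VIEW's actual budget, reward, and search strategy may differ.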

Source journal
Autonomous Robots (Engineering & Technology - Robotics)
CiteScore: 7.90
Self-citation rate: 5.70%
Publication volume: 46
Review time: 3 months
Journal description: Autonomous Robots reports on the theory and applications of robotic systems capable of some degree of self-sufficiency. It features papers that include performance data on actual robots in the real world. Coverage includes: control of autonomous robots · real-time vision · autonomous wheeled and tracked vehicles · legged vehicles · computational architectures for autonomous systems · distributed architectures for learning, control and adaptation · studies of autonomous robot systems · sensor fusion · theory of autonomous systems · terrain mapping and recognition · self-calibration and self-repair for robots · self-reproducing intelligent structures · genetic algorithms as models for robot development. The focus is on the ability to move and be self-sufficient, not on whether the system is an imitation of biology. Of course, biological models for robotic systems are of major interest to the journal since living systems are prototypes for autonomous behavior.
Latest articles in this journal
View: visual imitation learning with waypoints
Safe and stable teleoperation of quadrotor UAVs under haptic shared autonomy
Synthesizing compact behavior trees for probabilistic robotics domains
Integrative biomechanics of a human–robot carrying task: implications for future collaborative work
Mori-Zwanzig approach for belief abstraction with application to belief space planning