Observe Then Act: Asynchronous Active Vision-Action Model for Robotic Manipulation

IEEE Robotics and Automation Letters | IF 5.3 | Zone 2 (Computer Science) | JCR Q2 (Robotics) | Pub Date: 2025-02-12 | DOI: 10.1109/LRA.2025.3541334

Guokang Wang;Hang Li;Shuyuan Zhang;Di Guo;Yanhong Liu;Huaping Liu

IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 3422-3429. Full text: https://ieeexplore.ieee.org/document/10883018/

Citations: 0

Abstract



In real-world scenarios, many robotic manipulation tasks are hindered by occlusions and limited fields of view, posing significant challenges for passive observation-based models that rely on fixed or wrist-mounted cameras. In this letter, we investigate the problem of robotic manipulation under limited visual observation and propose a task-driven asynchronous active vision-action model. Our model serially connects a camera Next-Best-View (NBV) policy with a gripper Next-Best-Pose (NBP) policy, and trains them in a sensor-motor coordination framework using few-shot reinforcement learning. This approach enables the agent to reposition a third-person camera to actively observe the environment based on the task goal, and subsequently determine the appropriate manipulation actions. We trained and evaluated our model on 8 viewpoint-constrained tasks in RLBench. The results demonstrate that our model consistently outperforms baseline algorithms, showcasing its effectiveness in handling visual constraints in manipulation tasks.


Source journal

IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)

CiteScore: 9.60
Self-citation rate: 15.40%
Articles published: 1428

About the journal: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.