PLATO: Planning with LLMs and Affordances for Tool Manipulation

arXiv - CS - Robotics | Pub Date: 2024-09-17 | DOI: arxiv-2409.11580

Arvind Car, Sai Sravan Yarlagadda, Alison Bartsch, Abraham George, Amir Barati Farimani


Citations: 0

Abstract

As robotic systems become increasingly integrated into complex real-world environments, there is a growing need for approaches that enable robots to understand and act upon natural language instructions without relying on extensive pre-programmed knowledge of their surroundings. This paper presents PLATO, an innovative system that addresses this challenge by leveraging specialized large language model agents to process natural language inputs, understand the environment, predict tool affordances, and generate executable actions for robotic systems. Unlike traditional systems that depend on hard-coded environmental information, PLATO employs a modular architecture of specialized agents to operate without any initial knowledge of the environment. These agents identify objects and their locations within the scene, generate a comprehensive high-level plan, translate this plan into a series of low-level actions, and verify the completion of each step. The system is particularly tested on challenging tool-use tasks, which involve handling diverse objects and require long-horizon planning. PLATO's design allows it to adapt to dynamic and unstructured settings, significantly enhancing its flexibility and robustness. By evaluating the system across various complex scenarios, we demonstrate its capability to tackle a diverse range of tasks and offer a novel solution to integrate LLMs with robotic platforms, advancing the state-of-the-art in autonomous robotic task execution. For videos and prompt details, please see our project website: https://sites.google.com/andrew.cmu.edu/plato
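The agent flow described in the abstract (identify objects, build a high-level plan, translate each step into low-level actions, verify completion) can be illustrated with a small sketch. This is not the authors' implementation: every name below (`run_task`, `Action`, the toy `world` dictionary, the primitive names `grasp` and `push`) is hypothetical, the agents are plain Python stubs rather than LLM calls, and the retry limit is an added assumption that the abstract does not mention.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Toy scene representation (assumption): object name -> location label.
Scene = Dict[str, str]


@dataclass(frozen=True)
class Action:
    name: str    # hypothetical primitive, e.g. "grasp" or "push"
    target: str  # object the primitive acts on


def run_task(instruction: str,
             perceive: Callable[[], Scene],
             plan: Callable[[str, Scene], List[str]],
             translate: Callable[[str, Scene], List[Action]],
             execute: Callable[[Action], None],
             verify: Callable[[str, Scene], bool],
             max_attempts: int = 2) -> List[str]:
    """Perceive the scene, plan, then translate, execute, and verify each step."""
    scene = perceive()
    completed: List[str] = []
    for step in plan(instruction, scene):
        for _ in range(max_attempts):
            for action in translate(step, scene):
                execute(action)
            scene = perceive()  # re-observe before checking the step
            if verify(step, scene):
                completed.append(step)
                break
        else:
            # Reached only when no attempt passed verification.
            raise RuntimeError(f"step not verified: {step!r}")
    return completed


# Stub agents standing in for the specialized LLM agents (all hypothetical).
world: Scene = {"spatula": "table", "block": "table"}


def perceive() -> Scene:
    return dict(world)


def plan(instruction: str, scene: Scene) -> List[str]:
    return ["pick up spatula", "push block to bin"]


def translate(step: str, scene: Scene) -> List[Action]:
    if step == "pick up spatula":
        return [Action("grasp", "spatula")]
    return [Action("push", "block")]


def execute(action: Action) -> None:
    if action.name == "grasp":
        world[action.target] = "gripper"
    elif action.name == "push":
        world[action.target] = "bin"


def verify(step: str, scene: Scene) -> bool:
    if step == "pick up spatula":
        return scene["spatula"] == "gripper"
    return scene["block"] == "bin"


completed = run_task("use the spatula to push the block into the bin",
                     perceive, plan, translate, execute, verify)
```

Passing each agent in as a callable keeps the loop independent of how a given agent is realized, which loosely mirrors the modular design the abstract describes; in a real system each stub would be replaced by a model call or a robot interface.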
