{"title":"P-RAG: Progressive Retrieval Augmented Generation For Planning on Embodied Everyday Task","authors":"Weiye Xu, Min Wang, Wengang Zhou, Houqiang Li","doi":"arxiv-2409.11279","DOIUrl":null,"url":null,"abstract":"Embodied Everyday Task is a popular task in the embodied AI community,\nrequiring agents to make a sequence of actions based on natural language\ninstructions and visual observations. Traditional learning-based approaches\nface two challenges. Firstly, natural language instructions often lack explicit\ntask planning. Secondly, extensive training is required to equip models with\nknowledge of the task environment. Previous works based on Large Language Model\n(LLM) either suffer from poor performance due to the lack of task-specific\nknowledge or rely on ground truth as few-shot samples. To address the above\nlimitations, we propose a novel approach called Progressive Retrieval Augmented\nGeneration (P-RAG), which not only effectively leverages the powerful language\nprocessing capabilities of LLMs but also progressively accumulates\ntask-specific knowledge without ground-truth. Compared to the conventional RAG\nmethods, which retrieve relevant information from the database in a one-shot\nmanner to assist generation, P-RAG introduces an iterative approach to\nprogressively update the database. In each iteration, P-RAG retrieves the\nlatest database and obtains historical information from the previous\ninteraction as experiential references for the current interaction. Moreover,\nwe also introduce a more granular retrieval scheme that not only retrieves\nsimilar tasks but also incorporates retrieval of similar situations to provide\nmore valuable reference experiences. Extensive experiments reveal that P-RAG\nachieves competitive results without utilizing ground truth and can even\nfurther improve performance through self-iterations.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Robotics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11279","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Embodied Everyday Task is a popular problem in the embodied AI community, requiring an agent to perform a sequence of actions based on natural language instructions and visual observations. Traditional learning-based approaches face two challenges: first, natural language instructions often lack explicit task planning; second, extensive training is required to equip models with knowledge of the task environment. Previous works based on Large Language Models (LLMs) either suffer from poor performance due to a lack of task-specific knowledge or rely on ground truth as few-shot samples. To address these limitations, we propose a novel approach called Progressive Retrieval Augmented Generation (P-RAG), which not only effectively leverages the powerful language processing capabilities of LLMs but also progressively accumulates task-specific knowledge without ground truth. In contrast to conventional RAG methods, which retrieve relevant information from the database in a one-shot manner to assist generation, P-RAG introduces an iterative approach that progressively updates the database. In each iteration, P-RAG retrieves from the latest database and uses historical information from previous interactions as experiential references for the current interaction. Moreover, we introduce a more granular retrieval scheme that retrieves not only similar tasks but also similar situations, providing more valuable reference experiences. Extensive experiments show that P-RAG achieves competitive results without utilizing ground truth and can further improve performance through self-iteration.
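To make the progressive retrieval idea concrete, here is a minimal Python sketch of such a loop under stated assumptions: the database stores past interaction records, retrieval scores both task similarity and situation similarity, and the record of each finished episode is appended back into the database without any ground-truth trajectory. All names here (ExperienceDB, embed, llm_plan, env) are illustrative assumptions, not the authors' actual implementation or API.

```python
import numpy as np

class ExperienceDB:
    """Stores past interaction records and supports similarity retrieval."""

    def __init__(self):
        self.records = []  # each: dict with task/situation embeddings and interaction history

    def add(self, task_emb, sit_emb, history):
        self.records.append({"task": task_emb, "situation": sit_emb, "history": history})

    def retrieve(self, task_emb, sit_emb, k=3):
        # Granular retrieval: score each record by the similarity of both
        # the task instruction and the current situation (observation).
        def cos(a, b):
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

        scored = sorted(
            self.records,
            key=lambda r: cos(r["task"], task_emb) + cos(r["situation"], sit_emb),
            reverse=True,
        )
        return scored[:k]


def p_rag_episode(task_instruction, env, embed, llm_plan, db, max_steps=30):
    """Run one episode, then update the database (the 'progressive' step)."""
    obs = env.reset(task_instruction)  # hypothetical environment interface
    history = []
    task_emb = embed(task_instruction)
    for _ in range(max_steps):
        sit_emb = embed(obs)
        # Retrieve references from the latest database at every step.
        references = db.retrieve(task_emb, sit_emb)
        action = llm_plan(task_instruction, obs, history, references)
        obs, done = env.step(action)
        history.append((obs, action))
        if done:
            break
    # Progressive update: store this interaction as experience for future
    # iterations, with no ground-truth trajectory required.
    db.add(task_emb, embed(obs), history)
    return history
```

The key design point the sketch tries to capture is that retrieval and generation are interleaved with database growth: each self-iteration over the task set enlarges the experience pool, so later episodes condition on progressively richer, task-specific references.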