AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots

arXiv - CS - Robotics Pub Date : 2024-09-18 DOI:arxiv-2409.11905

Zhaxizhuoma, Pengan Chen, Ziniu Wu, Jiawei Sun, Dong Wang, Peng Zhou, Nieqing Cao, Yan Ding, Bin Zhao, Xuelong Li

{"title":"AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots","authors":"Zhaxizhuoma, Pengan Chen, Ziniu Wu, Jiawei Sun, Dong Wang, Peng Zhou, Nieqing Cao, Yan Ding, Bin Zhao, Xuelong Li","doi":"arxiv-2409.11905","DOIUrl":null,"url":null,"abstract":"This paper presents AlignBot, a novel framework designed to optimize\nVLM-powered customized task planning for household robots by effectively\naligning with user reminders. In domestic settings, aligning task planning with\nuser reminders poses significant challenges due to the limited quantity,\ndiversity, and multimodal nature of the reminders. To address these challenges,\nAlignBot employs a fine-tuned LLaVA-7B model, functioning as an adapter for\nGPT-4o. This adapter model internalizes diverse forms of user reminders-such as\npersonalized preferences, corrective guidance, and contextual assistance-into\nstructured instruction-formatted cues that prompt GPT-4o in generating\ncustomized task plans. Additionally, AlignBot integrates a dynamic retrieval\nmechanism that selects task-relevant historical successes as prompts for\nGPT-4o, further enhancing task planning accuracy. To validate the effectiveness\nof AlignBot, experiments are conducted in real-world household environments,\nwhich are constructed within the laboratory to replicate typical household\nsettings. A multimodal dataset with over 1,500 entries derived from volunteer\nreminders is used for training and evaluation. The results demonstrate that\nAlignBot significantly improves customized task planning, outperforming\nexisting LLM- and VLM-powered planners by interpreting and aligning with user\nreminders, achieving 86.8% success rate compared to the vanilla GPT-4o baseline\nat 21.6%, reflecting a 65% improvement and over four times greater\neffectiveness. Supplementary materials are available at:\nhttps://yding25.com/AlignBot/","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":"32 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Robotics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11905","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

This paper presents AlignBot, a novel framework designed to optimize VLM-powered customized task planning for household robots by effectively aligning with user reminders. In domestic settings, aligning task planning with user reminders poses significant challenges due to the limited quantity, diversity, and multimodal nature of the reminders. To address these challenges, AlignBot employs a fine-tuned LLaVA-7B model, functioning as an adapter for GPT-4o. This adapter model internalizes diverse forms of user reminders-such as personalized preferences, corrective guidance, and contextual assistance-into structured instruction-formatted cues that prompt GPT-4o in generating customized task plans. Additionally, AlignBot integrates a dynamic retrieval mechanism that selects task-relevant historical successes as prompts for GPT-4o, further enhancing task planning accuracy. To validate the effectiveness of AlignBot, experiments are conducted in real-world household environments, which are constructed within the laboratory to replicate typical household settings. A multimodal dataset with over 1,500 entries derived from volunteer reminders is used for training and evaluation. The results demonstrate that AlignBot significantly improves customized task planning, outperforming existing LLM- and VLM-powered planners by interpreting and aligning with user reminders, achieving 86.8% success rate compared to the vanilla GPT-4o baseline at 21.6%, reflecting a 65% improvement and over four times greater effectiveness. Supplementary materials are available at: https://yding25.com/AlignBot/

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

AlignBot：通过对家用机器人进行微调，使 VLM 驱动的定制任务规划与用户提醒相一致

本文介绍了 AlignBot，这是一个新颖的框架，旨在通过有效地与用户提醒保持一致，优化由 VLM 驱动的家用机器人定制任务规划。在家庭环境中，由于提醒的数量、多样性和多模态性有限，使任务规划与用户提醒保持一致面临着巨大挑战。为了应对这些挑战，AlignBot 采用了经过微调的 LLaVA-7B 模型，作为 GPT-4o 的适配器。该适配器模型将多种形式的用户提醒（如个性化偏好、纠正指导和上下文帮助）内化为结构化指令格式的提示，从而促使 GPT-4o 生成定制的任务计划。此外，AlignBot 还集成了动态检索机制，可选择与任务相关的历史成功案例作为 GPT-4o 的提示，从而进一步提高任务规划的准确性。为了验证 AlignBot 的有效性，我们在真实的家庭环境中进行了实验。训练和评估使用了一个多模式数据集，该数据集包含来自志愿者提醒的 1,500 多个条目。结果表明，AlignBot 显著改进了定制任务规划，通过解释和对齐用户提醒，它的表现优于现有的 LLM 和 VLM 驱动的规划器，成功率达到 86.8%，而 vanilla GPT-4o 的基线成功率仅为 21.6%，提高了 65%，效率提高了四倍多。补充材料见：https://yding25.com/AlignBot/

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

arXiv - CS - Robotics

自引率

0.00%

发文量