AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots

Zhaxizhuoma, Pengan Chen, Ziniu Wu, Jiawei Sun, Dong Wang, Peng Zhou, Nieqing Cao, Yan Ding, Bin Zhao, Xuelong Li
{"title":"AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots","authors":"Zhaxizhuoma, Pengan Chen, Ziniu Wu, Jiawei Sun, Dong Wang, Peng Zhou, Nieqing Cao, Yan Ding, Bin Zhao, Xuelong Li","doi":"arxiv-2409.11905","DOIUrl":null,"url":null,"abstract":"This paper presents AlignBot, a novel framework designed to optimize\nVLM-powered customized task planning for household robots by effectively\naligning with user reminders. In domestic settings, aligning task planning with\nuser reminders poses significant challenges due to the limited quantity,\ndiversity, and multimodal nature of the reminders. To address these challenges,\nAlignBot employs a fine-tuned LLaVA-7B model, functioning as an adapter for\nGPT-4o. This adapter model internalizes diverse forms of user reminders-such as\npersonalized preferences, corrective guidance, and contextual assistance-into\nstructured instruction-formatted cues that prompt GPT-4o in generating\ncustomized task plans. Additionally, AlignBot integrates a dynamic retrieval\nmechanism that selects task-relevant historical successes as prompts for\nGPT-4o, further enhancing task planning accuracy. To validate the effectiveness\nof AlignBot, experiments are conducted in real-world household environments,\nwhich are constructed within the laboratory to replicate typical household\nsettings. A multimodal dataset with over 1,500 entries derived from volunteer\nreminders is used for training and evaluation. The results demonstrate that\nAlignBot significantly improves customized task planning, outperforming\nexisting LLM- and VLM-powered planners by interpreting and aligning with user\nreminders, achieving 86.8% success rate compared to the vanilla GPT-4o baseline\nat 21.6%, reflecting a 65% improvement and over four times greater\neffectiveness. Supplementary materials are available at:\nhttps://yding25.com/AlignBot/","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Robotics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11905","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

This paper presents AlignBot, a novel framework designed to optimize VLM-powered customized task planning for household robots by effectively aligning with user reminders. In domestic settings, aligning task planning with user reminders poses significant challenges due to the limited quantity, diversity, and multimodal nature of the reminders. To address these challenges, AlignBot employs a fine-tuned LLaVA-7B model, functioning as an adapter for GPT-4o. This adapter model internalizes diverse forms of user reminders-such as personalized preferences, corrective guidance, and contextual assistance-into structured instruction-formatted cues that prompt GPT-4o in generating customized task plans. Additionally, AlignBot integrates a dynamic retrieval mechanism that selects task-relevant historical successes as prompts for GPT-4o, further enhancing task planning accuracy. To validate the effectiveness of AlignBot, experiments are conducted in real-world household environments, which are constructed within the laboratory to replicate typical household settings. A multimodal dataset with over 1,500 entries derived from volunteer reminders is used for training and evaluation. The results demonstrate that AlignBot significantly improves customized task planning, outperforming existing LLM- and VLM-powered planners by interpreting and aligning with user reminders, achieving 86.8% success rate compared to the vanilla GPT-4o baseline at 21.6%, reflecting a 65% improvement and over four times greater effectiveness. Supplementary materials are available at: https://yding25.com/AlignBot/
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
AlignBot:通过对家用机器人进行微调,使 VLM 驱动的定制任务规划与用户提醒相一致
本文介绍了 AlignBot,这是一个新颖的框架,旨在通过有效地与用户提醒保持一致,优化由 VLM 驱动的家用机器人定制任务规划。在家庭环境中,由于提醒的数量、多样性和多模态性有限,使任务规划与用户提醒保持一致面临着巨大挑战。为了应对这些挑战,AlignBot 采用了经过微调的 LLaVA-7B 模型,作为 GPT-4o 的适配器。该适配器模型将多种形式的用户提醒(如个性化偏好、纠正指导和上下文帮助)内化为结构化指令格式的提示,从而促使 GPT-4o 生成定制的任务计划。此外,AlignBot 还集成了动态检索机制,可选择与任务相关的历史成功案例作为 GPT-4o 的提示,从而进一步提高任务规划的准确性。为了验证 AlignBot 的有效性,我们在真实的家庭环境中进行了实验。训练和评估使用了一个多模式数据集,该数据集包含来自志愿者提醒的 1,500 多个条目。结果表明,AlignBot 显著改进了定制任务规划,通过解释和对齐用户提醒,它的表现优于现有的 LLM 和 VLM 驱动的规划器,成功率达到 86.8%,而 vanilla GPT-4o 的基线成功率仅为 21.6%,提高了 65%,效率提高了四倍多。补充材料见:https://yding25.com/AlignBot/
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition Human-Robot Cooperative Piano Playing with Learning-Based Real-Time Music Accompaniment GauTOAO: Gaussian-based Task-Oriented Affordance of Objects Reinforcement Learning with Lie Group Orientations for Robotics Haptic-ACT: Bridging Human Intuition with Compliant Robotic Manipulation via Immersive VR
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1