Language-Guided Dexterous Functional Grasping by LLM Generated Grasp Functionality and Synergy for Humanoid Manipulation

IEEE Transactions on Automation Science and Engineering · Impact Factor 6.4 · JCR Q1 (Automation & Control Systems) · CAS Region 2 (Computer Science) · Published: 2025-01-22 · DOI: 10.1109/TASE.2024.3524426
Zhuo Li;Junjia Liu;Zhihao Li;Zhipeng Dong;Tao Teng;Yongsheng Ou;Darwin Caldwell;Fei Chen
Volume 22, pp. 10506-10519. Available at: https://ieeexplore.ieee.org/document/10849537/
Citations: 0

Abstract

Dexterous Functional Grasping (DFG) is the crucial first step for humanoid robots to perform generalized manipulation tasks. However, enabling robots to learn language-guided DFG skills in real-world environments presents several challenges, including comprehending the complex relationship between task instructions and grasp functionality, generating feasible functional grasps of dexterous hands, and handling generalization to novel functional concepts. To tackle these challenges, we introduce SayFuncGrasp, a Large Language Model (LLM) based DFG framework that can synthesize versatile dexterous functional grasps from language instructions and generalize to novel functional concepts. SayFuncGrasp first harnesses the open-ended manipulation knowledge of an LLM to infer grasp functionality from language instructions. It then employs the inferred grasp functionality to synthesize plausible DFG actions characterized by hand synergies. Simulation experiments show that SayFuncGrasp significantly outperforms the baseline method in open-set grasp functionality generalization. Real robot experiments demonstrate the effectiveness and generalizability of SayFuncGrasp for interactive humanoid manipulation tasks, achieving an overall grasp success rate of 64.66% and a manipulation success rate of 70.41%.

Note to Practitioners—This research was motivated by the practical challenge of enabling humanoid robots with high-DoF dexterous hands to perform functional grasping based on verbal instructions. In industrial settings, such capabilities can significantly enhance the versatility and adaptability of humanoid assistants, allowing them to perform complex manipulations simply by being told what to do, thereby reducing programming complexity and increasing flexibility. Current dexterous functional grasping methods rely solely on visual input and cannot process language instructions. Furthermore, they are restricted to pre-defined functional concepts and cannot generalize to novel object classes and manipulation tasks described in natural language. Our newly proposed language-guided dexterous functional grasping system takes advantage of the open-ended manipulation knowledge of LLMs to produce generalized functional grasps of dexterous robot hands according to verbal commands. Our experimental results demonstrate improved versatility and generalizability compared to the state of the art.
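The two-stage pipeline described in the abstract (an LLM infers a grasp functionality from the instruction, then synergy weights reconstruct a full hand posture) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the LLM call is replaced by a lookup table, and the toy 5-DoF hand, synergy basis, and weight values are all hypothetical assumptions.

```python
import numpy as np

# Stage 1 stand-in: instruction -> inferred grasp functionality.
# In the actual framework this would be an LLM query; here it is a
# hypothetical lookup table for illustration only.
FUNCTIONALITY_OF = {
    "hand me the hammer": "power-grasp handle",
    "pick up the coin": "precision pinch",
}

N_JOINTS = 5  # toy hand with 5 joints instead of a real high-DoF hand

# Mean posture and two postural synergies (illustrative vectors; in
# practice synergies are typically extracted by PCA over human grasps).
Q_MEAN = np.zeros(N_JOINTS)
SYNERGIES = np.array([
    [1.0, 1.0, 1.0, 0.5, 0.2],    # synergy 1: whole-hand closing
    [0.0, 0.3, -0.3, 1.0, -1.0],  # synergy 2: precision shaping
])

# Stage 2 stand-in: functionality -> synergy weights (alpha_1, alpha_2).
SYNERGY_WEIGHTS = {
    "power-grasp handle": np.array([0.9, 0.1]),
    "precision pinch": np.array([0.2, 0.8]),
}

def synthesize_grasp(instruction: str) -> np.ndarray:
    """Return joint angles q = q_mean + sum_i alpha_i * synergy_i."""
    functionality = FUNCTIONALITY_OF[instruction.lower()]
    alpha = SYNERGY_WEIGHTS[functionality]
    # Low-dimensional weights expand to a full joint configuration.
    return Q_MEAN + alpha @ SYNERGIES

q = synthesize_grasp("Hand me the hammer")
print(q.round(2))
```

The key property this sketch captures is that the language side only has to select a small number of synergy weights, while the linear synergy basis expands those weights into a feasible whole-hand posture.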
Source Journal

IEEE Transactions on Automation Science and Engineering (Engineering & Technology: Automation & Control Systems)
CiteScore: 12.50
Self-citation rate: 14.30%
Articles per year: 404
Review time: 3.0 months
About the journal: The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.
Latest articles in this journal

A Novel Likelihood Gradient-Based Incipient Fault Detection Approach for Avionics Systems
Finite-time Adaptive FeedForward Fractional-order RISE α Control of an Actuated Ankle-Foot Orthosis
Reinforcement Learning-Based Whole-Body Motion Control for Humanoids with Position-Controlled Joints
Frame-level temporal action segmentation in nonhuman primates by fusing skeleton and visual modalities
Nonsingular Generalized Adjustable Predefined-Time Sliding Mode Controllers with Adaptive Predefined-Time Observers for Nonlinear Dynamical Systems