{"title":"Language-Guided Dexterous Functional Grasping by LLM Generated Grasp Functionality and Synergy for Humanoid Manipulation","authors":"Zhuo Li;Junjia Liu;Zhihao Li;Zhipeng Dong;Tao Teng;Yongsheng Ou;Darwin Caldwell;Fei Chen","doi":"10.1109/TASE.2024.3524426","DOIUrl":null,"url":null,"abstract":"Dexterous Functional Grasping (DFG) is the crucial first step for humanoid robots to perform generalized manipulation tasks. However, enabling robots to learn language-guided DFG skills in real-world environments presents several challenges, including comprehending the complex relationship between task instructions and grasp functionality, generating feasible functional grasps of dexterous hands, and handling generalization for novel functional concepts. To tackle these challenges, we introduce SayFuncGrasp, a Large Language Model (LLM) based DFG framework that can synthesize versatile dexterous functional grasps from language instructions and achieve generalization on novel functional concepts. SayFuncGrasp first harnesses the open-ended manipulation knowledge from an LLM to infer grasp functionality based on language instructions. Subsequently, it employs the inferred grasp functionality to synthesize plausible DFG actions characterized by hand synergies. Simulation experiments show that SayFuncGrasp significantly outperforms the baseline method in open-set grasp functionality generalization. Real robot experiments demonstrate the effectiveness and generalizability of SayFuncGrasp for interactive humanoid manipulation tasks, achieving an overall grasp success rate of 64.66% and a manipulation success rate of 70.41%. Note to Practitioners—This research was motivated by the practical challenge of enabling humanoid robots with high-DoF dexterous hands to perform functional grasping based on verbal instructions. 
In industrial settings, such capabilities can significantly enhance the versatility and adaptability of humanoid assistants, allowing them to perform complex manipulations simply by being told what to do, thereby reducing programming complexity and increasing flexibility. Current dexterous functional grasping methods rely solely on visual input, without the ability to process language instructions. Furthermore, they are restricted to pre-defined functional concepts and cannot be generalized to novel object classes and manipulation tasks within natural language. Our newly proposed language-guided dexterous functional grasping system takes advantage of open-ended manipulation knowledge from LLMs to produce generalized functional grasps of dexterous robot hands according to verbal commands. Our experiment results demonstrate improved versatility and generalizability compared to the state-of-the-art.","PeriodicalId":51060,"journal":{"name":"IEEE Transactions on Automation Science and Engineering","volume":"22 ","pages":"10506-10519"},"PeriodicalIF":6.4000,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Automation Science and Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10849537/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Dexterous Functional Grasping (DFG) is the crucial first step for humanoid robots to perform generalized manipulation tasks. However, enabling robots to learn language-guided DFG skills in real-world environments presents several challenges, including comprehending the complex relationship between task instructions and grasp functionality, generating feasible functional grasps of dexterous hands, and handling generalization to novel functional concepts. To tackle these challenges, we introduce SayFuncGrasp, a Large Language Model (LLM) based DFG framework that can synthesize versatile dexterous functional grasps from language instructions and achieve generalization to novel functional concepts. SayFuncGrasp first harnesses the open-ended manipulation knowledge of an LLM to infer grasp functionality from language instructions. It then employs the inferred grasp functionality to synthesize plausible DFG actions characterized by hand synergies. Simulation experiments show that SayFuncGrasp significantly outperforms the baseline method in open-set grasp functionality generalization. Real robot experiments demonstrate the effectiveness and generalizability of SayFuncGrasp for interactive humanoid manipulation tasks, achieving an overall grasp success rate of 64.66% and a manipulation success rate of 70.41%.

Note to Practitioners—This research was motivated by the practical challenge of enabling humanoid robots with high-DoF dexterous hands to perform functional grasping based on verbal instructions. In industrial settings, such capabilities can significantly enhance the versatility and adaptability of humanoid assistants, allowing them to perform complex manipulations simply by being told what to do, thereby reducing programming complexity and increasing flexibility. Current dexterous functional grasping methods rely solely on visual input, without the ability to process language instructions. Furthermore, they are restricted to pre-defined functional concepts and cannot generalize to novel object classes and manipulation tasks described in natural language. Our newly proposed language-guided dexterous functional grasping system takes advantage of the open-ended manipulation knowledge of LLMs to produce generalized functional grasps with dexterous robot hands according to verbal commands. Our experimental results demonstrate improved versatility and generalizability compared to the state of the art.
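The two-stage pipeline described above can be sketched in code. This is a minimal, purely illustrative mock-up, not the authors' implementation: the lookup table stands in for the LLM query, and the function names, functionality vocabulary, joint count, and synergy matrix are all assumptions made for the example. In practice the synergy matrix would be learned from human grasp data rather than randomly generated.

```python
import numpy as np

# Stage 1: the paper uses an LLM to infer grasp functionality from a language
# instruction. A lookup table stands in for the LLM call here (hypothetical).
FUNCTIONALITY_DB = {
    "hand me the hammer": {"object": "hammer", "part": "handle", "grasp": "power"},
    "pour water from the kettle": {"object": "kettle", "part": "handle", "grasp": "power"},
    "write with the pen": {"object": "pen", "part": "barrel", "grasp": "precision"},
}

def infer_grasp_functionality(instruction: str) -> dict:
    """Stand-in for the LLM query: language instruction -> grasp functionality."""
    return FUNCTIONALITY_DB[instruction.lower()]

# Stage 2: map the inferred functionality to a low-dimensional synergy
# activation, then expand it to a full joint configuration with a synergy
# matrix. The matrix below is random, purely for illustration.
N_JOINTS, N_SYNERGIES = 16, 2
rng = np.random.default_rng(0)
SYNERGY_MATRIX = rng.standard_normal((N_JOINTS, N_SYNERGIES))  # illustrative

SYNERGY_ACTIVATIONS = {
    "power": np.array([1.0, 0.2]),      # mostly whole-hand closure
    "precision": np.array([0.3, 1.0]),  # mostly fingertip opposition
}

def synthesize_grasp(functionality: dict) -> np.ndarray:
    """Expand a synergy activation into a full hand joint-angle vector."""
    activation = SYNERGY_ACTIVATIONS[functionality["grasp"]]
    return SYNERGY_MATRIX @ activation  # shape (N_JOINTS,)

func = infer_grasp_functionality("write with the pen")
joints = synthesize_grasp(func)
print(func["part"], joints.shape)  # barrel (16,)
```

The key design point the abstract highlights is the hand-synergy parameterization: instead of searching the full high-DoF joint space, grasp synthesis acts in a low-dimensional synergy space, which is what makes generating feasible functional grasps tractable.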
Journal Introduction:
The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.