RIGNet: Robot Intention Grasp for Dense Stacked Targets With Multi-Task Siamese Schema Through RoIs Learning

IF 6.4 | Zone 2 (Computer Science) | JCR Q1 (Automation & Control Systems) | IEEE Transactions on Automation Science and Engineering | Pub Date: 2024-12-31 | DOI: 10.1109/TASE.2024.3522625

Xungao Zhong; Tao Gong; Junzhi Yu; Chengxian Zhou; Xunyu Zhong; Qiang Liu

IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 10354-10367. Publication type: Journal Article. Open access: no. Source: https://ieeexplore.ieee.org/document/10818974/

Citations: 0

Abstract

Autonomous grasping is a critical topic in robotic embodied intelligence. However, it remains challenging for robots to grasp an intended target, particularly in cluttered and densely stacked environments. This paper proposes a Robot Intention Grasp Network (RIGNet) with a multi-task siamese schema, built on an improved Region Proposal Network (RPN) and a Region of Interest (RoI) learning approach. Specifically, the RPN robustly outputs RoIs that describe the candidate objects in stacked scenes, while the multi-task siamese network, consisting of category comprehension and grasp perception modules, detects the intended object and generates the optimal grasp configuration. To improve the reasoning precision for grasp posture, a dynamic-oriented anchor matching strategy is further proposed for the grasp perception module. RIGNet is comparable to state-of-the-art algorithms on single-object tasks and performs better on stacked multi-object grasp detection. This study can provide new insights into real-time autonomous robot reasoning, facilitating an understanding of both the rationale and the methodology for grasping an intended target in highly stacked environments.

Note to Practitioners: This work is motivated by robotic embodied intelligence techniques, which play an irreplaceable role in autonomous robotic grasping. With this technology, a robot can perform grasping tasks for humans. Previous grasp detection methods focus on optimizing the grasp posture to maximize the chance of a successful grasp, but ignore comprehension of the candidate objects' attributes. Hence, this paper proposes a novel robot intention grasp network (RIGNet) that lets a robot perceive how to grasp and comprehend why it grasps. The improved RPN module robustly predicts RoIs to locate candidates among overlapping and occluded objects, and the multi-task siamese module then detects the object category corresponding to each grasp configuration. More importantly, the proposed technology does not lose efficiency in single-object grasping and even outperforms state-of-the-art algorithms in multi-object stacked grasping. The RIGNet schema is suited to scenarios in which a human intends a real robot to execute desired grasp tasks.
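As a rough illustration of the final selection step described in the abstract, the sketch below shows one way per-RoI category predictions and grasp candidates could be combined to pick a grasp for an intended object. It is not the authors' implementation: the names `Grasp`, `RoIPrediction`, `select_intended_grasp`, and the `min_category_score` threshold are all hypothetical, and the network that would produce these predictions is assumed to exist upstream.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Grasp:
    """Hypothetical oriented grasp: center, angle in radians, opening width, confidence."""
    x: float
    y: float
    theta: float
    width: float
    score: float


@dataclass
class RoIPrediction:
    """Hypothetical output for one RoI: a category guess plus grasp candidates."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    category: str
    category_score: float
    grasps: List[Grasp]


def select_intended_grasp(
    rois: List[RoIPrediction],
    intended: str,
    min_category_score: float = 0.5,  # assumed threshold, not from the paper
) -> Optional[Tuple[RoIPrediction, Grasp]]:
    """Return the most confident RoI of the intended category and its best grasp."""
    # Keep only RoIs that match the intended category, pass the threshold,
    # and actually carry at least one grasp candidate.
    matches = [
        r for r in rois
        if r.category == intended
        and r.category_score >= min_category_score
        and r.grasps
    ]
    if not matches:
        return None
    roi = max(matches, key=lambda r: r.category_score)
    grasp = max(roi.grasps, key=lambda g: g.score)
    return roi, grasp
```

Calling `select_intended_grasp(predictions, "cup")` on a list of predictions would return the chosen RoI together with its highest-scoring grasp, or `None` when no RoI of that category is confident enough. Selecting the RoI first and the grasp second mirrors the abstract's split between category comprehension and grasp perception, but the actual coupling inside RIGNet may differ.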




Source journal

IEEE Transactions on Automation Science and Engineering (Engineering and Technology: Automation & Control Systems)

CiteScore: 12.50

Self-citation rate: 14.30%

Articles published: 404

Review time: 3.0 months

About the journal: The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.