Detection Method of Manipulator Grasp Pose Based on RGB-D Image

Neural Processing Letters (IF 2.6; Zone 4, Computer Science; JCR Q3, Computer Science, Artificial Intelligence). Pub Date: 2024-07-09. DOI: 10.1007/s11063-024-11662-5

Cheng Huang, Zhen Pang, Jiazhong Xu

Citations: 0

Abstract

To better solve the visual detection problem of a manipulator grasping non-cooperative targets, we propose a grasp pose detection method based on pixel points and feature fusion. An improved U2net network serves as the backbone for feature extraction and feature fusion of the input image, and a grasp prediction layer detects the grasp pose at each pixel. To adapt U2net to grasp pose detection and improve its detection performance, we simplify its network structure to raise detection speed and control sampling depth, while retaining some shallow features in feature fusion to strengthen feature extraction. In the grasp prediction layer we introduce depthwise separable convolution, which further fuses the features extracted by the backbone to produce more expressive predictive feature maps. Focal loss (FocalLoss) is chosen as the loss function to address the imbalance between positive and negative samples during training. We train and test on the Cornell dataset, label the images at pixel level, and replace labels that are not conducive to actual grasping. This adaptation makes the dataset better suited to network training and testing while meeting the manipulator's real-world grasping requirements. The image-wise and object-wise evaluation results are 95.65% and 91.20%, respectively, and the detection speed is 0.007 s/frame. We also applied the method in grasping experiments on a real manipulator. The results show that our method improves on previous methods in both accuracy and speed, and has strong generalization ability and portability.
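The abstract names FocalLoss as the remedy for the imbalance between positive and negative samples but gives no formula or hyperparameters. As a rough illustration only, the sketch below implements the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), over a flat list of per-pixel predictions. The function name `binary_focal_loss`, the defaults `alpha=0.25` and `gamma=2.0`, and the sample values are assumptions made for this example; they are not taken from the paper, whose exact loss configuration is not stated here.

```python
import math


def binary_focal_loss(probs, labels, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over a flat list of per-pixel predictions.

    probs  -- predicted probability that each pixel is a positive pixel
    labels -- ground-truth labels, 1 for positive and 0 for negative
    alpha, gamma -- illustrative defaults, not values taken from the paper
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equal in length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)             # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p              # probability given to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# Hypothetical example: one positive pixel among several negatives.
probs = [0.9, 0.1, 0.1, 0.1, 0.6]
labels = [1, 0, 0, 0, 0]
loss = binary_focal_loss(probs, labels)
```

The factor (1 - p_t)^gamma shrinks the contribution of pixels the network already classifies confidently, so the many easy negative pixels do not dominate the gradient. With gamma set to 0 and alpha set to 0.5, the expression reduces to half of the ordinary binary cross-entropy.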

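Source journal: Neural Processing Letters (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 4.90
Self-citation rate: 12.90%
Articles published: 392
Review time: 2.8 months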

About the journal: Neural Processing Letters is an international journal publishing research results and innovative ideas on all aspects of artificial neural networks. Coverage includes theoretical developments, biological models, new formal modes, learning, applications, software and hardware developments, and prospective researches. The journal promotes fast exchange of information in the community of neural network researchers and users. The resurgence of interest in the field of artificial neural networks since the beginning of the 1980s is coupled to tremendous research activity in specialized or multidisciplinary groups. Research, however, is not possible without good communication between people and the exchange of information, especially in a field covering such different areas; fast communication is also a key aspect, and this is the reason for Neural Processing Letters