Implementation of hand-object pose estimation using SSD and YOLOV5 model for object grasping by SCARA robot

Journal of Field Robotics | IF 4.2 | Region 2 (Computer Science) | JCR Q2 (Robotics) | Pub Date: 2024-05-07 | DOI: 10.1002/rob.22358

Ramasamy Sivabalakrishnan, Angappamudaliar Palanisamy Senthil Kumar, Janaki Saminathan

Journal of Field Robotics, vol. 41, no. 5, pp. 1558-1569. https://onlinelibrary.wiley.com/doi/10.1002/rob.22358

Citations: 0

Abstract




Applying advanced deep learning methods to hand-object pose estimation is essential for grasping objects safely during human–robot collaborative tasks. Estimating the position and orientation of a hand-object from a two-dimensional image remains a difficult problem under conditions such as occlusion, challenging lighting, salient region detection, and blurred images. This paper proposes a method that uses an enhanced MobileNetV3 with single shot detection (SSD), together with YOLOv5, to improve accuracy without compromising latency when detecting hand-object pose and orientation. To address high computation cost, latency, and limited accuracy, MobileNetV3 uses network architecture search and the NetAdapt algorithm, which perform the network search for parameter tuning and adaptive learning for multiscale feature extraction and anchor box offset adjustment, driven by the automatic variation of the weights in each layer. The squeeze-and-excitation block reduces the computation and latency of the model. The hard-swish activation function and feature pyramid networks are used to prevent overfitting and to stabilize training. A comparative analysis of MobileNetV3 against its predecessor and YOLOv5 gives precision values of 92.8% and 89.7%, recall values of 93.1% and 90.2%, and mAP values of 93.3% and 89.2%, respectively. The proposed methods support better robotic grasping by providing hand-object pose and orientation estimates with tolerances of −1.9 to 2.15 mm along the x-axis, −1.55 to 2.21 mm along the y-axis, −0.833 to 1.51 mm along the z-axis, and −0.233° to 0.273° in orientation about the z-axis.
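The abstract names two MobileNetV3 building blocks, the hard-swish activation and the squeeze-and-excitation block, without defining them. The sketch below shows their commonly published forms in plain Python, as an illustration only: it is not the authors' enhanced network, the function names and weight matrices are hypothetical, and the use of a hard-sigmoid gate with no bias terms is an assumption based on the standard MobileNetV3 design.

```python
def hard_sigmoid(x: float) -> float:
    """Piecewise-linear approximation of the sigmoid: ReLU6(x + 3) / 6."""
    return min(max(x + 3.0, 0.0), 6.0) / 6.0


def hard_swish(x: float) -> float:
    """Hard-swish activation: x * ReLU6(x + 3) / 6."""
    return x * hard_sigmoid(x)


def squeeze_excite(feature_map, w_reduce, w_expand):
    """Rescale each channel of feature_map by a gate in [0, 1].

    feature_map: list of C channels, each an H x W list of floats.
    w_reduce:    R x C weight matrix (hypothetical values, bias omitted).
    w_expand:    C x R weight matrix (hypothetical values, bias omitted).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excite: bottleneck layer with ReLU, then expansion with hard-sigmoid.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w_reduce]
    gates = [hard_sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_expand]
    # Scale: multiply every value in a channel by that channel's gate.
    return [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]


# Toy input: two 1 x 2 channels, one bottleneck unit.
toy_features = [[[1.0, 3.0]], [[2.0, 2.0]]]
toy_reduce = [[0.5, 0.5]]
toy_expand = [[1.5], [-1.5]]
toy_output = squeeze_excite(toy_features, toy_reduce, toy_expand)
```

With these toy weights the first channel receives a gate of 1 and passes through unchanged, while the second receives a gate of 0 and is suppressed. Both functions use only comparisons, additions, and multiplications, with no exponentials, which is the usual motivation for the "hard" variants on latency-constrained hardware.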


Source journal

Journal of Field Robotics (Engineering & Technology: Robotics)

CiteScore: 15.00

Self-citation rate: 3.60%

Articles published: (not listed)

Review time: 6 months

Journal description: The Journal of Field Robotics seeks to promote scholarly publications dealing with the fundamentals of robotics in unstructured and dynamic environments. The Journal focuses on experimental robotics and encourages publication of work that has both theoretical and practical significance.
