Implementation of hand-object pose estimation using SSD and YOLOV5 model for object grasping by SCARA robot
Authors: Ramasamy Sivabalakrishnan, Angappamudaliar Palanisamy Senthil Kumar, Janaki Saminathan
Journal of Field Robotics, 41(5), pp. 1558-1569. Published 2024-05-07. Journal Article.
DOI: 10.1002/rob.22358 (https://onlinelibrary.wiley.com/doi/10.1002/rob.22358)
Applying advanced deep learning methods to hand-object pose estimation is essential for grasping objects safely during human–robot collaborative tasks. Estimating the position and orientation of a hand-object from a two-dimensional image remains a difficult problem under conditions such as occlusion, challenging lighting, salient region detection, and blurred images. In this paper, the proposed method uses an enhanced MobileNetV3 with single shot detection (SSD) and YOLOv5 to improve accuracy without compromising latency when detecting the hand-object pose and its orientation. To address the limitations of high computation cost, latency, and accuracy, MobileNetV3 uses Network Architecture Search and the NetAdapt algorithm, which perform the network search for parameter tuning and adaptive learning for multiscale feature extraction and anchor box offset adjustment, arising from the auto-variance of the weights at each layer. The squeeze-and-excitation block reduces the computation and latency of the model. The hard-swish activation function and feature pyramid networks are used to prevent overfitting and to stabilize training. In a comparative analysis of MobileNetV3 against its predecessor and YOLOv5, the reported results are precision of 92.8% and 89.7%, recall of 93.1% and 90.2%, and mAP of 93.3% and 89.2%, respectively. The proposed methods support better robotic grasping by providing the pose and orientation of hand-objects with a tolerance of −1.9 to 2.15 mm along the x-axis, −1.55 to 2.21 mm along the y-axis, −0.833 to 1.51 mm along the z-axis, and −0.233° to 0.273° about the z-axis.
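For readers unfamiliar with the two MobileNetV3 building blocks named in the abstract, the following is a minimal sketch in plain Python of what hard-swish and a squeeze-and-excitation block compute. It is not the authors' implementation: the function names, the bias-free weight matrices `w1` and `w2`, and the use of a hard-sigmoid gate (as commonly described for MobileNetV3) are assumptions made for illustration only.

```python
def relu6(x: float) -> float:
    """Clamp x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def hard_sigmoid(x: float) -> float:
    """Piecewise-linear approximation of the sigmoid: ReLU6(x + 3) / 6."""
    return relu6(x + 3.0) / 6.0


def hard_swish(x: float) -> float:
    """Hard-swish activation: x * ReLU6(x + 3) / 6."""
    return x * hard_sigmoid(x)


def squeeze_excite(feature_map, w1, w2):
    """Reweight the channels of a feature map (illustrative, bias-free).

    feature_map: list of C channels, each a flat list of pixel values.
    w1: R x C weights of the reduction layer (hypothetical values).
    w2: C x R weights of the expansion layer (hypothetical values).
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    squeezed = [sum(ch) / len(ch) for ch in feature_map]
    # Excite: bottleneck layer with ReLU, then expansion with a hard-sigmoid gate.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [hard_sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every pixel of a channel by that channel's gate in [0, 1].
    return [[g * v for v in ch] for g, ch in zip(gates, feature_map)]
```

Both functions avoid exponentials, which is the usual motivation for the "hard" variants on latency-sensitive hardware; the gate lets the network emphasize or suppress whole channels based on their global statistics.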
About the journal:
The Journal of Field Robotics seeks to promote scholarly publications dealing with the fundamentals of robotics in unstructured and dynamic environments.
The Journal focuses on experimental robotics and encourages publication of work that has both theoretical and practical significance.