Tomato Recognition and Localization Method Based on Improved YOLOv5n-seg Model and Binocular Stereo Vision

IF 3.3, Zone 2 (Agricultural and Forestry Sciences), Q1 AGRONOMY, Agronomy-Basel, Pub Date: 2023-09-08, DOI: 10.3390/agronomy13092339

Shuhe Zheng, Yang Liu, Wuxiong Weng, Xuexin Jia, Shilong Yu, Zuoxun Wu


Citations: 1

Abstract

Fruit recognition and localization are key to automated fruit picking. However, current neural-network-based fruit recognition algorithms suffer from high complexity, and traditional stereo matching algorithms have low accuracy. To address these problems, this study, which targets greenhouse tomatoes, proposes an algorithm framework built on two parts: YOLO-TomatoSeg, a lightweight tomato instance segmentation model improved from YOLOv5n-seg, and a tomato localization approach that combines RAFT-Stereo disparity estimation with least squares point cloud fitting. First, stereo image pairs of tomatoes were captured with a binocular camera system. The left image was processed by YOLO-TomatoSeg to segment tomato instances and generate masks. Concurrently, RAFT-Stereo estimated the image disparity, from which the original depth point cloud was computed. The point cloud was then clipped by the tomato masks to isolate the tomato point clouds, which were further preprocessed. Finally, a least squares sphere fitting method estimated the 3D centroid coordinates and radius of each tomato by fitting its point cloud to a spherical model. In the instance segmentation stage, YOLO-TomatoSeg replaces the backbone network of YOLOv5n-seg with ShuffleNetV2 building blocks and incorporates an SE attention module; the experimental results showed that this reduced model complexity while improving segmentation accuracy. The YOLO-TomatoSeg model achieved an AP of 99.01% with a size of only 2.52 MB, outperforming mainstream instance segmentation models such as Mask R-CNN (98.30% AP) and YOLACT (96.49% AP). The model size was reduced by 68.3% compared with the original YOLOv5n-seg model. In the localization stage, over a range of 280 mm to 480 mm, the average error of the tomato centroid localization was affected by occlusion and sunlight conditions. The maximum average localization error was ±5.0 mm, which meets the localization accuracy requirements of tomato-picking robots. This study developed a lightweight tomato instance segmentation model and achieved accurate tomato localization, which can facilitate the research, development, and application of fruit-picking robots.
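The localization steps described in the abstract (disparity to depth, mask clipping, sphere fitting) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a rectified stereo pair, takes the disparity map and instance mask as plain nested lists (standing in for the RAFT-Stereo and YOLO-TomatoSeg outputs), and uses the common algebraic (linear) least squares sphere formulation, which may differ from the variant used in the paper. The function names and the camera parameters `focal_px`, `baseline_mm`, `cx`, and `cy` are hypothetical.

```python
import math


def disparity_to_points(disparity, mask, focal_px, baseline_mm, cx, cy):
    """Back-project masked pixels of a rectified disparity map to 3D points in mm.

    Assumed pinhole model: Z = f * B / d, X = (u - cx) * Z / f, Y = (v - cy) * Z / f.
    """
    points = []
    for v, row in enumerate(disparity):
        for u, d in enumerate(row):
            # Keep only pixels inside the instance mask with a valid disparity.
            if mask[v][u] and d > 0:
                z = focal_px * baseline_mm / d
                points.append(((u - cx) * z / focal_px, (v - cy) * z / focal_px, z))
    return points


def solve_linear(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point set: cannot fit a sphere")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_sphere(points):
    """Algebraic least squares sphere fit; returns ((a, b, c), radius).

    Uses x^2 + y^2 + z^2 = 2ax + 2by + 2cz + d with d = r^2 - a^2 - b^2 - c^2,
    which is linear in (a, b, c, d), and solves the 4x4 normal equations.
    """
    ata = [[0.0] * 4 for _ in range(4)]
    atb = [0.0] * 4
    for x, y, z in points:
        row = (2.0 * x, 2.0 * y, 2.0 * z, 1.0)
        rhs = x * x + y * y + z * z
        for i in range(4):
            atb[i] += row[i] * rhs
            for j in range(4):
                ata[i][j] += row[i] * row[j]
    a, b, c, d = solve_linear(ata, atb)
    radius = math.sqrt(d + a * a + b * b + c * c)
    return (a, b, c), radius
```

In use, the points returned by `disparity_to_points` for one tomato mask would be passed to `fit_sphere` after whatever outlier filtering is applied. Because the camera sees only the front surface of a fruit, the fit works on a partial cap of the sphere; the linear formulation tolerates this as long as the points are not coplanar, although noise on a small cap can bias the estimated radius.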




Source journal

Agronomy-Basel Agricultural and Biological Sciences-Agronomy and Crop Science

CiteScore

6.20

Self-citation rate

13.50%

Articles published

2665

Review time

20.32 days

About the journal: Agronomy (ISSN 2073-4395) is an international and cross-disciplinary scholarly journal on agronomy and agroecology. It publishes reviews, regular research papers, communications and short notes, and there is no restriction on the length of the papers. Our aim is to encourage scientists to publish their experimental and theoretical research in as much detail as possible. Full experimental and/or methodological details must be provided for research articles.