YOLOv5s-BC: an improved YOLOv5s-based method for real-time apple detection

IF 2.9 4区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Journal of Real-Time Image Processing Pub Date : 2024-05-10 DOI:10.1007/s11554-024-01473-1

Jingfan Liu, Zhaobing Liu

{"title":"YOLOv5s-BC: an improved YOLOv5s-based method for real-time apple detection","authors":"Jingfan Liu, Zhaobing Liu","doi":"10.1007/s11554-024-01473-1","DOIUrl":null,"url":null,"abstract":"<p>The current apple detection algorithms fail to accurately differentiate obscured apples from pickable ones, thus leading to low accuracy in apple harvesting and a high rate of instances where apples are either mispicked or missed altogether. To address the issues associated with the existing algorithms, this study proposes an improved YOLOv5s-based method, named YOLOv5s-BC, for real-time apple detection, in which a series of modifications have been introduced. First, a coordinate attention block has been incorporated into the backbone module to construct a new backbone network. Second, the original concatenation operation has been replaced with a bi-directional feature pyramid network in the neck network. Finally, a new detection head has been added to the head module, enabling the detection of smaller and more distant targets within the field of view of the robot. The proposed YOLOv5s-BC model was compared to several target detection algorithms, including YOLOv5s, YOLOv4, YOLOv3, SSD, Faster R-CNN (ResNet50), and Faster R-CNN (VGG), with significant improvements of 4.6%, 3.6%, 20.48%, 23.22%, 15.27%, and 15.59% in mAP, respectively. The detection accuracy of the proposed model is also greatly enhanced over the original YOLOv5s model. The model boasts an average detection speed of 0.018 s per image, and the weight size is only 16.7 Mb with 4.7 Mb smaller than that of YOLOv8s, meeting the real-time requirements for the picking robot. Furthermore, according to the heat map, our proposed model can focus more on and learn the high-level features of the target apples, and recognize the smaller target apples better than the original YOLOv5s model. Then, in other apple orchard tests, the model can detect the pickable apples in real time and correctly, illustrating a decent generalization ability. It is noted that our model can provide technical support for the apple harvesting robot in terms of real-time target detection and harvesting sequence planning.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"128 1","pages":""},"PeriodicalIF":2.9000,"publicationDate":"2024-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Real-Time Image Processing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11554-024-01473-1","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

The current apple detection algorithms fail to accurately differentiate obscured apples from pickable ones, thus leading to low accuracy in apple harvesting and a high rate of instances where apples are either mispicked or missed altogether. To address the issues associated with the existing algorithms, this study proposes an improved YOLOv5s-based method, named YOLOv5s-BC, for real-time apple detection, in which a series of modifications have been introduced. First, a coordinate attention block has been incorporated into the backbone module to construct a new backbone network. Second, the original concatenation operation has been replaced with a bi-directional feature pyramid network in the neck network. Finally, a new detection head has been added to the head module, enabling the detection of smaller and more distant targets within the field of view of the robot. The proposed YOLOv5s-BC model was compared to several target detection algorithms, including YOLOv5s, YOLOv4, YOLOv3, SSD, Faster R-CNN (ResNet50), and Faster R-CNN (VGG), with significant improvements of 4.6%, 3.6%, 20.48%, 23.22%, 15.27%, and 15.59% in mAP, respectively. The detection accuracy of the proposed model is also greatly enhanced over the original YOLOv5s model. The model boasts an average detection speed of 0.018 s per image, and the weight size is only 16.7 Mb with 4.7 Mb smaller than that of YOLOv8s, meeting the real-time requirements for the picking robot. Furthermore, according to the heat map, our proposed model can focus more on and learn the high-level features of the target apples, and recognize the smaller target apples better than the original YOLOv5s model. Then, in other apple orchard tests, the model can detect the pickable apples in real time and correctly, illustrating a decent generalization ability. It is noted that our model can provide technical support for the apple harvesting robot in terms of real-time target detection and harvesting sequence planning.

Abstract Image

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

YOLOv5s-BC：基于 YOLOv5s 的改进型苹果实时检测方法

目前的苹果检测算法无法准确区分被遮挡的苹果和可采摘的苹果，因此导致苹果采摘的准确率较低，误摘或漏摘苹果的情况较多。针对现有算法存在的问题，本研究提出了一种基于 YOLOv5s 的改进型苹果实时检测方法，并将其命名为 YOLOv5s-BC。首先，在主干模块中加入了坐标注意块，以构建新的主干网络。其次，在颈部网络中用双向特征金字塔网络取代了原来的连接操作。最后，在头部模块中加入了一个新的探测头，从而能够探测机器人视野范围内更小更远的目标。所提出的 YOLOv5s-BC 模型与几种目标检测算法进行了比较，包括 YOLOv5s、YOLOv4、YOLOv3、SSD、Faster R-CNN (ResNet50) 和 Faster R-CNN (VGG)，mAP 分别显著提高了 4.6%、3.6%、20.48%、23.22%、15.27% 和 15.59%。与最初的 YOLOv5s 模型相比，所提模型的检测精度也有了很大提高。该模型每幅图像的平均检测速度为 0.018 s，权重大小仅为 16.7 Mb，比 YOLOv8s 小 4.7 Mb，满足了拣选机器人的实时性要求。此外，根据热图，我们提出的模型能更多地关注和学习目标苹果的高级特征，对较小目标苹果的识别能力优于原始的 YOLOv5s 模型。随后，在其他苹果园测试中，该模型也能实时、正确地检测到可采摘的苹果，说明其具有良好的泛化能力。由此可见，我们的模型可以在实时目标检测和采摘顺序规划方面为苹果采摘机器人提供技术支持。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Journal of Real-Time Image Processing COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-ENGINEERING, ELECTRICAL & ELECTRONIC

CiteScore

6.80

自引率

6.70%

发文量

审稿时长

6 months

期刊介绍： Due to rapid advancements in integrated circuit technology, the rich theoretical results that have been developed by the image and video processing research community are now being increasingly applied in practical systems to solve real-world image and video processing problems. Such systems involve constraints placed not only on their size, cost, and power consumption, but also on the timeliness of the image data processed. Examples of such systems are mobile phones, digital still/video/cell-phone cameras, portable media players, personal digital assistants, high-definition television, video surveillance systems, industrial visual inspection systems, medical imaging devices, vision-guided autonomous robots, spectral imaging systems, and many other real-time embedded systems. In these real-time systems, strict timing requirements demand that results are available within a certain interval of time as imposed by the application. It is often the case that an image processing algorithm is developed and proven theoretically sound, presumably with a specific application in mind, but its practical applications and the detailed steps, methodology, and trade-off analysis required to achieve its real-time performance are not fully explored, leaving these critical and usually non-trivial issues for those wishing to employ the algorithm in a real-time system. The Journal of Real-Time Image Processing is intended to bridge the gap between the theory and practice of image processing, serving the greater community of researchers, practicing engineers, and industrial professionals who deal with designing, implementing or utilizing image processing systems which must satisfy real-time design constraints.