$\eta$-RepYOLO: real-time object detection method based on $\eta$-RepConv and YOLOv8

IF 2.9 | Zone 4, Computer Science | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Journal of Real-Time Image Processing | Pub Date: 2024-05-03 | DOI: 10.1007/s11554-024-01462-4

Shuai Feng, Huaming Qian, Huilin Wang, Wenna Wang

Abstract

Deep learning-based object detection methods often suffer from excessive model parameters, high complexity, and poor real-time performance. In response, the YOLO series, particularly YOLOv5s through YOLOv8s, was developed to balance real-time processing and accuracy. Nevertheless, YOLOv8's precision can fall short in certain applications. To address this, we introduce a real-time object detection method called $\eta$-RepYOLO, built upon the $\eta$-RepConv structure. The method is designed to maintain detection speed while improving accuracy. We first design a backbone network named $\eta$-EfficientRep, which uses two purpose-built units, $\eta$-RepConv and the $\eta$-RepC2f module, and is reparameterized to produce an efficient inference model. This backbone improves performance by extracting detailed feature maps from images. We then propose the enhanced $\eta$-RepPANet and $\eta$-RepAFPN as the model's detection neck, adding $\eta$-RepC2f to improve feature fusion. Finally, we develop a decoupled detection head in which $\eta$-RepConv replaces the traditional $3 \times 3$ conv, resulting in a marked increase in detection precision during the inference stage. With the two neck modules, $\eta$-RepPANet and $\eta$-RepAFPN, the proposed $\eta$-RepYOLO achieves mAP of 84.77%/85.65% on the PASCAL VOC07+12 dataset and AP of 45.3%/45.8% on the MSCOCO dataset, respectively, a significant improvement over YOLOv8s. In addition, the model parameters of $\eta$-RepYOLO are reduced to 10.8M/8.8M, which is 3.6%/21.4% fewer than YOLOv8, yielding a more streamlined detection model. The detection speeds measured on an RTX3060 are 116 FPS/81 FPS, a substantial improvement over YOLOv8s. In summary, our approach delivers competitive performance and offers a more lightweight alternative to the SOTA YOLO models, making it a robust choice for real-time object detection applications.
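The abstract does not spell out how $\eta$-RepConv is reparameterized, so the following is only a rough illustration of the general idea of structural reparameterization. It assumes a RepVGG-style training block (a $3 \times 3$ conv + BN branch, a $1 \times 1$ conv + BN branch, and an identity BN branch) as a stand-in, which may differ from the authors' $\eta$-RepConv. All function and variable names are hypothetical. The sketch folds each BN into its convolution, pads the smaller kernels to $3 \times 3$, sums them, and checks that the single fused conv reproduces the multi-branch output.

```python
import numpy as np

def conv2d(x, w, b, pad):
    """Naive stride-1 conv. x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,)."""
    c_out, _, k, _ = w.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = xp.shape[1] - k + 1, xp.shape[2] - k + 1
    out = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out

def batchnorm(x, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch norm with fixed running statistics."""
    scale = gamma / np.sqrt(var + eps)
    return x * scale[:, None, None] + (beta - mean * scale)[:, None, None]

def fuse_conv_bn(w, gamma, beta, mean, var, eps=1e-5):
    """Fold BN into a bias-free conv: returns the equivalent kernel and bias."""
    scale = gamma / np.sqrt(var + eps)
    return w * scale[:, None, None, None], beta - mean * scale

def multi_branch(x, w3, bn3, w1, bn1, bn_id):
    """Training-time block (assumed RepVGG-style, not the paper's exact unit)."""
    zero = np.zeros(w3.shape[0])
    y3 = batchnorm(conv2d(x, w3, zero, pad=1), *bn3)
    y1 = batchnorm(conv2d(x, w1, zero, pad=0), *bn1)
    return y3 + y1 + batchnorm(x, *bn_id)

def reparameterize(w3, bn3, w1, bn1, bn_id):
    """Merge the three branches into one 3x3 kernel and bias."""
    c = w3.shape[0]
    k3, b3 = fuse_conv_bn(w3, *bn3)
    k1, b1 = fuse_conv_bn(w1, *bn1)
    k1 = np.pad(k1, ((0, 0), (0, 0), (1, 1), (1, 1)))  # 1x1 -> centered 3x3
    w_id = np.zeros((c, c, 3, 3))
    w_id[np.arange(c), np.arange(c), 1, 1] = 1.0       # identity as a 3x3 conv
    k_id, b_id = fuse_conv_bn(w_id, *bn_id)
    return k3 + k1 + k_id, b3 + b1 + b_id

rng = np.random.default_rng(0)
C = 4

def random_bn(c):
    # (gamma, beta, running_mean, running_var); variance kept positive
    return (rng.normal(size=c), rng.normal(size=c),
            rng.normal(size=c), rng.uniform(0.5, 1.5, size=c))

x = rng.normal(size=(C, 6, 5))
w3 = rng.normal(size=(C, C, 3, 3))
w1 = rng.normal(size=(C, C, 1, 1))
bn3, bn1, bn_id = random_bn(C), random_bn(C), random_bn(C)

k, b = reparameterize(w3, bn3, w1, bn1, bn_id)
y_train = multi_branch(x, w3, bn3, w1, bn1, bn_id)
y_infer = conv2d(x, k, b, pad=1)
max_abs_diff = float(np.abs(y_train - y_infer).max())
print("max abs difference:", max_abs_diff)
```

Because every branch is linear at inference time, the fused kernel gives the same output up to floating-point rounding while needing only one convolution per block, which is the usual reason reparameterized models can add training-time capacity without slowing inference.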

Source journal

Journal of Real-Time Image Processing (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; ENGINEERING, ELECTRICAL & ELECTRONIC)

CiteScore: 6.80

Self-citation rate: 6.70%

Review time: 6 months

Journal description: Due to rapid advancements in integrated circuit technology, the rich theoretical results that have been developed by the image and video processing research community are now being increasingly applied in practical systems to solve real-world image and video processing problems. Such systems involve constraints placed not only on their size, cost, and power consumption, but also on the timeliness of the image data processed. Examples of such systems are mobile phones, digital still/video/cell-phone cameras, portable media players, personal digital assistants, high-definition television, video surveillance systems, industrial visual inspection systems, medical imaging devices, vision-guided autonomous robots, spectral imaging systems, and many other real-time embedded systems. In these real-time systems, strict timing requirements demand that results are available within a certain interval of time as imposed by the application. It is often the case that an image processing algorithm is developed and proven theoretically sound, presumably with a specific application in mind, but its practical applications and the detailed steps, methodology, and trade-off analysis required to achieve its real-time performance are not fully explored, leaving these critical and usually non-trivial issues for those wishing to employ the algorithm in a real-time system. The Journal of Real-Time Image Processing is intended to bridge the gap between the theory and practice of image processing, serving the greater community of researchers, practicing engineers, and industrial professionals who deal with designing, implementing, or utilizing image processing systems that must satisfy real-time design constraints.