
Journal of Real-Time Image Processing — latest publications

Hardware architecture optimization for high-frequency zeroing and LFNST in H.266/VVC based on FPGA
IF 3 | CAS Tier 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-11 | DOI: 10.1007/s11554-024-01470-4
Junxiang Zhang, Qinghua Sheng, Rui Pan, Jiawei Wang, Kuan Qin, Xiaofang Huang, Xiaoyan Niu

To reduce the hardware resource consumption of the two-dimensional transform component in H.266/VVC, a unified hardware structure is proposed that supports full-size Discrete Cosine Transform (DCT), Discrete Sine Transform (DST), and full-size Low-Frequency Non-Separable Transform (LFNST). This paper presents an area-efficient hardware architecture for two-dimensional transforms based on a general Regular Multiplier (RM) and a high-throughput hardware design for LFNST in the context of H.266/VVC. The first approach exploits the high-frequency zeroing characteristics of VVC and the symmetry of the DCT-II matrix, allowing the RM-based architecture to use only 256 general multipliers in a fully pipelined structure with a parallelism of 16. The second approach optimizes the transpose operation of the LFNST input matrix in the parallelism-16 architecture, saving storage and logic resources.
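As an illustration of the symmetry the first approach exploits, the NumPy sketch below (not the paper's hardware description) folds an N-point DCT-II into two N/2-wide matrix products using the even/odd symmetry of the DCT-II rows, and applies VVC-style high-frequency zeroing; the function names and the `keep=16` choice are illustrative.

```python
import numpy as np

def dct2_matrix(n):
    # Orthonormal N-point DCT-II matrix.
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0] *= 1 / np.sqrt(2)
    return c * np.sqrt(2 / n)

def dct2_folded(x):
    # DCT-II rows with even index are symmetric and rows with odd
    # index antisymmetric, so an N-point transform reduces to two
    # N/2-wide products on a folded sum and difference.
    n = len(x)
    C = dct2_matrix(n)
    s = x[:n // 2] + x[::-1][:n // 2]   # folded sum
    d = x[:n // 2] - x[::-1][:n // 2]   # folded difference
    y = np.empty(n)
    y[0::2] = C[0::2, :n // 2] @ s
    y[1::2] = C[1::2, :n // 2] @ d
    return y

def zero_high_freq(coeffs, keep=16):
    # VVC-style high-frequency zeroing: only the first `keep`
    # low-frequency coefficients of a large block are retained.
    out = np.zeros_like(coeffs)
    out[:keep] = coeffs[:keep]
    return out

x = np.arange(32, dtype=float)
assert np.allclose(dct2_folded(x), dct2_matrix(32) @ x)
```

The folding halves the number of multiplications per output, which is the same arithmetic saving the RM-based architecture realizes in hardware.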

Citations: 0
RCSLFNet: a novel real-time pedestrian detection network based on re-parameterized convolution and channel-spatial location fusion attention for low-resolution infrared image
IF 3 | CAS Tier 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-11 | DOI: 10.1007/s11554-024-01469-x
Shuai Hao, Zhengqi Liu, Xu Ma, Yingqi Wu, Tian He, Jiahao Li

A novel real-time infrared pedestrian detection algorithm is introduced in this study. The proposed approach leverages re-parameterized convolution and channel-spatial location fusion attention to tackle the difficulties posed by low resolution, partial occlusion, and environmental interference in infrared pedestrian images, factors that have historically hindered accurate pedestrian detection with traditional algorithms. First, to address the weak feature representation of infrared pedestrian targets caused by low resolution and partial occlusion, a new attention module that integrates channel and spatial attention is devised and introduced into CSPDarkNet53 to build a new backbone, CSLF-DarkNet53. This attention module strengthens the feature expression of pedestrian targets and makes them more prominent against complex backgrounds. Second, to improve detection efficiency and accelerate convergence, a multi-branch decoupled detector head is designed that performs classification and localization of infrared pedestrians separately. Finally, to improve real-time performance without losing precision, re-parameterized convolution (RepConv) is introduced, using a parameter identity transformation to decouple the training process from the inference process. During training, a multi-branch structure with convolution kernels of different scales is used to enhance the fitting ability of small convolution kernels. Compared with nine classical detection algorithms, the experimental results show that the proposed RCSLFNet not only accurately detects partially occluded infrared pedestrians in complex environments but also achieves better real-time performance on the KAIST dataset: the mAP@0.5 reaches 86%, 2.9% higher than the baseline, with a detection time of 0.0081 s.
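The parameter identity transformation behind RepConv can be illustrated numerically: by linearity of convolution, a 3×3 branch, a 1×1 branch, and an identity branch sum to one 3×3 convolution whose kernel is the sum of the branch kernels (with the 1×1 and identity kernels embedded at the centre of a 3×3 grid). A minimal single-channel NumPy sketch, not the paper's implementation:

```python
import numpy as np

def conv2d(img, k):
    # 'valid' cross-correlation of a single-channel image with a
    # kernel, standing in for a conv layer.
    kh, kw = k.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(img[y:y + kh, x:x + kw] * k)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))

k3 = rng.standard_normal((3, 3))                     # 3x3 branch
k1 = rng.standard_normal((1, 1))                     # 1x1 branch
identity = np.zeros((3, 3)); identity[1, 1] = 1.0    # identity branch

# The 1x1 branch is a 3x3 conv that is zero everywhere except the
# centre, so all three branches live on the same 3x3 grid.
k1_as_3 = np.zeros((3, 3)); k1_as_3[1, 1] = k1[0, 0]

# Training-time output: sum of three branches.
multi = conv2d(img, k3) + conv2d(img, k1_as_3) + conv2d(img, identity)

# Inference-time output: one conv with the merged kernel.
fused = conv2d(img, k3 + k1_as_3 + identity)
assert np.allclose(multi, fused)
```

At inference only the single merged kernel is evaluated, which is why re-parameterization improves speed without changing the trained function.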

Citations: 0
YOLOv5s-BC: an improved YOLOv5s-based method for real-time apple detection
IF 3 | CAS Tier 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-10 | DOI: 10.1007/s11554-024-01473-1
Jingfan Liu, Zhaobing Liu

The current apple detection algorithms fail to accurately differentiate obscured apples from pickable ones, leading to low accuracy in apple harvesting and a high rate of mispicked or missed apples. To address these issues, this study proposes an improved YOLOv5s-based method, named YOLOv5s-BC, for real-time apple detection, introducing a series of modifications. First, a coordinate attention block is incorporated into the backbone module to construct a new backbone network. Second, the original concatenation operation is replaced with a bi-directional feature pyramid network in the neck network. Finally, a new detection head is added to the head module, enabling the detection of smaller and more distant targets within the robot's field of view. The proposed YOLOv5s-BC model was compared to several target detection algorithms, including YOLOv5s, YOLOv4, YOLOv3, SSD, Faster R-CNN (ResNet50), and Faster R-CNN (VGG), with significant mAP improvements of 4.6%, 3.6%, 20.48%, 23.22%, 15.27%, and 15.59%, respectively. Detection accuracy is thus greatly enhanced over the original YOLOv5s model. The model achieves an average detection speed of 0.018 s per image, and its weight file is only 16.7 MB, 4.7 MB smaller than that of YOLOv8s, meeting the real-time requirements of the picking robot. Furthermore, heat maps show that the proposed model focuses more on the high-level features of target apples and recognizes smaller target apples better than the original YOLOv5s model. In tests in other apple orchards, the model detected pickable apples correctly and in real time, illustrating decent generalization ability. The model can thus provide technical support for apple harvesting robots in real-time target detection and harvesting sequence planning.
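The bi-directional feature pyramid network that replaces the plain concatenation typically combines feature maps with learnable, normalized non-negative weights ("fast normalized fusion" in the BiFPN literature). A hedged NumPy sketch of that single fusion step, with illustrative inputs rather than the paper's actual features:

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    # BiFPN-style fusion: each same-shaped input feature map gets a
    # learnable weight, clamped non-negative and normalized to sum
    # to ~1, then the maps are blended linearly.
    w = np.maximum(weights, 0.0)       # ReLU keeps weights >= 0
    w = w / (w.sum() + eps)            # eps avoids division by zero
    return sum(wi * f for wi, f in zip(w, features))

# e.g. blending a P4 lateral feature with an upsampled P5 feature
p4_td = fast_normalized_fusion(
    [np.ones((4, 4)), 3 * np.ones((4, 4))],
    np.array([1.0, 1.0]))
assert np.allclose(p4_td, 2 * np.ones((4, 4)), atol=1e-2)
```

Unlike concatenation, this lets the network learn how much each resolution contributes, at negligible extra cost.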

Citations: 0
High-speed hardware accelerator based on brightness improved by Light-DehazeNet
IF 3 | CAS Tier 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-09 | DOI: 10.1007/s11554-024-01464-2
Peiyi Teng, Gaoming Du, Zhenmin Li, Xiaolei Wang, Yongsheng Yin

Due to the increasing demand for artificial intelligence technology in today’s society, the entire industrial production system is undergoing a transformative process related to automation, reliability, and robustness, seeking higher productivity and product competitiveness. Additionally, many hardware platforms are unable to deploy complex algorithms due to limited resources. To address these challenges, this paper proposes a computationally efficient lightweight convolutional neural network called Brightness Improved by Light-DehazeNet, which removes the impact of fog and haze to reconstruct clear images. Additionally, we introduce an efficient hardware accelerator architecture based on this network for deployment on low-resource platforms. Furthermore, we present a brightness visibility restoration method to prevent brightness loss in dehazed images. To evaluate the performance of our method, extensive experiments were conducted, comparing it with various traditional and deep learning-based methods, including images with artificial synthesis and natural blur. The experimental results demonstrate that our proposed method excels in dehazing ability, outperforming other methods in comprehensive comparisons. Moreover, it achieves rapid processing speeds, with a maximum frame rate of 105 frames per second, meeting the requirements of real-time processing.
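Dehazing networks of this kind are usually grounded in the atmospheric scattering model I = J·t + A·(1 − t): the network estimates the transmission map t (and atmospheric light A), after which the clear image J is recovered by inverting the model. A NumPy sketch of that inversion, with the clipping threshold `t_min` chosen for illustration:

```python
import numpy as np

def dehaze(I, t, A, t_min=0.1):
    # Atmospheric scattering model: I = J*t + A*(1 - t).
    # Given an estimated transmission map t and atmospheric light A,
    # invert it to recover the scene radiance J.
    t = np.clip(t, t_min, 1.0)         # avoid dividing by tiny t
    return (I - A) / t + A

# Round trip: hazing a known image, then dehazing, recovers it.
J = np.linspace(0.0, 1.0, 16).reshape(4, 4)   # clear image
t = np.full((4, 4), 0.6)                      # transmission map
A = 0.9                                       # atmospheric light
I = J * t + A * (1 - t)                       # synthetic hazy image
assert np.allclose(dehaze(I, t, A), J)
```

The brightness-restoration step the paper adds would operate on the recovered J; it is not reproduced here.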

Citations: 0
DRI-Net: a model for insulator defect detection on transmission lines in rainy backgrounds
IF 3 | CAS Tier 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-09 | DOI: 10.1007/s11554-024-01461-5
Chao Ji, Mingjiang Gao, Siyuan Zhou, Junpeng Liu, Yongcan Zhu, Xinbo Huang

Transmission line insulators often operate in challenging weather conditions, particularly on rainy days. Continuous exposure to humidity and rain accelerates the aging of insulators, leading to degraded insulating performance, cracking, and deformation, which poses a significant risk to power system operation. Scene images collected on rainy days are frequently obstructed by rain streaks, resulting in blurred backgrounds that significantly degrade detection-model performance. To improve the accuracy of insulator defect detection in rainy environments, this paper proposes the DRI-Net (Derain-Insulator-net) detection model. First, a dataset of insulator defects in rainy weather is constructed. Second, a de-raining model, DRGAN, is designed and integrated as an end-to-end structural layer at the input of the DRI-Net detection model, significantly enhancing the clarity and quality of rain-affected images and reducing adverse effects such as blurring and occlusion caused by rainwater. Finally, to keep the model lightweight, partial convolution (PConv) and the lightweight upsampling operator CARAFE are used in the detection network to reduce computational complexity, and the Wise-IoU bounding-box regression loss function is applied for faster convergence and improved detector accuracy. Experimental results demonstrate the effectiveness of DRI-Net for rainy-day insulator defect detection, achieving a mean average precision (mAP) of 82.65% on the constructed dataset. Additionally, an online detection system for rainy-day insulator defects is designed around the detection model, demonstrating practical engineering value.
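For reference, a simplified single-box version of the Wise-IoU (v1) bounding-box loss scales the IoU loss by a distance-based focusing factor. The formulation below is a hedged sketch from the Wise-IoU literature, not the exact implementation used in DRI-Net:

```python
import numpy as np

def iou(a, b):
    # Boxes as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def wise_iou_v1(pred, gt):
    # Simplified Wise-IoU v1: the IoU loss (1 - IoU) is scaled by
    # exp(d^2 / c^2), where d is the centre offset between boxes and
    # c the diagonal of the smallest enclosing box (treated as a
    # constant, i.e. detached from the gradient in the original).
    cx = lambda r: (r[0] + r[2]) / 2
    cy = lambda r: (r[1] + r[3]) / 2
    d2 = (cx(pred) - cx(gt)) ** 2 + (cy(pred) - cy(gt)) ** 2
    ex1, ey1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    ex2, ey2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return np.exp(d2 / c2) * (1.0 - iou(pred, gt))

# A perfectly aligned box has zero loss; an offset box is penalized
# beyond the plain IoU loss.
assert wise_iou_v1((0, 0, 2, 2), (0, 0, 2, 2)) == 0.0
```

The focusing factor grows with centre misalignment, pushing the regressor toward well-centred predictions faster than plain IoU loss.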

Citations: 0
Yolo-global: a real-time target detector for mineral particles
IF 3 | CAS Tier 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-08 | DOI: 10.1007/s11554-024-01468-y
Zihao Wang, Dong Zhou, Chengjun Guo, Ruihao Zhou

Recently, deep learning methodologies have achieved significant advancements in automatic mineral sorting and anomaly detection. However, the limited features of minerals transported in the form of small particles pose significant challenges to accurate detection. To address this challenge, we propose an enhanced mineral particle detection algorithm based on the YOLOv8s model. First, a C2f-SRU block is introduced so that the feature extraction network can more effectively process spatially redundant information. Additionally, we designed the GFF module to enhance information propagation between non-adjacent scale features, enabling deep networks to more fully leverage spatial positional information from shallower layers. Finally, we adopted the Wise-IoU loss function to optimize detection performance and re-designed the positions of the prediction heads to achieve precise detection of small-scale targets. The experimental results substantiate the effectiveness of the algorithm, with YOLO-Global achieving an mAP@0.5 of 95.8%. Compared to the original YOLOv8s, the improved model exhibits a 2.5% increase in mAP and an inference speed of 81 fps, meeting the requirements for real-time processing and accuracy.
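The mAP@0.5 figures quoted throughout these abstracts come from averaging per-class average precision (AP) at an IoU threshold of 0.5. A minimal all-point-interpolation AP computation, for illustration only:

```python
import numpy as np

def average_precision(recalls, precisions):
    # All-point interpolated AP: pad the precision-recall curve,
    # make precision monotonically non-increasing from the right,
    # then integrate precision over recall.
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# A detector that is perfect up to recall 0.5 and then finds nothing
# more scores AP = 0.5.
ap = average_precision(np.array([0.25, 0.5]), np.array([1.0, 1.0]))
assert abs(ap - 0.5) < 1e-9
```

mAP@0.5 is then the mean of these per-class AP values over all classes.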

Citations: 0
Real-time lossless image compression by dynamic Huffman coding hardware implementation
IF 3 | CAS Tier 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-07 | DOI: 10.1007/s11554-024-01467-z
Duc Khai Lam

Over the decades, information technology (IT) has become increasingly pervasive, and the amount of data that must be stored has grown accordingly, creating a massive challenge in data storage. Simply provisioning more storage can accommodate larger files, but this approach is costly in both capacity and bandwidth. Data compression, which significantly reduces file size, is a practical alternative. With the development of IT and increasing computing capacity, data compression is becoming ever more widespread in fields such as broadcast television, aircraft, computer transmission, and medical imaging. In this work, we introduce an image compression algorithm based on Huffman coding and use linear techniques to increase compression efficiency. In addition, we replace 8-bit pixel-by-pixel compression with a scheme that divides each pixel into two 4-bit halves, saving hardware capacity (each input is only 4 bits wide) and reducing run time (there are fewer distinct input symbols). The goals are to reduce image complexity, increase the repetition rate of the data, shorten compression time, and improve compression efficiency. A hardware accelerator is designed and implemented on the Virtex-7 VC707 FPGA so that the system works in real time. The achieved average compression ratio is 3.467, and the hardware design achieves a maximum frequency of 125 MHz.
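The nibble-splitting idea can be sketched in software: dividing each byte into two 4-bit halves shrinks the Huffman alphabet from 256 symbols to at most 16, which is what keeps the hardware code tables small. A Python sketch with illustrative example data (the paper's dynamic hardware coder is not reproduced here):

```python
import heapq
from collections import Counter

def huffman_code(freqs):
    # Standard Huffman construction: repeatedly merge the two least
    # frequent subtrees, prepending '0'/'1' to their codes.
    heap = [[f, i, {sym: ""}] for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, [f1 + f2, tiebreak, merged])
        tiebreak += 1
    return heap[0][2]

def compress_nibbles(data):
    # Split each byte into two 4-bit halves so the code table has at
    # most 16 symbols instead of 256.
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    code = huffman_code(Counter(nibbles))
    bits = "".join(code[n] for n in nibbles)
    return bits, code

# Skewed data compresses well: mostly-repeated nibbles get short codes.
data = bytes([0x11] * 90 + [0x12] * 8 + [0x34] * 2)
bits, code = compress_nibbles(data)
ratio = (len(data) * 8) / len(bits)
assert ratio > 1.0
```

Note the bitstring form is for clarity only; a real encoder would pack bits and also transmit the code table (or rebuild it dynamically, as the paper's hardware does).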

Cited by: 0
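The 4-bit half-pixel idea from the abstract above can be sketched in plain Python: each 8-bit pixel is split into two nibbles, so the Huffman alphabet never exceeds 16 symbols regardless of image content. This is a minimal software sketch of the general idea, not the paper's hardware design; the function names and sample data are ours.

```python
import heapq
from collections import Counter

def build_huffman(freqs):
    """Build a Huffman code table {symbol: bitstring} from symbol frequencies."""
    # Heap entries are (frequency, tiebreaker, tree); a tree is a symbol or a pair.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: a single symbol gets code "0"
        return {heap[0][2]: "0"}
    count = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes

def compress_nibbles(pixels):
    """Split each 8-bit pixel into two 4-bit halves, then Huffman-code the halves."""
    nibbles = []
    for p in pixels:
        nibbles.append(p >> 4)    # high half
        nibbles.append(p & 0x0F)  # low half
    codes = build_huffman(Counter(nibbles))
    bitstream = "".join(codes[n] for n in nibbles)
    return bitstream, codes

def decompress_nibbles(bitstream, codes):
    """Inverse of compress_nibbles: decode the bitstream back to pixels."""
    inv = {v: k for k, v in codes.items()}
    nibbles, cur = [], ""
    for bit in bitstream:
        cur += bit
        if cur in inv:  # Huffman codes are prefix-free, so greedy matching is safe
            nibbles.append(inv[cur])
            cur = ""
    return [(nibbles[i] << 4) | nibbles[i + 1] for i in range(0, len(nibbles), 2)]
```

Keeping the alphabet at 16 symbols is what makes the hardware side cheap: the code table fits in a small lookup structure and the frequency counters stay narrow, which matches the abstract's motivation for the half-pixel split.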
Software and hardware realizations for different designs of chaos-based secret image sharing systems
IF 3.0 | Q2 Computer Science (CAS Zone 4) | Pub Date: 2024-05-06 | DOI: 10.1007/s11554-024-01450-8
Bishoy K. Sharobim, Muhammad Hosam, Salwa K. Abd-El-Hafiz, Wafaa S. Sayed, Lobna A. Said, Ahmed G. Radwan

Secret image sharing (SIS) conveys a secret image to mutually suspicious receivers by sending meaningless shares to the participants, and all shares must be present to recover the secret. This paper proposes and compares three systems for secret sharing, where a visual cryptography system is designed with a fast recovery scheme as the backbone for all systems. Then, an SIS system is introduced for sharing any type of image, where it improves security using the Lorenz chaotic system as the source of randomness and the generalized Arnold transform as a permutation module. The second SIS system further enhances security and robustness by utilizing SHA-256 and RSA cryptosystem. The presented architectures are implemented on a field programmable gate array (FPGA) to enhance computational efficiency and facilitate real-time processing. Detailed experimental results and comparisons between the software and hardware realizations are presented. Security analysis and comparisons with related literature are also introduced with good results, including statistical tests, differential attack measures, robustness tests against noise and crop attacks, key sensitivity tests, and performance analysis.

Cited by: 0
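The generalized Arnold transform used above as the permutation module can be sketched as an invertible coordinate scramble on an n×n image. This is a pure-Python sketch under the common [[1, a], [b, a*b+1]] form of the map; the parameters a, b and the demo image are illustrative, and the paper's Lorenz-based key stream is omitted.

```python
def arnold_forward(x, y, a, b, n):
    """One step of the generalized Arnold map [[1, a], [b, a*b + 1]] mod n."""
    return (x + a * y) % n, (b * x + (a * b + 1) * y) % n

def arnold_inverse(x, y, a, b, n):
    """Inverse map [[a*b + 1, -a], [-b, 1]] mod n (the forward matrix has determinant 1)."""
    return ((a * b + 1) * x - a * y) % n, (-b * x + y) % n

def scramble(img, a, b, rounds=1):
    """Permute an n x n image (list of lists) by iterating the Arnold map."""
    n = len(img)
    for _ in range(rounds):
        out = [[0] * n for _ in range(n)]
        for y in range(n):
            for x in range(n):
                nx, ny = arnold_forward(x, y, a, b, n)
                out[ny][nx] = img[y][x]
        img = out
    return img

def unscramble(img, a, b, rounds=1):
    """Undo scramble() by iterating the inverse map the same number of rounds."""
    n = len(img)
    for _ in range(rounds):
        out = [[0] * n for _ in range(n)]
        for y in range(n):
            for x in range(n):
                nx, ny = arnold_inverse(x, y, a, b, n)
                out[ny][nx] = img[y][x]
        img = out
    return img
```

Because the matrix has determinant 1, the map is a bijection modulo n for any integers a and b, so a scrambled image can be exactly unscrambled given the same parameters and round count, which is what makes the transform usable inside a lossless sharing scheme.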
T-psd: T-shape parking slot detection with self-calibrated convolution network
IF 3.0 | Q2 Computer Science (CAS Zone 4) | Pub Date: 2024-05-04 | DOI: 10.1007/s11554-024-01460-6
Ruitao Zheng, Haifei Zhu, Xinghua Wu, Wei Meng

This paper deals with a challenging autonomous parking problem in which parking slots appear at many different angles. We transform the problem of parking slot detection into center keypoint detection, representing the parking slot as a T-shape to make the representation robust and simple. For diverse types of parking slots, we propose a T-shape parking slot detection method, called T-PSD, to extract the T-shape center information based on a self-calibrated convolution network (SCCN). This method concurrently obtains the entrance center confidence, the relative offsets of the paired junctions, the direction of the middle line, the occupancy, and the inferred type of the parking slot. Final detection results are produced by utilizing Half-Heatmap, MultiBins, and Midline-Grid to more accurately extract the center keypoint, direction, and occupancy, respectively. To verify the performance of our method, we conduct experiments on the public PS2.0 dataset. The results show that our method outperforms state-of-the-art competitors with a recall rate of 99.86% and a precision rate of 99.82%. It achieves 65 frames per second (FPS), satisfying real-time detection requirements. In contrast to detectors that handle global and local information simultaneously, our SCCN detector concentrates exclusively on the T-shape center information, which achieves comparable performance and significantly accelerates inference without non-maximum suppression (NMS).

Cited by: 0
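Decoding center keypoints from a confidence heatmap without NMS, as described above, can be sketched as local-maximum picking, the usual decoding step in center-based detectors. This is an illustrative sketch rather than the paper's exact decoder; the threshold value and the toy heatmap are ours.

```python
def extract_centers(heatmap, threshold=0.5):
    """Return (row, col, score) for cells that exceed `threshold` and are
    greater than or equal to all of their 8 neighbours. This max-pool-style
    peak test replaces NMS when decoding center keypoints."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            v = heatmap[r][c]
            if v < threshold:
                continue
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(v >= nb for nb in neighbours):
                peaks.append((r, c, v))
    return peaks
```

Each surviving peak would then be paired with the regressed attributes at that cell (junction offsets, midline direction, occupancy), so no cross-box suppression pass is needed, which is the source of the inference-time saving the abstract mentions.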
η-RepYOLO: real-time object detection method based on η-RepConv and YOLOv8
IF 3.0 | Q2 Computer Science (CAS Zone 4) | Pub Date: 2024-05-03 | DOI: 10.1007/s11554-024-01462-4
Shuai Feng, Huaming Qian, Huilin Wang, Wenna Wang

Deep learning-based object detection methods often grapple with excessive model parameters, high complexity, and subpar real-time performance. In response, the YOLO series, particularly the YOLOv5s to YOLOv8s methods, has been developed to strike a balance between real-time processing and accuracy. Nevertheless, YOLOv8's precision can fall short in certain applications. To address this, we introduce a real-time object detection method called η-RepYOLO, built upon the η-RepConv structure. This method is designed to maintain consistent detection speeds while improving accuracy. We begin by crafting a backbone network named η-EfficientRep, which uses strategically designed network units, the η-RepConv and η-RepC2f modules, to reparameterize and subsequently generate an efficient inference model. This model achieves superior performance by extracting detailed feature maps from images. Subsequently, we propose the enhanced η-RepPANet and η-RepAFPN as the model's detection neck, with the addition of the η-RepC2f for optimized feature fusion, thus boosting the neck's functionality. Our innovation continues with an advanced decoupled detection head, where η-RepConv takes the place of the traditional 3×3 conv, resulting in a marked increase in detection precision during the inference stage. Our proposed η-RepYOLO method, when paired with the distinct neck modules η-RepPANet and η-RepAFPN, achieves mAP of 84.77%/85.65% on the PASCAL VOC07+12 dataset and AP of 45.3%/45.8% on the MSCOCO dataset, respectively. These figures represent a significant advancement over the YOLOv8s method. Additionally, the model parameters for η-RepYOLO are reduced to 10.8M/8.8M, which is 3.6%/21.4% less than those of YOLOv8s, culminating in a more streamlined detection model. The detection speeds clocked on an RTX3060 are 116 FPS/81 FPS, a substantial enhancement compared with YOLOv8s. In summary, our approach delivers competitive performance and presents a more lightweight alternative to the SOTA YOLO models, making it a robust choice for real-time object detection applications.

Cited by: 0
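The structural reparameterization idea behind RepConv-style blocks (train with parallel 3×3, 1×1, and identity branches, then fold them into a single 3×3 kernel for inference) can be sketched for one channel. This is a simplified sketch: real RepConv blocks also fold batch-norm statistics into the merged kernel, which is omitted here, and the function names are ours.

```python
def conv2d_3x3(img, kernel):
    """'Same' 2-D convolution (cross-correlation) of a single-channel image
    with a 3x3 kernel, using zero padding of 1."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += img[rr][cc] * kernel[dr + 1][dc + 1]
            out[r][c] = acc
    return out

def merge_branches(k3, k1_scalar, identity=True):
    """Fold a 3x3 branch, a 1x1 branch (a scalar for one channel), and an
    optional identity branch into one equivalent 3x3 kernel. Convolution is
    linear, so the 1x1 weight and the identity both embed at the kernel center."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1_scalar + (1.0 if identity else 0.0)
    return merged
```

Running the three branches separately and summing gives exactly the same output as one pass with the merged kernel, which is why the inference model keeps the training-time accuracy while paying for only a single convolution.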