Journal of Real-Time Image Processing最新文献_第8页

Real-time segmentation algorithm of unstructured road scenes based on improved BiSeNet 基于改进型 BiSeNet 的非结构化道路场景实时分割算法

IF 3 4区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Real-Time Image Processing

Pub Date : 2024-05-12 DOI: 10.1007/s11554-024-01472-2

Chunhui Bai, Lilian Zhang, Lutao Gao, Lin Peng, Peishan Li, Linnan Yang

In response to the fuzzy and complex boundaries of unstructured road scenes, as well as the high difficulty of segmentation, this paper uses BiSeNet as the benchmark model to improve the above situation and proposes a real-time segmentation model based on partial convolution. Using FasterNet based on partial convolution as the backbone network and improving it, adopting higher floating-point operations per second operators to improve the inference speed of the model; optimizing the model structure, removing inefficient spatial paths, and using shallow features of context paths to replace their roles, reducing model complexity; the Residual Atrous Spatial Pyramid Pooling Module is proposed to replace a single context embedding module in the original model, allowing better extraction of multi-scale context information and improving the accuracy of model segmentation; the feature fusion module is upgraded, the proposed Dual Attention Features Fusion Module is more helpful for the model to better understand image context through cross-level feature fusion. This paper proposes a model with a inference speed of 78.81 f/s, which meets the real-time requirements of unstructured road scene segmentation. Regarding accuracy metrics, the model in this paper excels with Mean Intersection over Union and Macro F1 at 72.63% and 83.20%, respectively, showing significant advantages over other advanced real-time segmentation models. Therefore, the real-time segmentation model based on partial convolution in this paper well meets the accuracy and speed required for segmentation tasks in complex and variable unstructured road scenes, and has reference value for the development of autonomous driving technology in unstructured road scenes. Code is available at https://github.com/BaiChunhui2001/Real-time-segmentation.

针对非结构化道路场景边界模糊、复杂，分割难度高的特点，本文以 BiSeNet 为基准模型，对上述情况进行改进，提出了基于部分卷积的实时分割模型。以基于部分卷积的 FasterNet 为骨干网络并对其进行改进，采用每秒较高浮点运算的算子，提高模型的推理速度；优化模型结构，去除低效空间路径，利用上下文路径的浅层特征替代其作用，降低模型复杂度；提出了残差阿特拉斯空间金字塔池化模块，取代了原模型中单一的上下文嵌入模块，可以更好地提取多尺度上下文信息，提高模型分割的准确性；升级了特征融合模块，提出的双注意特征融合模块通过跨层次的特征融合，更有助于模型更好地理解图像上下文。本文提出的模型推理速度为 78.81 f/s，满足了非结构化道路场景分割的实时性要求。在准确度指标方面，本文模型的平均交集超过联合率和宏观 F1 分别为 72.63% 和 83.20%，与其他先进的实时分割模型相比优势明显。因此，本文基于部分卷积的实时分割模型很好地满足了复杂多变的非结构化道路场景下的分割任务所需的精度和速度，对非结构化道路场景下的自动驾驶技术发展具有参考价值。代码见 https://github.com/BaiChunhui2001/Real-time-segmentation。

{"title":"Real-time segmentation algorithm of unstructured road scenes based on improved BiSeNet","authors":"Chunhui Bai, Lilian Zhang, Lutao Gao, Lin Peng, Peishan Li, Linnan Yang","doi":"10.1007/s11554-024-01472-2","DOIUrl":"https://doi.org/10.1007/s11554-024-01472-2","url":null,"abstract":"In response to the fuzzy and complex boundaries of unstructured road scenes, as well as the high difficulty of segmentation, this paper uses BiSeNet as the benchmark model to improve the above situation and proposes a real-time segmentation model based on partial convolution. Using FasterNet based on partial convolution as the backbone network and improving it, adopting higher floating-point operations per second operators to improve the inference speed of the model; optimizing the model structure, removing inefficient spatial paths, and using shallow features of context paths to replace their roles, reducing model complexity; the Residual Atrous Spatial Pyramid Pooling Module is proposed to replace a single context embedding module in the original model, allowing better extraction of multi-scale context information and improving the accuracy of model segmentation; the feature fusion module is upgraded, the proposed Dual Attention Features Fusion Module is more helpful for the model to better understand image context through cross-level feature fusion. This paper proposes a model with a inference speed of 78.81 f/s, which meets the real-time requirements of unstructured road scene segmentation. Regarding accuracy metrics, the model in this paper excels with Mean Intersection over Union and Macro F1 at 72.63% and 83.20%, respectively, showing significant advantages over other advanced real-time segmentation models. Therefore, the real-time segmentation model based on partial convolution in this paper well meets the accuracy and speed required for segmentation tasks in complex and variable unstructured road scenes, and has reference value for the development of autonomous driving technology in unstructured road scenes. Code is available at https://github.com/BaiChunhui2001/Real-time-segmentation.","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"30 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140941977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Hardware architecture optimization for high-frequency zeroing and LFNST in H.266/VVC based on FPGA 基于 FPGA 的 H.266/VVC 高频归零和 LFNST 硬件架构优化

IF 3 4区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Real-Time Image Processing

Pub Date : 2024-05-11 DOI: 10.1007/s11554-024-01470-4

Junxiang Zhang, Qinghua Sheng, Rui Pan, Jiawei Wang, Kuan Qin, Xiaofang Huang, Xiaoyan Niu

To reduce the hardware implementation resource consumption of the two-dimensional transform component in H.266 VVC, a unified hardware structure is proposed that supports full-size Discrete Cosine Transform (DCT), Discrete Sine Transform (DST), and full-size Low-Frequency Non-Separable Transform (LFNST). This paper presents an area-efficient hardware architecture for two-dimensional transforms based on a general Regular Multiplier (RM) and a high-throughput hardware design for LFNST in the context of H.266/VVC. The first approach utilizes the high-frequency zeroing characteristics of VVC and the symmetric properties of the DCT-II matrix, allowing the RM-based architecture to use only 256 general multipliers in a fully pipelined structure with a parallelism of 16. The second approach optimizes the transpose operation of the input matrix for LFNST in a parallelism of 16 architecture, aiming to save storage and logic resources.

为了减少 H.266/VVC 中二维变换组件的硬件实现资源消耗，本文提出了一种统一的硬件结构，支持全尺寸离散余弦变换（DCT）、离散正弦变换（DST）和全尺寸低频非分离变换（LFNST）。本文介绍了基于通用正则乘法器（RM）的二维变换的高面积效率硬件架构，以及在 H.266/VVC 背景下 LFNST 的高吞吐量硬件设计。第一种方法利用了 VVC 的高频归零特性和 DCT-II 矩阵的对称特性，使基于 RM 的架构在并行度为 16 的全流水线结构中仅使用 256 个普通乘法器。第二种方法优化了并行度为 16 的架构中 LFNST 输入矩阵的转置操作，旨在节省存储和逻辑资源。

引用次数: 0

RCSLFNet: a novel real-time pedestrian detection network based on re-parameterized convolution and channel-spatial location fusion attention for low-resolution infrared image RCSLFNet：基于重新参数化卷积和通道空间位置融合注意力的新型实时行人检测网络，适用于低分辨率红外图像

IF 3 4区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Real-Time Image Processing

Pub Date : 2024-05-11 DOI: 10.1007/s11554-024-01469-x

Shuai Hao, Zhengqi Liu, Xu Ma, Yingqi Wu, Tian He, Jiahao Li

A novel real-time infrared pedestrian detection algorithm is introduced in this study. The proposed approach leverages re-parameterized convolution and channel-spatial location fusion attention to tackle the difficulties presented by low-resolution, partial occlusion, and environmental interference in infrared pedestrian images. These factors have historically hindered the accurate detection of pedestrians using traditional algorithms. First, to tackle the problem of weak feature representation of infrared pedestrian targets caused by low resolution and partial occlusion, a new attention module that integrates channel and spatial is devised and introduced to CSPDarkNet53 to design a new backbone CSLF-DarkNet53. The designed attention model can enhance the feature expression ability of pedestrian targets and make pedestrian targets more prominent in complex backgrounds. Second, to enhance the efficiency of detection and accelerate convergence, a multi-branch decoupled detector head is designed to operate the classification and location of infrared pedestrians separately. Finally, to improve poor real-time without losing precision, we introduce the re-parameterized convolution (Repconv) using parameter identity transformation to decouple the training process and detection process. During the training procedure, to enhance the fitting ability of small convolution kernels, a multi-branch structure with convolution kernels of different scales is designed. Compared with the nice classical detection algorithms, the results of the experiment show that the proposed RCSLFNet not only detects partial occlusion infrared pedestrians in complex environments accurately but also has better real-time performance on the KAIST dataset. The mAP@0.5 reaches 86% and the detection time is 0.0081 s, 2.9% higher than the baseline.

本研究介绍了一种新型实时红外行人检测算法。所提出的方法利用重新参数化卷积和信道空间位置融合注意力来解决红外行人图像中的低分辨率、部分遮挡和环境干扰所带来的困难。这些因素一直阻碍着传统算法对行人的准确检测。首先，针对低分辨率和部分遮挡导致的红外行人目标特征表征不强的问题，设计了一种融合通道和空间的新注意力模块，并将其引入到 CSPDarkNet53 中，设计出一种新的主干 CSLF-DarkNet53。设计的注意力模型可以增强行人目标的特征表达能力，使行人目标在复杂背景中更加突出。其次，为了提高检测效率，加快收敛速度，设计了多分支解耦检测头，将红外行人的分类和定位分开操作。最后，为了在不损失精度的情况下改善较差的实时性，我们引入了重新参数化卷积（Repconv），利用参数标识变换将训练过程和检测过程解耦。在训练过程中，为了提高小卷积核的拟合能力，我们设计了一种具有不同尺度卷积核的多分支结构。实验结果表明，与优秀的经典检测算法相比，所提出的 RCSLFNet 不仅能准确检测复杂环境中部分遮挡的红外行人，而且在 KAIST 数据集上具有更好的实时性能。mAP@0.5 达到 86%，检测时间为 0.0081 s，比基线高出 2.9%。

{"title":"RCSLFNet: a novel real-time pedestrian detection network based on re-parameterized convolution and channel-spatial location fusion attention for low-resolution infrared image","authors":"Shuai Hao, Zhengqi Liu, Xu Ma, Yingqi Wu, Tian He, Jiahao Li","doi":"10.1007/s11554-024-01469-x","DOIUrl":"https://doi.org/10.1007/s11554-024-01469-x","url":null,"abstract":"A novel real-time infrared pedestrian detection algorithm is introduced in this study. The proposed approach leverages re-parameterized convolution and channel-spatial location fusion attention to tackle the difficulties presented by low-resolution, partial occlusion, and environmental interference in infrared pedestrian images. These factors have historically hindered the accurate detection of pedestrians using traditional algorithms. First, to tackle the problem of weak feature representation of infrared pedestrian targets caused by low resolution and partial occlusion, a new attention module that integrates channel and spatial is devised and introduced to CSPDarkNet53 to design a new backbone CSLF-DarkNet53. The designed attention model can enhance the feature expression ability of pedestrian targets and make pedestrian targets more prominent in complex backgrounds. Second, to enhance the efficiency of detection and accelerate convergence, a multi-branch decoupled detector head is designed to operate the classification and location of infrared pedestrians separately. Finally, to improve poor real-time without losing precision, we introduce the re-parameterized convolution (Repconv) using parameter identity transformation to decouple the training process and detection process. During the training procedure, to enhance the fitting ability of small convolution kernels, a multi-branch structure with convolution kernels of different scales is designed. Compared with the nice classical detection algorithms, the results of the experiment show that the proposed RCSLFNet not only detects partial occlusion infrared pedestrians in complex environments accurately but also has better real-time performance on the KAIST dataset. The mAP@0.5 reaches 86% and the detection time is 0.0081 s, 2.9% higher than the baseline.","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"43 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140930056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

YOLOv5s-BC: an improved YOLOv5s-based method for real-time apple detection YOLOv5s-BC：基于 YOLOv5s 的改进型苹果实时检测方法

IF 3 4区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Real-Time Image Processing

Pub Date : 2024-05-10 DOI: 10.1007/s11554-024-01473-1

Jingfan Liu, Zhaobing Liu

The current apple detection algorithms fail to accurately differentiate obscured apples from pickable ones, thus leading to low accuracy in apple harvesting and a high rate of instances where apples are either mispicked or missed altogether. To address the issues associated with the existing algorithms, this study proposes an improved YOLOv5s-based method, named YOLOv5s-BC, for real-time apple detection, in which a series of modifications have been introduced. First, a coordinate attention block has been incorporated into the backbone module to construct a new backbone network. Second, the original concatenation operation has been replaced with a bi-directional feature pyramid network in the neck network. Finally, a new detection head has been added to the head module, enabling the detection of smaller and more distant targets within the field of view of the robot. The proposed YOLOv5s-BC model was compared to several target detection algorithms, including YOLOv5s, YOLOv4, YOLOv3, SSD, Faster R-CNN (ResNet50), and Faster R-CNN (VGG), with significant improvements of 4.6%, 3.6%, 20.48%, 23.22%, 15.27%, and 15.59% in mAP, respectively. The detection accuracy of the proposed model is also greatly enhanced over the original YOLOv5s model. The model boasts an average detection speed of 0.018 s per image, and the weight size is only 16.7 Mb with 4.7 Mb smaller than that of YOLOv8s, meeting the real-time requirements for the picking robot. Furthermore, according to the heat map, our proposed model can focus more on and learn the high-level features of the target apples, and recognize the smaller target apples better than the original YOLOv5s model. Then, in other apple orchard tests, the model can detect the pickable apples in real time and correctly, illustrating a decent generalization ability. It is noted that our model can provide technical support for the apple harvesting robot in terms of real-time target detection and harvesting sequence planning.

目前的苹果检测算法无法准确区分被遮挡的苹果和可采摘的苹果，因此导致苹果采摘的准确率较低，误摘或漏摘苹果的情况较多。针对现有算法存在的问题，本研究提出了一种基于 YOLOv5s 的改进型苹果实时检测方法，并将其命名为 YOLOv5s-BC。首先，在主干模块中加入了坐标注意块，以构建新的主干网络。其次，在颈部网络中用双向特征金字塔网络取代了原来的连接操作。最后，在头部模块中加入了一个新的探测头，从而能够探测机器人视野范围内更小更远的目标。所提出的 YOLOv5s-BC 模型与几种目标检测算法进行了比较，包括 YOLOv5s、YOLOv4、YOLOv3、SSD、Faster R-CNN (ResNet50) 和 Faster R-CNN (VGG)，mAP 分别显著提高了 4.6%、3.6%、20.48%、23.22%、15.27% 和 15.59%。与最初的 YOLOv5s 模型相比，所提模型的检测精度也有了很大提高。该模型每幅图像的平均检测速度为 0.018 s，权重大小仅为 16.7 Mb，比 YOLOv8s 小 4.7 Mb，满足了拣选机器人的实时性要求。此外，根据热图，我们提出的模型能更多地关注和学习目标苹果的高级特征，对较小目标苹果的识别能力优于原始的 YOLOv5s 模型。随后，在其他苹果园测试中，该模型也能实时、正确地检测到可采摘的苹果，说明其具有良好的泛化能力。由此可见，我们的模型可以在实时目标检测和采摘顺序规划方面为苹果采摘机器人提供技术支持。

{"title":"YOLOv5s-BC: an improved YOLOv5s-based method for real-time apple detection","authors":"Jingfan Liu, Zhaobing Liu","doi":"10.1007/s11554-024-01473-1","DOIUrl":"https://doi.org/10.1007/s11554-024-01473-1","url":null,"abstract":"The current apple detection algorithms fail to accurately differentiate obscured apples from pickable ones, thus leading to low accuracy in apple harvesting and a high rate of instances where apples are either mispicked or missed altogether. To address the issues associated with the existing algorithms, this study proposes an improved YOLOv5s-based method, named YOLOv5s-BC, for real-time apple detection, in which a series of modifications have been introduced. First, a coordinate attention block has been incorporated into the backbone module to construct a new backbone network. Second, the original concatenation operation has been replaced with a bi-directional feature pyramid network in the neck network. Finally, a new detection head has been added to the head module, enabling the detection of smaller and more distant targets within the field of view of the robot. The proposed YOLOv5s-BC model was compared to several target detection algorithms, including YOLOv5s, YOLOv4, YOLOv3, SSD, Faster R-CNN (ResNet50), and Faster R-CNN (VGG), with significant improvements of 4.6%, 3.6%, 20.48%, 23.22%, 15.27%, and 15.59% in mAP, respectively. The detection accuracy of the proposed model is also greatly enhanced over the original YOLOv5s model. The model boasts an average detection speed of 0.018 s per image, and the weight size is only 16.7 Mb with 4.7 Mb smaller than that of YOLOv8s, meeting the real-time requirements for the picking robot. Furthermore, according to the heat map, our proposed model can focus more on and learn the high-level features of the target apples, and recognize the smaller target apples better than the original YOLOv5s model. Then, in other apple orchard tests, the model can detect the pickable apples in real time and correctly, illustrating a decent generalization ability. It is noted that our model can provide technical support for the apple harvesting robot in terms of real-time target detection and harvesting sequence planning.","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"128 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140929976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

High-speed hardware accelerator based on brightness improved by Light-DehazeNet 通过 Light-DehazeNet 改进基于亮度的高速硬件加速器

IF 3 4区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Real-Time Image Processing

Pub Date : 2024-05-09 DOI: 10.1007/s11554-024-01464-2

Peiyi Teng, Gaoming Du, Zhenmin Li, Xiaolei Wang, Yongsheng Yin

Due to the increasing demand for artificial intelligence technology in today’s society, the entire industrial production system is undergoing a transformative process related to automation, reliability, and robustness, seeking higher productivity and product competitiveness. Additionally, many hardware platforms are unable to deploy complex algorithms due to limited resources. To address these challenges, this paper proposes a computationally efficient lightweight convolutional neural network called Brightness Improved by Light-DehazeNet, which removes the impact of fog and haze to reconstruct clear images. Additionally, we introduce an efficient hardware accelerator architecture based on this network for deployment on low-resource platforms. Furthermore, we present a brightness visibility restoration method to prevent brightness loss in dehazed images. To evaluate the performance of our method, extensive experiments were conducted, comparing it with various traditional and deep learning-based methods, including images with artificial synthesis and natural blur. The experimental results demonstrate that our proposed method excels in dehazing ability, outperforming other methods in comprehensive comparisons. Moreover, it achieves rapid processing speeds, with a maximum frame rate of 105 frames per second, meeting the requirements of real-time processing.

由于当今社会对人工智能技术的需求日益增长，整个工业生产系统正在经历一个与自动化、可靠性和稳健性有关的转型过程，以寻求更高的生产率和产品竞争力。此外，由于资源有限，许多硬件平台无法部署复杂的算法。为了应对这些挑战，本文提出了一种计算效率高的轻量级卷积神经网络--Brightness Improved by Light-DehazeNet，它可以消除雾和霾的影响，重建清晰的图像。此外，我们还介绍了基于该网络的高效硬件加速器架构，以便在低资源平台上部署。此外，我们还提出了一种亮度可见性恢复方法，以防止去雾图像中的亮度损失。为了评估我们方法的性能，我们进行了大量实验，将其与各种传统方法和基于深度学习的方法进行了比较，包括人工合成和自然模糊的图像。实验结果表明，我们提出的方法具有出色的去毛刺能力，在综合比较中优于其他方法。此外，它的处理速度也很快，最高帧率可达每秒 105 帧，满足了实时处理的要求。

{"title":"High-speed hardware accelerator based on brightness improved by Light-DehazeNet","authors":"Peiyi Teng, Gaoming Du, Zhenmin Li, Xiaolei Wang, Yongsheng Yin","doi":"10.1007/s11554-024-01464-2","DOIUrl":"https://doi.org/10.1007/s11554-024-01464-2","url":null,"abstract":"Due to the increasing demand for artificial intelligence technology in today’s society, the entire industrial production system is undergoing a transformative process related to automation, reliability, and robustness, seeking higher productivity and product competitiveness. Additionally, many hardware platforms are unable to deploy complex algorithms due to limited resources. To address these challenges, this paper proposes a computationally efficient lightweight convolutional neural network called Brightness Improved by Light-DehazeNet, which removes the impact of fog and haze to reconstruct clear images. Additionally, we introduce an efficient hardware accelerator architecture based on this network for deployment on low-resource platforms. Furthermore, we present a brightness visibility restoration method to prevent brightness loss in dehazed images. To evaluate the performance of our method, extensive experiments were conducted, comparing it with various traditional and deep learning-based methods, including images with artificial synthesis and natural blur. The experimental results demonstrate that our proposed method excels in dehazing ability, outperforming other methods in comprehensive comparisons. Moreover, it achieves rapid processing speeds, with a maximum frame rate of 105 frames per second, meeting the requirements of real-time processing.","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"43 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140930072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

DRI-Net: a model for insulator defect detection on transmission lines in rainy backgrounds DRI-Net：雨天背景下输电线路绝缘体缺陷检测模型

IF 3 4区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Real-Time Image Processing

Pub Date : 2024-05-09 DOI: 10.1007/s11554-024-01461-5

Chao Ji, Mingjiang Gao, Siyuan Zhou, Junpeng Liu, Yongcan Zhu, Xinbo Huang

Transmission line insulators often operate in challenging weather conditions, particularly on rainy days. Continuous exposure to humidity and rain accelerates the aging process of insulators, leading to a decline in insulating material performance, the occurrence of cracks, and deformation. This situation poses a significant risk to the operation of the power system. Scene images collected on rainy days are frequently obstructed by rain lines, resulting in blurred backgrounds that significantly impact the performance of detection models. To improve the accuracy of insulator defect detection in rainy day environments, this paper proposes the DRI-Net (Derain-Insulator-net) detection model. Firstly, a dataset of insulator defects in rainy weather environments is constructed. Second, designing the de-raining model DRGAN and integrating it as an end-to-end DRGAN de-raining structural layer into the input end of the DRI-Net detection model, we significantly enhance the clarity and quality of images affected by rain, thereby reducing adverse effects such as image blurring and occlusion caused by rainwater. Finally, to enhance the lightweight performance of the model, partial convolution (PConv) and the lightweight upsampling operator CARAFE are utilized in the detection network to reduce the computational complexity of the model. The Wise-IoU bounding box regression loss function is applied to achieve faster convergence and improved detector accuracy. Experimental results demonstrate the effectiveness of the DRI-Net model in the task of rainy-day insulator defect detection, achieving an average precision MAP value of 82.65% in the established dataset. Additionally, an online detection system for rainy day insulator defects is designed in conjunction with the detection model, demonstrating practical engineering applications value.

输电线路绝缘子经常在恶劣的天气条件下工作，尤其是在雨天。持续暴露在潮湿和雨水中会加速绝缘子的老化过程，导致绝缘材料性能下降、出现裂纹和变形。这种情况给电力系统的运行带来了极大的风险。雨天采集的现场图像经常被雨线遮挡，导致背景模糊，严重影响检测模型的性能。为了提高雨天环境下绝缘子缺陷检测的准确性，本文提出了 DRI-Net （Derain-Innsulator-net）检测模型。首先，构建雨天环境下的绝缘子缺陷数据集。其次，设计去raining模型DRGAN，并将其作为端到端DRGAN去raining结构层集成到DRI-Net检测模型的输入端，显著提高受雨水影响图像的清晰度和质量，从而减少雨水造成的图像模糊和遮挡等不良影响。最后，为了提高模型的轻量级性能，我们在检测网络中使用了部分卷积（PConv）和轻量级上采样算子 CARAFE，以降低模型的计算复杂度。此外，还采用了 Wise-IoU 边框回归损失函数，以加快收敛速度并提高检测精度。实验结果证明了 DRI-Net 模型在雨天绝缘体缺陷检测任务中的有效性，在已建立的数据集中，平均精度 MAP 值达到了 82.65%。此外，结合该检测模型还设计了雨天绝缘体缺陷在线检测系统，展示了实际工程应用价值。

{"title":"DRI-Net: a model for insulator defect detection on transmission lines in rainy backgrounds","authors":"Chao Ji, Mingjiang Gao, Siyuan Zhou, Junpeng Liu, Yongcan Zhu, Xinbo Huang","doi":"10.1007/s11554-024-01461-5","DOIUrl":"https://doi.org/10.1007/s11554-024-01461-5","url":null,"abstract":"Transmission line insulators often operate in challenging weather conditions, particularly on rainy days. Continuous exposure to humidity and rain accelerates the aging process of insulators, leading to a decline in insulating material performance, the occurrence of cracks, and deformation. This situation poses a significant risk to the operation of the power system. Scene images collected on rainy days are frequently obstructed by rain lines, resulting in blurred backgrounds that significantly impact the performance of detection models. To improve the accuracy of insulator defect detection in rainy day environments, this paper proposes the DRI-Net (Derain-Insulator-net) detection model. Firstly, a dataset of insulator defects in rainy weather environments is constructed. Second, designing the de-raining model DRGAN and integrating it as an end-to-end DRGAN de-raining structural layer into the input end of the DRI-Net detection model, we significantly enhance the clarity and quality of images affected by rain, thereby reducing adverse effects such as image blurring and occlusion caused by rainwater. Finally, to enhance the lightweight performance of the model, partial convolution (PConv) and the lightweight upsampling operator CARAFE are utilized in the detection network to reduce the computational complexity of the model. The Wise-IoU bounding box regression loss function is applied to achieve faster convergence and improved detector accuracy. Experimental results demonstrate the effectiveness of the DRI-Net model in the task of rainy-day insulator defect detection, achieving an average precision MAP value of 82.65% in the established dataset. Additionally, an online detection system for rainy day insulator defects is designed in conjunction with the detection model, demonstrating practical engineering applications value.","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"14 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140929979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Yolo-global: a real-time target detector for mineral particles Yolo-global：矿物颗粒实时目标探测器

IF 3 4区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Real-Time Image Processing

Pub Date : 2024-05-08 DOI: 10.1007/s11554-024-01468-y

Zihao Wang, Dong Zhou, Chengjun Guo, Ruihao Zhou

Recently, deep learning methodologies have achieved significant advancements in mineral automatic sorting and anomaly detection. However, the limited features of minerals transported in the form of small particles pose significant challenges to accurate detection. To address this challenge, we propose a enhanced mineral particle detection algorithm based on the YOLOv8s model. Initially, a C2f-SRU block is introduced to enable the feature extraction network to more effectively process spatial redundant information. Additionally, we designed the GFF module with the aim of enhancing information propagation between non-adjacent scale features, thereby enabling deep networks to more fully leverage spatial positional information from shallower networks. Finally, we adopted the Wise-IoU loss function to optimize the detection performance of the model. We also re-designed the position of the prediction heads to achieve precise detection of small-scale targets. The experimental results substantiate the effectiveness of the algorithm, with YOLO-Global achieving a mAP@.5 of 95.8%. In comparison to the original YOLOv8s, the improved model exhibits a 2.5% increase in mAP, achieving a model inference speed of 81 fps, meeting the requirements for real-time processing and accuracy.

最近，深度学习方法在矿物自动分拣和异常检测方面取得了重大进展。然而，以小颗粒形式运输的矿物特征有限，这给精确检测带来了巨大挑战。为了应对这一挑战，我们提出了一种基于 YOLOv8s 模型的增强型矿物颗粒检测算法。首先，我们引入了 C2f-SRU 模块，使特征提取网络能够更有效地处理空间冗余信息。此外，我们还设计了 GFF 模块，旨在加强非相邻尺度特征之间的信息传播，从而使深度网络能够更充分地利用来自较浅网络的空间位置信息。最后，我们采用了 Wise-IoU 损失函数来优化模型的检测性能。我们还重新设计了预测头的位置，以实现对小尺度目标的精确检测。实验结果证明了该算法的有效性，YOLO-Global 的 mAP@.5 高达 95.8%。与最初的 YOLOv8s 相比，改进后的模型 mAP 提高了 2.5%，模型推理速度达到 81 fps，满足了实时处理和精确度的要求。

{"title":"Yolo-global: a real-time target detector for mineral particles","authors":"Zihao Wang, Dong Zhou, Chengjun Guo, Ruihao Zhou","doi":"10.1007/s11554-024-01468-y","DOIUrl":"https://doi.org/10.1007/s11554-024-01468-y","url":null,"abstract":"Recently, deep learning methodologies have achieved significant advancements in mineral automatic sorting and anomaly detection. However, the limited features of minerals transported in the form of small particles pose significant challenges to accurate detection. To address this challenge, we propose a enhanced mineral particle detection algorithm based on the YOLOv8s model. Initially, a C2f-SRU block is introduced to enable the feature extraction network to more effectively process spatial redundant information. Additionally, we designed the GFF module with the aim of enhancing information propagation between non-adjacent scale features, thereby enabling deep networks to more fully leverage spatial positional information from shallower networks. Finally, we adopted the Wise-IoU loss function to optimize the detection performance of the model. We also re-designed the position of the prediction heads to achieve precise detection of small-scale targets. The experimental results substantiate the effectiveness of the algorithm, with YOLO-Global achieving a mAP@.5 of 95.8%. In comparison to the original YOLOv8s, the improved model exhibits a 2.5% increase in mAP, achieving a model inference speed of 81 fps, meeting the requirements for real-time processing and accuracy.","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"59 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140930022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Real-time lossless image compression by dynamic Huffman coding hardware implementation 通过动态哈夫曼编码硬件实现实时无损图像压缩

IF 3 4区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Real-Time Image Processing

Pub Date : 2024-05-07 DOI: 10.1007/s11554-024-01467-z

Duc Khai Lam

Over the decades, implementing information technology (IT) has become increasingly common, equating to an increasing amount of data that needs to be stored, creating a massive challenge in data storage. Using a large storage capacity can solve the problem of the file size. However, this method is costly in terms of both capacity and bandwidth. One possible method is data compression, which significantly reduces the file size. With the development of IT and increasing computing capacity, data compression is becoming more and more widespread in many fields, such as broadcast television, aircraft, computer transmission, and medical imaging. In this work, we introduce an image compression algorithm based on the Huffman coding algorithm and use linear techniques to increase image compression efficiency. Besides, we replace 8-bit pixel-by-pixel compression by dividing one pixel into two 4-bit halves to save hardware capacity (because only 4-bit for each input) and optimize run time (because the number of different inputs is less). The goal is to reduce the image’s complexity, increase the data’s repetition rate, reduce the compression time, and increase the image compression efficiency. A hardware accelerator is designed and implemented on the Virtex-7 VC707 FPGA to make it work in real-time. The achieved average compression ratio is 3,467. Hardware design achieves a maximum frequency of 125 MHz.

几十年来，信息技术（IT）的应用越来越普遍，需要存储的数据量也越来越大，这给数据存储带来了巨大的挑战。使用大容量存储可以解决文件大小的问题。然而，这种方法在容量和带宽方面都很昂贵。数据压缩是一种可行的方法，它可以大大减小文件大小。随着信息技术的发展和计算能力的不断提高，数据压缩在广播电视、飞机、计算机传输和医学成像等许多领域的应用越来越广泛。在这项工作中，我们介绍了一种基于哈夫曼编码算法的图像压缩算法，并使用线性技术来提高图像压缩效率。此外，我们将一个像素分成两个 4 位的半像素来取代 8 位的逐像素压缩，以节省硬件容量（因为每个输入只有 4 位）和优化运行时间（因为不同输入的数量较少）。目标是降低图像的复杂性，提高数据的重复率，缩短压缩时间，提高图像压缩效率。为了使其实时工作，设计了一个硬件加速器，并在 Virtex-7 VC707 FPGA 上实现。实现的平均压缩率为 3467。硬件设计实现了 125 MHz 的最高频率。

{"title":"Real-time lossless image compression by dynamic Huffman coding hardware implementation","authors":"Duc Khai Lam","doi":"10.1007/s11554-024-01467-z","DOIUrl":"https://doi.org/10.1007/s11554-024-01467-z","url":null,"abstract":"Over the decades, implementing information technology (IT) has become increasingly common, equating to an increasing amount of data that needs to be stored, creating a massive challenge in data storage. Using a large storage capacity can solve the problem of the file size. However, this method is costly in terms of both capacity and bandwidth. One possible method is data compression, which significantly reduces the file size. With the development of IT and increasing computing capacity, data compression is becoming more and more widespread in many fields, such as broadcast television, aircraft, computer transmission, and medical imaging. In this work, we introduce an image compression algorithm based on the Huffman coding algorithm and use linear techniques to increase image compression efficiency. Besides, we replace 8-bit pixel-by-pixel compression by dividing one pixel into two 4-bit halves to save hardware capacity (because only 4-bit for each input) and optimize run time (because the number of different inputs is less). The goal is to reduce the image’s complexity, increase the data’s repetition rate, reduce the compression time, and increase the image compression efficiency. A hardware accelerator is designed and implemented on the Virtex-7 VC707 FPGA to make it work in real-time. The achieved average compression ratio is 3,467. Hardware design achieves a maximum frequency of 125 MHz.","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"20 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140882428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Software and hardware realizations for different designs of chaos-based secret image sharing systems 基于混沌的秘密图像共享系统不同设计的软件和硬件实现

IF 3 4区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Real-Time Image Processing

Pub Date : 2024-05-06 DOI: 10.1007/s11554-024-01450-8

Bishoy K. Sharobim, Muhammad Hosam, Salwa K. Abd-El-Hafiz, Wafaa S. Sayed, Lobna A. Said, Ahmed G. Radwan

Secret image sharing (SIS) conveys a secret image to mutually suspicious receivers by sending meaningless shares to the participants, and all shares must be present to recover the secret. This paper proposes and compares three systems for secret sharing, where a visual cryptography system is designed with a fast recovery scheme as the backbone for all systems. Then, an SIS system is introduced for sharing any type of image, where it improves security using the Lorenz chaotic system as the source of randomness and the generalized Arnold transform as a permutation module. The second SIS system further enhances security and robustness by utilizing SHA-256 and RSA cryptosystem. The presented architectures are implemented on a field programmable gate array (FPGA) to enhance computational efficiency and facilitate real-time processing. Detailed experimental results and comparisons between the software and hardware realizations are presented. Security analysis and comparisons with related literature are also introduced with good results, including statistical tests, differential attack measures, robustness tests against noise and crop attacks, key sensitivity tests, and performance analysis.

秘密图像共享（SIS）是通过向参与者发送无意义的共享图像，将秘密图像传递给相互怀疑的接收者，而且所有共享图像都必须存在才能恢复秘密。本文提出并比较了三种秘密共享系统，其中设计了一种视觉密码学系统，并将快速恢复方案作为所有系统的支柱。然后，介绍了一种用于共享任何类型图像的 SIS 系统，该系统使用洛伦兹混沌系统作为随机性源，并使用广义阿诺德变换作为置换模块，从而提高了安全性。第二个 SIS 系统利用 SHA-256 和 RSA 密码系统进一步增强了安全性和稳健性。所介绍的架构是在现场可编程门阵列（FPGA）上实现的，以提高计算效率并促进实时处理。本文介绍了详细的实验结果以及软件和硬件实现之间的比较。此外，还介绍了安全分析以及与相关文献的比较，包括统计测试、差分攻击测量、抗噪声和裁剪攻击的鲁棒性测试、密钥灵敏度测试和性能分析，并取得了良好的效果。

{"title":"Software and hardware realizations for different designs of chaos-based secret image sharing systems","authors":"Bishoy K. Sharobim, Muhammad Hosam, Salwa K. Abd-El-Hafiz, Wafaa S. Sayed, Lobna A. Said, Ahmed G. Radwan","doi":"10.1007/s11554-024-01450-8","DOIUrl":"https://doi.org/10.1007/s11554-024-01450-8","url":null,"abstract":"Secret image sharing (SIS) conveys a secret image to mutually suspicious receivers by sending meaningless shares to the participants, and all shares must be present to recover the secret. This paper proposes and compares three systems for secret sharing, where a visual cryptography system is designed with a fast recovery scheme as the backbone for all systems. Then, an SIS system is introduced for sharing any type of image, where it improves security using the Lorenz chaotic system as the source of randomness and the generalized Arnold transform as a permutation module. The second SIS system further enhances security and robustness by utilizing SHA-256 and RSA cryptosystem. The presented architectures are implemented on a field programmable gate array (FPGA) to enhance computational efficiency and facilitate real-time processing. Detailed experimental results and comparisons between the software and hardware realizations are presented. Security analysis and comparisons with related literature are also introduced with good results, including statistical tests, differential attack measures, robustness tests against noise and crop attacks, key sensitivity tests, and performance analysis.","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"2012 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140882445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

T-psd: T-shape parking slot detection with self-calibrated convolution network T-psd：利用自校准卷积网络进行 T 型停车槽检测

IF 3 4区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Real-Time Image Processing

Pub Date : 2024-05-04 DOI: 10.1007/s11554-024-01460-6

Ruitao Zheng, Haifei Zhu, Xinghua Wu, Wei Meng

This paper deals with a challenging autonomous parking problem in which the parking slots are with various different angles. We transform the problem of parking slot detection into center keypoint detection, representing the parking slot as a T-shape to make it robust and simple. For diverse types of parking slots, we propose a T-shape parking slot detection method, called T-PSD, to extract the T-shape center information based on a self-calibrated convolution network (SCCN). This method can concurrently obtain the entrance center confidence, the relative offsets of the paired junctions, the direction of the middle line, the occupancy and the inferred type in the parking slots. Final detection results are produced by utilizing Half-Heatmap, MultiBins and Midline-Grid to more accurately extract the center keypoint, direction and occupancy, respectively. To verify the performance of our method, we conduct experiments on the public PS2.0 dataset. The results have shown that our method outperforms state-of-the-art competitors by showing recall rate of 99.86% and precision rate of 99.82%. It is capable of achieving 65 frames per second (FPS) and satisfying a real-time detection performance. In contrast to the simultaneous detection of global and local information, our SCCN detector exclusively concentrates on the T-shape center information, which achieves comparable performance and significantly accelerates the inference time without non-maximum suppression (NMS).

本文讨论的是一个具有挑战性的自主停车问题，其中停车位的角度各不相同。我们将停车位检测问题转化为中心关键点检测问题，将停车位表示为一个 T 形，使其更加稳健和简单。针对不同类型的停车位，我们提出了一种名为 T-PSD 的 T 形停车位检测方法，基于自校准卷积网络（SCCN）提取 T 形中心信息。该方法可同时获得入口中心置信度、成对路口的相对偏移、中线方向、占用率以及推断停车槽的类型。通过利用半热图、多宾和中线网格，分别更准确地提取中心关键点、方向和占用率，从而得出最终的检测结果。为了验证我们方法的性能，我们在公共 PS2.0 数据集上进行了实验。结果表明，我们的方法优于最先进的竞争对手，召回率达到 99.86%，精确率达到 99.82%。它能够达到每秒 65 帧（FPS），满足实时检测性能要求。与同时检测全局和局部信息的方法相比，我们的 SCCN 检测器只集中检测 T 形中心信息，不仅性能相当，而且在没有非最大抑制（NMS）的情况下大大加快了推理时间。

{"title":"T-psd: T-shape parking slot detection with self-calibrated convolution network","authors":"Ruitao Zheng, Haifei Zhu, Xinghua Wu, Wei Meng","doi":"10.1007/s11554-024-01460-6","DOIUrl":"https://doi.org/10.1007/s11554-024-01460-6","url":null,"abstract":"This paper deals with a challenging autonomous parking problem in which the parking slots are with various different angles. We transform the problem of parking slot detection into center keypoint detection, representing the parking slot as a T-shape to make it robust and simple. For diverse types of parking slots, we propose a T-shape parking slot detection method, called T-PSD, to extract the T-shape center information based on a self-calibrated convolution network (SCCN). This method can concurrently obtain the entrance center confidence, the relative offsets of the paired junctions, the direction of the middle line, the occupancy and the inferred type in the parking slots. Final detection results are produced by utilizing Half-Heatmap, MultiBins and Midline-Grid to more accurately extract the center keypoint, direction and occupancy, respectively. To verify the performance of our method, we conduct experiments on the public PS2.0 dataset. The results have shown that our method outperforms state-of-the-art competitors by showing recall rate of 99.86% and precision rate of 99.82%. It is capable of achieving 65 frames per second (FPS) and satisfying a real-time detection performance. In contrast to the simultaneous detection of global and local information, our SCCN detector exclusively concentrates on the T-shape center information, which achieves comparable performance and significantly accelerates the inference time without non-maximum suppression (NMS).","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"12 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140882380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0