High-precision real-time autonomous driving target detection based on YOLOv8
Pub Date : 2024-09-19 DOI: 10.1007/s11554-024-01553-2
Huixin Liu, Guohua Lu, Mingxi Li, Weihua Su, Ziyi Liu, Xu Dang, Dongyuan Zang
In traffic scenarios, target sizes vary significantly and computing power is limited, which makes accurate traffic target detection challenging. This paper proposes a new traffic target detection method that balances accuracy and real-time performance: Deep and Filtered You Only Look Once (DF-YOLO). To address the large differences in target scale within complex scenes, we design the Deep and Filtered Path Aggregation Network (DF-PAN), a module that effectively fuses multi-scale features and enhances the model's capability to detect multi-scale targets accurately. To address limited computational resources, we design a parameter-sharing detection head (PSD) and use Faster Neural Network (FasterNet) as the backbone network. PSD reduces computational load through parameter sharing, allowing feature extraction capability to be shared across positions, while FasterNet improves memory access efficiency and thereby maximizes computational resource utilization. Experimental results on the KITTI dataset show that our method achieves a satisfactory balance between real-time performance and precision, reaching 90.9% mean average precision (mAP) at 77 frames/s while reducing the number of parameters by 28.1% and increasing detection accuracy by 3% compared to the baseline model. Tests on the challenging BDD100K and SODA10M datasets show that DF-YOLO has excellent generalization ability.
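As a rough illustration of the parameter-sharing idea, the sketch below reuses one detection head across feature maps of several strides, so its weights (and hence its feature-extraction capability) are shared across positions and scales. This is a minimal sketch of the general technique, not the paper's PSD: the module names, channel width, and class count are assumptions.

```python
import torch
import torch.nn as nn

class SharedDetectionHead(nn.Module):
    """One head whose weights are reused on every pyramid level."""
    def __init__(self, channels: int = 256, num_classes: int = 3, num_anchors: int = 1):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.SiLU(),
        )
        self.cls_pred = nn.Conv2d(channels, num_anchors * num_classes, 1)
        self.box_pred = nn.Conv2d(channels, num_anchors * 4, 1)

    def forward(self, features):
        # The same convolution weights process every scale, so the parameter
        # count stays constant no matter how many levels are added.
        outputs = []
        for f in features:
            x = self.stem(f)
            outputs.append((self.cls_pred(x), self.box_pred(x)))
        return outputs

# Example: three FPN levels with identical channel width share one head.
head = SharedDetectionHead()
feats = [torch.randn(1, 256, s, s) for s in (80, 40, 20)]
preds = head(feats)  # list of (class logits, box regressions) per level
```

Because the same weights serve every pyramid level, adding a level costs no extra parameters, which is the usual source of the parameter savings such heads aim for.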
{"title":"High-precision real-time autonomous driving target detection based on YOLOv8","authors":"Huixin Liu, Guohua Lu, Mingxi Li, Weihua Su, Ziyi Liu, Xu Dang, Dongyuan Zang","doi":"10.1007/s11554-024-01553-2","DOIUrl":"https://doi.org/10.1007/s11554-024-01553-2","url":null,"abstract":"<p>In traffic scenarios, the size of targets varies significantly, and there is a limitation on computing power. This poses a significant challenge for algorithms to detect traffic targets accurately. This paper proposes a new traffic target detection method that balances accuracy and real-time performance—Deep and Filtered You Only Look Once (DF-YOLO). In response to the challenges posed by significant differences in target scales within complex scenes, we designed the Deep and Filtered Path Aggregation Network (DF-PAN). This module effectively fuses multi-scale features, enhancing the model's capability to detect multi-scale targets accurately. In response to the challenge posed by limited computational resources, we design a parameter-sharing detection head (PSD) and use Faster Neural Network (FasterNet) as the backbone network. PSD reduces computational load by parameter sharing and allows for feature extraction capability sharing across different positions. FasterNet enhances memory access efficiency, thereby maximizing computational resource utilization. The experimental results on the KITTI dataset show that our method achieves satisfactory balances between real-time and precision and reaches 90.9% mean average precision(mAP) with 77 frames/s, and the number of parameters is reduced by 28.1% and the detection accuracy is increased by 3% compared to the baseline model. We test it on the challenging BDD100K dataset and the SODA10M dataset, and the results show that DF-YOLO has excellent generalization ability.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"15 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142253665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GMS-YOLO: an enhanced algorithm for water meter reading recognition in complex environments
Pub Date : 2024-09-13 DOI: 10.1007/s11554-024-01551-4
Yu Wang, Xiaodong Xiang
The disordered arrangement of water-meter pipes and the random rotation angles of their mechanical character wheels frequently result in captured water-meter images exhibiting tilt, blur, and incomplete characters. These issues complicate the detection of water-meter images, rendering traditional OCR (optical character recognition) methods inadequate for current detection requirements. Furthermore, the two-stage detection method, which first locates and then recognizes, proves overly cumbersome. In this paper, water-meter reading recognition is approached as an object-detection task: readings are extracted from the algorithm's predicted-box information, a water-meter dataset is established, and the algorithmic framework is refined to improve the accuracy of recognizing incomplete characters. Using YOLOv8n as the baseline, we propose GMS-YOLO, a novel object-detection algorithm that employs Grouped Multi-Scale Convolution for enhanced performance. First, by substituting the Bottleneck module's convolution with GMSC (Grouped Multi-Scale Convolution), the model gains access to receptive fields at several scales, boosting its feature-extraction capability. Second, incorporating LSKA (Large Kernel Separable Attention) into the SPPF (Spatial Pyramid Pooling Fast) module improves the perception of fine-grained features. Finally, replacing CIoU (Complete Intersection over Union) with the ShapeIoU bounding-box loss function enhances the model's ability to localize objects and speeds up its convergence. Evaluated on a self-compiled water-meter image dataset, GMS-YOLO attained a mAP@0.5 of 92.4% and a precision of 93.2%, improvements of 2.0% and 2.1% over YOLOv8n, respectively. Despite the increased computational burden, GMS-YOLO maintains an average detection time of 10 ms per image, meeting practical detection needs.
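As a rough illustration of the grouped multi-scale idea, the input channels can be split into groups that each pass through a convolution with a different kernel size, giving the block several receptive fields at once. This is a sketch of the general pattern, not the paper's exact GMSC block; the channel split and kernel sizes are assumptions.

```python
import torch
import torch.nn as nn

class GroupedMultiScaleConv(nn.Module):
    """Split channels into groups, convolve each group with a different kernel size."""
    def __init__(self, channels: int, kernel_sizes=(1, 3, 5, 7)):
        super().__init__()
        assert channels % len(kernel_sizes) == 0
        group = channels // len(kernel_sizes)
        self.branches = nn.ModuleList(
            nn.Conv2d(group, group, k, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x):
        chunks = torch.chunk(x, len(self.branches), dim=1)   # one chunk per kernel size
        out = [branch(c) for branch, c in zip(self.branches, chunks)]
        return torch.cat(out, dim=1)                          # restore original width

x = torch.randn(1, 64, 32, 32)
y = GroupedMultiScaleConv(64)(x)   # shape preserved: (1, 64, 32, 32)
```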
{"title":"GMS-YOLO: an enhanced algorithm for water meter reading recognition in complex environments","authors":"Yu Wang, Xiaodong Xiang","doi":"10.1007/s11554-024-01551-4","DOIUrl":"https://doi.org/10.1007/s11554-024-01551-4","url":null,"abstract":"<p>The disordered arrangement of water-meter pipes and the random rotation angles of their mechanical character wheels frequently result in captured water-meter images exhibiting tilt, blur, and incomplete characters. These issues complicate the detection of water-meter images, rendering traditional OCR (optical character recognition) methods inadequate for current detection requirements. Furthermore, the two-stage detection method, which involves first locating and then recognizing, proves overly cumbersome. In this paper, water-meter reading recognition is approached as an object-detection task, extracting readings using the algorithm’s Predicted Box information, establishing a water-meter dataset, and refining the algorithmic framework to improve the accuracy of recognizing incomplete characters. Utilizing YOLOv8n as the baseline, we propose GMS-YOLO, a novel object-detection algorithm that employs Grouped Multi-Scale Convolution for enhanced performance. First, by substituting the Bottleneck module’s convolution with GMSC (Grouped Multi-Scale Convolution), the model can access various scale receptive fields, thus boosting its feature-extraction prowess. Second, incorporating LSKA (Large Kernel Separable Attention) into the SPPF (Spatial Pyramid Pooling Fast) module improves the perception of fine-grained features. Finally, replacing CIoU (Generalized Intersection over Union) with the ShapeIoU bounding box loss function enhances the model’s ability to localize objects and speeds up its convergence. Evaluating a self-compiled water-meter image dataset, GMS-YOLO attained a mAP@0.5 of 92.4% and a precision of 93.2%, marking a 2.0% and 2.1% enhancement over YOLOv8n, respectively. Despite the increased computational burden, GMS-YOLO maintains an average detection time of 10 ms per image, meeting practical detection needs.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"9 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast rough mode decision algorithm and hardware architecture design for AV1 encoder
Pub Date : 2024-09-12 DOI: 10.1007/s11554-024-01552-3
Heng Chen, Xiaofeng Huang, Zehao Tao, Qinghua Sheng, Yan Cui, Yang Zhou, Haibing Yin
To enhance compression efficiency, the AV1 video coding standard introduces several new intra-prediction modes, such as smooth and finer directional prediction modes. However, this addition increases computational complexity and hinders parallelized hardware implementation. In this paper, a hardware-friendly rough mode decision (RMD) algorithm and its fully pipelined hardware architecture are proposed to address these challenges. For algorithm optimization, a novel directional mode pruning algorithm is proposed first. Then, an accumulated approximation of the sum of absolute transformed differences (SATD) cost is adopted during the tree search. Finally, in the reconstruction stage, a reconstruction approximation model based on the DC transform is proposed to solve the low-parallelism problem. The proposed fully pipelined hardware architecture is implemented with 28 pipeline stages and can process multiple prediction modes in parallel. Experimental results show that the proposed fast algorithm achieves 46.8% time savings at the cost of a 1.96% Bjøntegaard delta rate (BD-Rate) increase on average under the all-intra (AI) configuration. When synthesized with 28 nm UMC technology, the proposed hardware operates at 316.2 MHz with a gate count of 1113.14 K.
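SATD is typically computed by applying a Hadamard transform to the prediction residual and summing the absolute transform coefficients. The sketch below shows a plain 8x8 software version as a reference point; it is a generic illustration under that assumption, not the paper's hardware-oriented accumulated approximation.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Unnormalized Hadamard matrix of size n (n must be a power of two)."""
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

def satd(orig: np.ndarray, pred: np.ndarray) -> float:
    """Sum of absolute transformed differences for one square block."""
    resid = orig.astype(np.int64) - pred.astype(np.int64)
    h = hadamard(resid.shape[0])
    coeffs = h @ resid @ h.T          # 2-D Hadamard transform of the residual
    return float(np.abs(coeffs).sum())

orig = np.random.randint(0, 256, (8, 8))
pred = np.random.randint(0, 256, (8, 8))
print(satd(orig, pred))   # rough distortion estimate used to rank intra modes
```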
{"title":"Fast rough mode decision algorithm and hardware architecture design for AV1 encoder","authors":"Heng Chen, Xiaofeng Huang, Zehao Tao, Qinghua Sheng, Yan Cui, Yang Zhou, Haibing Yin","doi":"10.1007/s11554-024-01552-3","DOIUrl":"https://doi.org/10.1007/s11554-024-01552-3","url":null,"abstract":"<p>To enhance compression efficiency, the AV1 video coding standard has introduced several new intra-prediction modes, such as smooth and finer directional prediction modes. However, this addition increases computational complexity and hinders parallelized hardware implementation. In this paper, a hardware-friendly rough mode decision (RMD) algorithm and its fully pipelined hardware architecture design are proposed to address these challenges. For algorithm optimization, firstly, a novel directional mode pruning algorithm is proposed. Then, the sum of absolute transform differences (SATD) cost accumulated approximation method is adopted during the tree search. Finally, in the reconstruction stage, a reconstruction approximation model based on the DC transform is proposed to solve the low-parallelism problem. For hardware architecture design, the proposed fully pipelined hardware architecture is implemented with 28 pipeline stages. This design can process multiple prediction modes in parallel. Experimental results show that the proposed fast algorithm achieves 46.8% time savings by 1.96% Bjøntegaard delta rate (BD-Rate) increase on average under all-intra (AI) configuration. When synthesized under the 28nm UMC technology, the proposed hardware can operate at a frequency of 316.2 MHz with 1113.14 K gate count.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"62 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AdaptoMixNet: detection of foreign objects on power transmission lines under severe weather conditions
Pub Date : 2024-09-12 DOI: 10.1007/s11554-024-01546-1
Xinghai Jia, Chao Ji, Fan Zhang, Junpeng Liu, Mingjiang Gao, Xinbo Huang
As power transmission lines expand in scale, their surroundings become complex and susceptible to foreign objects, which severely threaten safe operation. Current algorithms lack stability and real-time performance for small-target detection and severe weather conditions. Therefore, this paper proposes AdaptoMixNet, a method for detecting foreign objects on power transmission lines under severe weather conditions. First, an Adaptive Fusion Module (AFM) is introduced, which improves the model's accuracy and adaptability through multi-scale feature extraction, fine-grained information preservation, and enhanced context information. Second, an Adaptive Feature Pyramid Module (AEFPM) is proposed, which enhances the focus on local details while preserving global information, improving the stability and robustness of feature representation. Finally, the Neuron Expansion Recursion Adaptive Filter (CARAFE) is designed, which enhances feature extraction, adaptive filtering, and recursive mechanisms, improving detection accuracy, robustness, and computational efficiency. Experimental results show that the proposed method exhibits excellent performance in detecting foreign objects on power transmission lines against complex backgrounds and under harsh weather conditions.
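Adaptive fusion of multi-scale features is commonly implemented with learnable, normalized per-input weights. The sketch below shows that generic pattern only; it is an assumption for illustration, since the paper's AFM is not specified at this level of detail, and the channel count and projection layer are mine.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveWeightedFusion(nn.Module):
    """Fuse same-shaped feature maps with learnable, softmax-normalized weights."""
    def __init__(self, num_inputs: int, channels: int):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.proj = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, features):
        w = F.softmax(self.weights, dim=0)             # weights sum to 1
        fused = sum(wi * fi for wi, fi in zip(w, features))
        return self.proj(fused)

# Example: fuse three 128-channel maps already resized to a common resolution.
feats = [torch.randn(1, 128, 40, 40) for _ in range(3)]
fused = AdaptiveWeightedFusion(3, 128)(feats)          # (1, 128, 40, 40)
```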
{"title":"AdaptoMixNet: detection of foreign objects on power transmission lines under severe weather conditions","authors":"Xinghai Jia, Chao Ji, Fan Zhang, Junpeng Liu, Mingjiang Gao, Xinbo Huang","doi":"10.1007/s11554-024-01546-1","DOIUrl":"https://doi.org/10.1007/s11554-024-01546-1","url":null,"abstract":"<p>With the expansion of power transmission line scale, the surrounding environment is complex and susceptible to foreign objects, severely threatening its safe operation. The current algorithm lacks stability and real-time performance in small target detection and severe weather conditions. Therefore, this paper proposes a method for detecting foreign objects on power transmission lines under severe weather conditions based on AdaptoMixNet. First, an Adaptive Fusion Module (AFM) is introduced, which improves the model's accuracy and adaptability through multi-scale feature extraction, fine-grained information preservation, and enhancing context information. Second, an Adaptive Feature Pyramid Module (AEFPM) is proposed, which enhances the focus on local details while preserving global information, improving the stability and robustness of feature representation. Finally, the Neuron Expansion Recursion Adaptive Filter (CARAFE) is designed, which enhances feature extraction, adaptive filtering, and recursive mechanisms, improving detection accuracy, robustness, and computational efficiency. Experimental results show that the method of this paper exhibits excellent performance in the detection of foreign objects on power transmission lines under complex backgrounds and harsh weather conditions.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"19 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mfdd: Multi-scale attention fatigue and distracted driving detector based on facial features
Pub Date : 2024-09-11 DOI: 10.1007/s11554-024-01549-y
Yulin Shi, Jintao Cheng, Xingming Chen, Jiehao Luo, Xiaoyu Tang
With the rapid expansion of the automotive industry and the continuous growth of vehicle fleets, traffic safety has become a critical global issue. Detection and alert systems for fatigue and distracted driving are therefore essential for enhancing traffic safety. Factors such as variations in the driver's facial details, lighting conditions, and camera pixel quality significantly affect the accuracy of fatigue and distracted driving detection, often limiting the effectiveness of existing methods. This study introduces a new network designed to detect fatigue and distracted driving against the complex backgrounds typical of vehicle interiors. To extract driver and facial information as well as gradient details more efficiently, we introduce the Multihead Difference Kernel Convolution Module (MDKC) and the Multiscale Large Convolutional Fusion Module (MLCF) into the baseline, blending multihead mixed convolution with large and small convolutional kernels to amplify the spatial detail captured by the backbone. To extract gradient details from feature maps under different illumination and noise conditions, we enhance the network's neck with an Adaptive Convolutional Attention Module (ACAM), optimizing feature retention. Extensive comparative experiments validate the efficacy of our network, which shows superior performance on the Fatigue and Distracted Driving Dataset and competitive results on the public COCO dataset. Source code is available at https://github.com/SCNU-RISLAB/MFDD.
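One way convolutions of different sizes can expose gradient-like detail is to subtract a large-kernel (smoothed) response from a small-kernel response, in the spirit of a difference-of-Gaussians. The sketch below is an assumption-based illustration of that generic idea, not the paper's MDKC definition; kernel sizes and the fusion layer are mine.

```python
import torch
import torch.nn as nn

class DifferenceKernelConv(nn.Module):
    """Contrast small-kernel and large-kernel views of the same features."""
    def __init__(self, channels: int, small: int = 3, large: int = 7):
        super().__init__()
        self.small = nn.Conv2d(channels, channels, small, padding=small // 2, groups=channels)
        self.large = nn.Conv2d(channels, channels, large, padding=large // 2, groups=channels)
        self.mix = nn.Conv2d(2 * channels, channels, 1)   # fuse detail and context

    def forward(self, x):
        context = self.large(x)                # smoothed, wide receptive field
        detail = self.small(x) - context       # difference emphasizes edges/gradients
        return self.mix(torch.cat([detail, context], dim=1))

y = DifferenceKernelConv(64)(torch.randn(1, 64, 56, 56))   # (1, 64, 56, 56)
```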
{"title":"Mfdd: Multi-scale attention fatigue and distracted driving detector based on facial features","authors":"Yulin Shi, Jintao Cheng, Xingming Chen, Jiehao Luo, Xiaoyu Tang","doi":"10.1007/s11554-024-01549-y","DOIUrl":"https://doi.org/10.1007/s11554-024-01549-y","url":null,"abstract":"<p>With the rapid expansion of the automotive industry and the continuous growth of vehicle fleets, traffic safety has become a critical global social issue. Developing detection and alert systems for fatigue and distracted driving is essential for enhancing traffic safety. Factors, such as variations in the driver’s facial details, lighting conditions, and camera pixel quality, significantly affect the accuracy of fatigue and distracted driving detection, often resulting in the low effectiveness of existing methods. This study introduces a new network designed to detect fatigue and distracted driving amidst the complex backgrounds typical within vehicles. To extract driver and facial information as well as gradient details more efficiently, we introduce the Multihead Difference Kernel Convolution Module (MDKC) and Multiscale Large Convolutional Fusion Module (MLCF) in baseline. This incorporates a blend of Multihead Mixed Convolution and Large and Small Convolutional Kernels to amplify the spatial intricacies of the backbone. To extract gradient details from different illumination and noise feature maps, we enhance the network’s neck by introducing the Adaptive Convolutional Attention Module (ACAM) in NECK, optimizing feature retention. Extensive comparative experiments validate the efficacy of our network, showcasing superior performance not only on the Fatigue and Distracted Driving Dataset but also competitive results on the public COCO dataset. Source code is available at https://github.com/SCNU-RISLAB/MFDD.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"59 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integrating YOLOv8 and CSPBottleneck based CNN for enhanced license plate character recognition
Pub Date : 2024-09-10 DOI: 10.1007/s11554-024-01537-2
Sahil Khokhar, Deepak Kedia
The paper introduces an integrated methodology for license plate character recognition, combining YOLOv8 for segmentation and a CSPBottleneck-based CNN classifier for character recognition. The proposed approach incorporates pre-processing techniques to enhance the recognition of partial plates and augmentation methods to address challenges arising from colour diversity. Performance analysis demonstrates YOLOv8’s high segmentation accuracy and fast processing time, complemented by precise character recognition and efficient processing by the CNN classifier. The integrated system achieves an overall accuracy of 99.02% with a total processing time of 9.9 ms, offering a robust solution for automated license plate recognition (ALPR) systems. The integrated approach presented in the paper holds promise for the practical implementation of ALPR technology and further development in the field of license plate recognition systems.
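One detail the abstract leaves implicit is how per-character detections become a plate string; a common post-processing step is to sort recognized characters by the horizontal position of their boxes. The hedged sketch below shows only that generic step; the box format and the single-row-plate assumption are mine, not details from the paper.

```python
import numpy as np

def plate_string(boxes: np.ndarray, labels) -> str:
    """Assemble a plate reading from per-character detections.

    boxes: (N, 4) array of [x1, y1, x2, y2]; labels: recognized class per box.
    Characters are ordered by the horizontal centre of each box, which
    assumes a single-row plate.
    """
    centres = (boxes[:, 0] + boxes[:, 2]) / 2.0
    order = np.argsort(centres)
    return "".join(labels[i] for i in order)

boxes = np.array([[120, 10, 150, 50], [20, 12, 50, 52], [70, 11, 100, 51]])
print(plate_string(boxes, ["C", "A", "B"]))   # -> "ABC"
```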
{"title":"Integrating YOLOv8 and CSPBottleneck based CNN for enhanced license plate character recognition","authors":"Sahil Khokhar, Deepak Kedia","doi":"10.1007/s11554-024-01537-2","DOIUrl":"https://doi.org/10.1007/s11554-024-01537-2","url":null,"abstract":"<p>The paper introduces an integrated methodology for license plate character recognition, combining YOLOv8 for segmentation and a CSPBottleneck-based CNN classifier for character recognition. The proposed approach incorporates pre-processing techniques to enhance the recognition of partial plates and augmentation methods to address challenges arising from colour diversity. Performance analysis demonstrates YOLOv8’s high segmentation accuracy and fast processing time, complemented by precise character recognition and efficient processing by the CNN classifier. The integrated system achieves an overall accuracy of 99.02% with a total processing time of 9.9 ms, offering a robust solution for automated license plate recognition (ALPR) systems. The integrated approach presented in the paper holds promise for the practical implementation of ALPR technology and further development in the field of license plate recognition systems.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"112 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A real-time visual SLAM based on semantic information and geometric information in dynamic environment
Pub Date : 2024-09-10 DOI: 10.1007/s11554-024-01527-4
Hongli Sun, Qingwu Fan, Huiqing Zhang, Jiajing Liu
Simultaneous Localization and Mapping (SLAM) is the core technology enabling mobile robots to autonomously explore and perceive the environment. However, dynamic objects in the scene significantly impact the accuracy and robustness of visual SLAM systems, limiting their applicability in real-world scenarios. Hence, we propose a real-time RGB-D visual SLAM algorithm designed for indoor dynamic scenes. Our approach includes a parallel lightweight object detection thread, which leverages the YOLOv7-tiny network to detect potential moving objects and generate 2D semantic information. Subsequently, a novel dynamic feature removal strategy is introduced in the tracking thread. This strategy integrates semantic information, geometric constraints, and feature point depth-based RANSAC to effectively mitigate the influence of dynamic features. To evaluate the effectiveness of the proposed algorithms, we conducted comparative experiments against other state-of-the-art algorithms on the TUM RGB-D dataset and the Bonn RGB-D dataset, as well as in real-world dynamic scenes. The results demonstrate that the algorithm maintains excellent accuracy and robustness in dynamic environments, while also exhibiting impressive real-time performance.
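The combination of semantic masking and geometric filtering can be sketched in a few lines: discard matches that fall inside detected dynamic-object boxes, then let RANSAC on the fundamental matrix reject the remaining outliers. This is a simplified sketch of the general idea under assumed box and point formats, not the paper's exact strategy (which also uses depth-based RANSAC).

```python
import numpy as np
import cv2

def filter_dynamic_matches(pts_prev, pts_curr, dynamic_boxes, thresh=1.0):
    """Reject feature matches that are likely on moving objects.

    Semantic step: drop matches whose current-frame point lies inside a
    detected dynamic-object box (x1, y1, x2, y2). Geometric step: fit a
    fundamental matrix with RANSAC and keep only its inliers. Needs at
    least eight surviving matches for the RANSAC step."""
    def in_dynamic_box(p):
        return any(x1 <= p[0] <= x2 and y1 <= p[1] <= y2
                   for x1, y1, x2, y2 in dynamic_boxes)

    keep = [i for i, p in enumerate(pts_curr) if not in_dynamic_box(p)]
    p1 = np.asarray(pts_prev, dtype=np.float32)[keep]
    p2 = np.asarray(pts_curr, dtype=np.float32)[keep]
    _, mask = cv2.findFundamentalMat(p1, p2, cv2.FM_RANSAC, thresh, 0.99)
    if mask is None:                 # too few points; fall back to semantic filter only
        return p1, p2
    mask = mask.ravel().astype(bool)
    return p1[mask], p2[mask]
```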
{"title":"A real-time visual SLAM based on semantic information and geometric information in dynamic environment","authors":"Hongli Sun, Qingwu Fan, Huiqing Zhang, Jiajing Liu","doi":"10.1007/s11554-024-01527-4","DOIUrl":"https://doi.org/10.1007/s11554-024-01527-4","url":null,"abstract":"<p>Simultaneous Localization and Mapping (SLAM) is the core technology enabling mobile robots to autonomously explore and perceive the environment. However, dynamic objects in the scene significantly impact the accuracy and robustness of visual SLAM systems, limiting its applicability in real-world scenarios. Hence, we propose a real-time RGB-D visual SLAM algorithm designed for indoor dynamic scenes. Our approach includes a parallel lightweight object detection thread, which leverages the YOLOv7-tiny network to detect potential moving objects and generate 2D semantic information. Subsequently, a novel dynamic feature removal strategy is introduced in the tracking thread. This strategy integrates semantic information, geometric constraints, and feature point depth-based RANSAC to effectively mitigate the influence of dynamic features. To evaluate the effectiveness of the proposed algorithms, we conducted comparative experiments using other state-of-the-art algorithms on the TUM RGB-D dataset and Bonn RGB-D dataset, as well as in real-world dynamic scenes. The results demonstrate that the algorithm maintains excellent accuracy and robustness in dynamic environments, while also exhibiting impressive real-time performance.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"46 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142227884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LGFF-YOLO: small object detection method of UAV images based on efficient local–global feature fusion
Pub Date : 2024-09-06 DOI: 10.1007/s11554-024-01550-5
Hongxing Peng, Haopei Xie, Huanai Liu, Xianlu Guan
Images captured by Unmanned Aerial Vehicles (UAVs) play a significant role in many fields. However, with the development of UAV technology, challenges such as detecting small and dense objects against complex backgrounds have emerged. In this paper, we propose LGFF-YOLO, a detection model that integrates a novel local–global feature fusion method with the YOLOv8 baseline, specifically designed for small object detection in UAV imagery. Our approach employs the Global Information Fusion Module (GIFM) and the Four-Leaf Clover Fusion Module (FLCM) to enhance the fusion of multi-scale features, improving detection accuracy without increasing model complexity. We further propose the RFA-Block and LDyHead to control the total number of model parameters and improve the representation capability for small object detection. Experimental results on the VisDrone2019 dataset demonstrate 38.3% mAP with only 4.15M parameters, a 4.5% increase over the YOLOv8 baseline, while achieving 79.1 FPS for real-time detection. These advancements enhance the model's generalization capability, balance accuracy and speed, and significantly extend its applicability for detecting small objects in UAV images.
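Global-local fusion is often realized by pooling a feature map into a global descriptor and using it to reweight local responses. The sketch below shows that generic squeeze-and-reweight pattern as an assumption for illustration; the paper's GIFM is not specified at this level of detail, and the layer choices here are mine.

```python
import torch
import torch.nn as nn

class GlobalContextFusion(nn.Module):
    """Inject a globally pooled descriptor back into every spatial location."""
    def __init__(self, channels: int):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)            # global descriptor per channel
        self.excite = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x):
        g = self.excite(self.squeeze(x))                  # per-channel global weights
        return x * g + x                                  # reweight local features, keep residual

y = GlobalContextFusion(256)(torch.randn(1, 256, 40, 40))  # (1, 256, 40, 40)
```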
{"title":"LGFF-YOLO: small object detection method of UAV images based on efficient local–global feature fusion","authors":"Hongxing Peng, Haopei Xie, Huanai Liu, Xianlu Guan","doi":"10.1007/s11554-024-01550-5","DOIUrl":"https://doi.org/10.1007/s11554-024-01550-5","url":null,"abstract":"<p>Images captured by Unmanned Aerial Vehicles (UAVs) play a significant role in many fields. However, with the development of UAV technology, challenges such as detecting small and dense objects against complex backgrounds have emerged. In this paper, we propose LGFF-YOLO, a detection model that integrates a novel local–global feature fusion method with the YOLOv8 baseline, specifically designed for small object detection in UAV imagery. Our innovative approach employs the Global Information Fusion Module (GIFM) and the Four-Leaf Clover Fusion Module (FLCM) to enhance the fusion of multi-scale features, improving detection accuracy without increasing model complexity. Next, we proposed the RFA-Block and LDyHead to control the total number of model parameters and improve the representation capability for small object detection. Experimental results on the VisDrone2019 dataset demonstrate a 38.3% mAP with only 4.15M parameters, a 4. 5% increase over baseline YOLOv8, while achieving 79.1 FPS for real-time detection. These advancements enhance the model’s generalization capability, balancing accuracy and speed, and significantly extend its applicability for detecting small objects in UAV images.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"29 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A real-time foreign object detection method based on deep learning in complex open railway environments
Pub Date : 2024-09-06 DOI: 10.1007/s11554-024-01548-z
Binlin Zhang, Qing Yang, Fengkui Chen, Dexin Gao
In response to the many background interference factors and the low detection accuracy encountered in open railway foreign object detection, a real-time foreign object detection method based on deep learning for open railways in complex environments is proposed. First, images of foreign objects intruding into the clearance, collected by locomotives during long-term operation, are used to create a railway foreign object dataset that reflects current conditions. Then, to improve the performance of the target detection algorithm, several improvements are made to the YOLOv7-tiny network structure, enhancing feature extraction capability and strengthening detection performance. By introducing SimAM, a simple, parameter-free attention module for convolutional neural networks, the representation ability of ConvNets is improved without adding extra parameters. Additionally, drawing on the structure of the weighted Bi-directional Feature Pyramid Network (BiFPN), the backbone achieves cross-level feature fusion through added edges and neck fusion. The feature fusion layer is further improved by introducing the GhostNetV2 module, which enhances the fusion of features at different scales and greatly reduces computational load. Furthermore, the original loss function is replaced with the Normalized Wasserstein Distance (NWD) loss function to enhance the recognition of small, distant targets. Finally, the proposed algorithm is trained, validated, and compared with other mainstream detection algorithms on the established railway foreign object dataset. Experimental results show that the proposed algorithm runs in real time on embedded devices with high accuracy and improved model performance, providing precise data support for railway safety assurance.
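The NWD similarity models each box as a 2-D Gaussian and maps the Wasserstein distance between the two Gaussians into (0, 1] with an exponential, which degrades more gracefully than IoU for tiny boxes. The sketch below follows the commonly published formulation as a minimal reference, assuming a dataset-dependent constant; it is not claimed to be this paper's exact loss implementation.

```python
import math

def nwd(box_a, box_b, c: float = 12.8) -> float:
    """Normalized Wasserstein distance between two boxes given as (cx, cy, w, h).

    The squared 2-Wasserstein distance between the boxes' Gaussian models
    reduces to a squared Euclidean distance on (cx, cy, w/2, h/2); the
    constant c is dataset-dependent (the value here is an assumption)."""
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2_sq = (cxa - cxb) ** 2 + (cya - cyb) ** 2 + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2
    return math.exp(-math.sqrt(w2_sq) / c)

print(nwd((50, 50, 10, 10), (52, 51, 12, 9)))   # close small boxes -> value near 1
```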
{"title":"A real-time foreign object detection method based on deep learning in complex open railway environments","authors":"Binlin Zhang, Qing Yang, Fengkui Chen, Dexin Gao","doi":"10.1007/s11554-024-01548-z","DOIUrl":"https://doi.org/10.1007/s11554-024-01548-z","url":null,"abstract":"<p>In response to the current challenges of numerous background influencing factors and low detection accuracy in the open railway foreign object detection, a real-time foreign object detection method based on deep learning for open railways in complex environments is proposed. Firstly, the images of foreign objects invading the clearance collected by locomotives during long-term operation are used to create a railway foreign object dataset that fits the current situation. Then, to improve the performance of the target detection algorithm, certain improvements are made to the YOLOv7-tiny network structure. The improved algorithm enhances feature extraction capability and strengthens detection performance. By introducing a Simple, parameter-free Attention Module for convolutional neural network (SimAM) attention mechanism, the representation ability of ConvNets is improved without adding extra parameters. Additionally, drawing on the network structure of the weighted Bi-directional Feature Pyramid Network (BiFPN), the backbone network achieves cross-level feature fusion by adding edges and neck fusion. Subsequently, the feature fusion layer is improved by introducing the GhostNetV2 module, which enhances the fusion capability of different scale features and greatly reduces computational load. Furthermore, the original loss function is replaced with the Normalized Wasserstein Distance (NWD) loss function to enhance the recognition capability of small distant targets. Finally, the proposed algorithm is trained and validated, and compared with other mainstream detection algorithms based on the established railway foreign object dataset. Experimental results show that the proposed algorithm achieves applicability and real-time performance on embedded devices, with high accuracy, improved model performance, and provides precise data support for railway safety assurance.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"7 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient and real-time skin lesion image segmentation using spatial-frequency information and channel convolutional networks
Pub Date : 2024-09-03 DOI: 10.1007/s11554-024-01542-5
Shangwang Liu, Bingyan Zhou, Yinghai Lin, Peixia Wang
Accurate segmentation of skin lesions in dermoscopy images is essential for physician screening. However, existing segmentation methods commonly face three main limitations: difficulty in accurately processing targets with coarse edges, frequent challenges in recovering detailed feature data, and a lack of adequate capability to fuse multi-scale features effectively. To overcome these problems, we propose a skin lesion segmentation network (SFCC Net) that combines an attention mechanism with a redundancy-reduction strategy. The first step is the design of a downsampling encoder and an encoder composed of Receptive Field (REFC) Blocks, aimed at supplementing lost details and extracting latent features. Subsequently, the Spatial-Frequency-Channel (SF) Block is employed to minimize feature redundancy and restore fine-grained information. To fully leverage previously learned features, an Up-sampling Convolution (UpC) Block is designed for information integration. The network's performance was compared with state-of-the-art models on four public datasets, and the results demonstrate significant improvements. On the ISIC datasets, the proposed network outperformed D-LKA Net by 4.19%, 0.19%, and 7.75% in F1, and by 2.14%, 0.51%, and 12.20% in IoU. The frame rate (FPS) of the proposed network when processing skin lesion images underscores its suitability for real-time image analysis. Additionally, the network's generalization capability was validated on a lung dataset.
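The F1 (Dice) and IoU figures above are the standard overlap metrics for binary segmentation masks. A minimal reference computation looks like the sketch below; it illustrates the metrics only and is not tied to the paper's evaluation code.

```python
import numpy as np

def f1_and_iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7):
    """F1 (Dice) and IoU for a pair of binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    f1 = 2 * inter / (pred.sum() + gt.sum() + eps)
    iou = inter / (np.logical_or(pred, gt).sum() + eps)
    return f1, iou

pred = np.zeros((64, 64), dtype=np.uint8); pred[16:48, 16:48] = 1
gt = np.zeros((64, 64), dtype=np.uint8); gt[20:52, 20:52] = 1
print(f1_and_iou(pred, gt))   # overlapping squares -> F1 and IoU well below 1.0
```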
{"title":"Efficient and real-time skin lesion image segmentation using spatial-frequency information and channel convolutional networks","authors":"Shangwang Liu, Bingyan Zhou, Yinghai Lin, Peixia Wang","doi":"10.1007/s11554-024-01542-5","DOIUrl":"https://doi.org/10.1007/s11554-024-01542-5","url":null,"abstract":"<p>Accurate segmentation of skin lesions is essential for physicians to screen in dermoscopy images. However, they commonly face three main limitations: difficulty in accurately processing targets with coarse edges; frequent challenges in recovering detailed feature data; and a lack of adequate capability for the effective amalgamation of multi-scale features. To overcome these problems, we propose a skin lesion segmentation network (SFCC Net) that combines an attention mechanism and a redundancy reduction strategy. The initial step involved the design of a downsampling encoder and an encoder composed of Receptive Field (REFC) Blocks, aimed at supplementing lost details and extracting latent features. Subsequently, the Spatial-Frequency-Channel (SF) Block was employed to minimize feature redundancy and restore fine-grained information. To fully leverage previously learned features, an Up-sampling Convolution (UpC) Block was designed for information integration. The network’s performance was compared with state-of-the-art models on four public datasets. Experimental results demonstrate significant improvements in the network’s performance. On the ISIC datasets, the proposed network outperformed D-LKA Net by 4.19%, 0.19%, and 7.75% in F1, and by 2.14%, 0.51%, and 12.20% in IoU. The frame rate (FPS) of the proposed network when processing skin lesion images underscores its suitability for real-time image analysis. Additionally, the network’s generalization capability was validated on a lung dataset.\u0000</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"60 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}