
Journal of Real-Time Image Processing: Latest Publications

Real-time medical lesion screening: accurate and rapid detectors
IF 3 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-17 | DOI: 10.1007/s11554-024-01512-x
Dangguo Shao, Jie Jiang, Lei Ma, Hua Lai, Sanli Yi

Brain tumors are highly lethal, representing 85–90% of all primary central nervous system (CNS) tumors. Magnetic resonance imaging (MRI) is used to identify and assess brain tumors. However, this process has historically relied heavily on the expertise of medical professionals and required substantial staffing. To optimize the allocation of medical resources and improve diagnostic efficiency, this work proposes RPC–DETR, a Transformer-based detector built on DETR. We conducted comparative experiments against traditional detectors on the same hardware to test performance under equal computational resources. Tested on the Br35H brain tumor dataset of 701 MRI images (500 training and 201 test images), RPC–DETR surpasses the YOLO models in accuracy while using fewer parameters, enabling more reliable and faster diagnosis. RPC–DETR achieves 96% mAP with just 14M parameters, offering high accuracy in brain tumor detection with a lighter model that is easier to deploy in varied medical settings.
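For readers unfamiliar with the reported metric, the sketch below shows how COCO-style mAP is typically computed for a detector of this kind; the `evaluate_map` helper, the torchvision-style model interface, and the `test_loader` are illustrative assumptions, not the paper's actual evaluation code.

```python
# Minimal mAP-evaluation sketch (assumed interface, not the paper's code).
import torch
from torchmetrics.detection import MeanAveragePrecision

def evaluate_map(model, test_loader):
    """Compute COCO-style mAP for a detector whose forward pass returns,
    per image, a dict with 'boxes', 'scores', and 'labels' tensors."""
    metric = MeanAveragePrecision(iou_type="bbox")
    model.eval()
    with torch.no_grad():
        for images, targets in test_loader:  # targets: dicts with 'boxes', 'labels'
            preds = model(images)
            metric.update(preds, targets)
    return metric.compute()["map"].item()    # mAP averaged over IoU 0.50:0.95
```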

Citations: 0
A resource-efficient partial 3D convolution for gesture recognition
IF 2.9 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-15 | DOI: 10.1007/s11554-024-01509-6
Gongzheng Chen, Zhenghong Dong, Jue Wang, Jijian Hu
{"title":"A resource-efficient partial 3D convolution for gesture recognition","authors":"Gongzheng Chen, Zhenghong Dong, Jue Wang, Jijian Hu","doi":"10.1007/s11554-024-01509-6","DOIUrl":"https://doi.org/10.1007/s11554-024-01509-6","url":null,"abstract":"","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":null,"pages":null},"PeriodicalIF":2.9,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141645956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
UniTracker: transformer-based CrossUnihead for multi-object tracking
IF 2.9 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-15 | DOI: 10.1007/s11554-024-01514-9
Fan Wu, Yifeng Zhang
{"title":"UniTracker: transformer-based CrossUnihead for multi-object tracking","authors":"Fan Wu, Yifeng Zhang","doi":"10.1007/s11554-024-01514-9","DOIUrl":"https://doi.org/10.1007/s11554-024-01514-9","url":null,"abstract":"","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":null,"pages":null},"PeriodicalIF":2.9,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141644731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A full-detection association tracker with confidence optimization for real-time multi-object tracking
IF 3 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-11 | DOI: 10.1007/s11554-024-01513-w
Youyu Liu, Xiangxiang Zhou, Zhendong Zhang, Yi Li, Wanbao Tao

Multi-object tracking (MOT) aims to obtain trajectories with unique identifiers for multiple objects in a video stream. Current approaches frequently use confidence thresholds to perform multi-stage data association; however, these thresholds can destabilize the algorithm across diverse scenarios. This article proposes the confidence-optimization tracker (COTracker), a full-detection association tracker based on confidence optimization. COTracker incorporates detection confidence and matching cost as covariates and models tracklet confidence using an exponential moving average (EMA). It introduces confidence cues into data association by generating a weighting matrix that combines detection and tracklet confidence. Experimental results show that COTracker achieves 63.0 HOTA and 77.1 IDF1 on the MOT17 test set; on the more crowded MOT20, it achieves 62.4 HOTA and 76.1 IDF1. Compared with threshold-based methods, COTracker handles a variety of complex scenarios without adjusting a confidence threshold. Its tracking speed meets real-time requirements, giving it potential value in applications such as unmanned driving and drone tracking. The source code is available at https://github.com/LiYi199983/CWTracker.
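The abstract does not give the exact form of the EMA update or of the weighting matrix; a minimal sketch under stated assumptions (hypothetical smoothing factor `alpha` and an illustrative multiplicative weighting) might look like this:

```python
import numpy as np

def ema_confidence(prev_conf, det_conf, alpha=0.9):
    """EMA update of tracklet confidence; alpha is a hypothetical smoothing factor."""
    return alpha * prev_conf + (1.0 - alpha) * det_conf

def confidence_weighted_cost(iou_cost, trk_conf, det_conf):
    """Blend confidence cues into the association cost instead of hard thresholds.
    iou_cost: (T, D) base cost; trk_conf: (T,); det_conf: (D,).
    The multiplicative form below is illustrative, not the paper's formula."""
    weight = np.outer(trk_conf, det_conf)   # pairwise confidence weight in [0, 1]
    return iou_cost * (2.0 - weight)        # low-confidence pairs cost more
```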

Citations: 0
Real-time and secure identity authentication transmission mechanism for artificial intelligence generated image content
IF 3 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-10 | DOI: 10.1007/s11554-024-01508-7
Xiao Feng, Zheng Yuan

The rapid development of generative artificial intelligence and large-scale pre-trained models has made artificial intelligence generated image content (AIGIC) an important application of natural language processing models, significantly changing and advancing how image content is created. Because AIGIC requires acquiring substantial image datasets from user devices for training, the data transmission link is highly complex, and the datasets are susceptible to illegal attacks from multiple parties in transit. Such attacks harm the integrity and real-time nature of the training data and degrade the accuracy of the AIGIC model's training results. This paper therefore proposes a real-time authentication mechanism to guarantee the secure transmission of AIGIC image datasets. The mechanism achieves anonymous identity protection for the user device providing the image dataset by introducing a certificateless encryption system, and it authenticates the user devices behind legitimate image datasets with an aggregate signature scheme coupled to a key negotiation algorithm. A performance analysis indicates that the proposed mechanism outperforms related methods in the security and accuracy of AIGIC image model training results while guaranteeing real-time transmission of AIGIC image datasets; its time complexity is also lower, which effectively ensures the timeliness of the algorithm.
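The abstract names a certificateless encryption system and an aggregate signature scheme with key negotiation, neither of which it specifies. The sketch below shows only a generic ECDH key-negotiation step using the `cryptography` package, as an assumed stand-in for the negotiation building block, not the paper's scheme; the `info` label is hypothetical.

```python
# Generic ECDH key negotiation with HKDF key derivation (an assumed
# stand-in for the paper's unspecified key-negotiation algorithm).
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def negotiate_session_key(own_private, peer_public) -> bytes:
    """Derive a 256-bit session key from an ECDH shared secret."""
    shared = own_private.exchange(ec.ECDH(), peer_public)
    return HKDF(algorithm=hashes.SHA256(), length=32,
                salt=None, info=b"aigic-session").derive(shared)

# Both endpoints derive the same key from each other's public keys.
device = ec.generate_private_key(ec.SECP256R1())
server = ec.generate_private_key(ec.SECP256R1())
assert (negotiate_session_key(device, server.public_key())
        == negotiate_session_key(server, device.public_key()))
```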

Citations: 0
A novel single kernel parallel image encryption scheme based on a chaotic map
IF 3 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-09 | DOI: 10.1007/s11554-024-01506-9
Joao Inacio Moreira Bezerra, Alexandre Molter, Gustavo Machado, Rafael Iankowski Soares, Vinícius Valduga de Almeida Camargo

The development of communication technologies has heightened concerns about data security, raising the prominence of cryptography. Images are among the most widely shared data, and chaotic ciphers attract significant interest from researchers because traditional ciphers are not optimized for image encryption. Chaotic encryption schemes perform well on low-quality images but need to be faster for real-time encryption of Full Ultra HD images. In this context, a novel parallel image cipher scheme is proposed for GPU architectures in which the entire encryption procedure runs as a single kernel, distinguishing it from previous chaotic ciphers and from the AES. This contribution lets our work achieve a throughput of 130.8 GB/s on a GeForce RTX 3070 and 251.6 GB/s on a Tesla V100 GPU, an increase of 37% over the AES and 43 times the previous best reported for chaotic ciphers. The cipher's security is also verified against multiple forms of attack.
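The paper's single-kernel GPU cipher is not reproduced here; as a point of reference, a textbook chaotic stream cipher built on the logistic map x_{k+1} = r * x_k * (1 - x_k) is sketched below, with hypothetical key values x0 and r. Note that the keystream recurrence is inherently sequential, which is precisely what makes single-kernel GPU parallelization a nontrivial contribution.

```python
import numpy as np

def logistic_keystream(x0: float, r: float, n: int) -> np.ndarray:
    """n keystream bytes from the logistic map x_{k+1} = r * x_k * (1 - x_k)."""
    x, out = x0, np.empty(n, dtype=np.uint8)
    for i in range(n):
        x = r * x * (1.0 - x)            # stays in (0, 1) for x0 in (0, 1), r <= 4
        out[i] = int(x * 256) & 0xFF     # quantize the chaotic state to a byte
    return out

def xor_cipher(data: np.ndarray, x0: float = 0.4567, r: float = 3.99) -> np.ndarray:
    """Encrypt or decrypt uint8 image bytes by XOR with the keystream (symmetric)."""
    ks = logistic_keystream(x0, r, data.size)
    return (data.ravel() ^ ks).reshape(data.shape)
```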

Citations: 0
DTS: dynamic training slimming with feature sparsity for efficient convolutional neural network
IF 3 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-08 | DOI: 10.1007/s11554-024-01511-y
Jia Yin, Wei Wang, Zhonghua Guo, Yangchun Ji

Deep convolutional neural networks have achieved remarkable progress on computer vision tasks in recent years. In this paper, we propose DTS, a dynamic training slimming method with feature sparsity based on structured pruning, for efficient and automatic channel pruning. Unlike existing pruning methods that require manual intervention to configure pruning for each layer, DTS designs a suitable architecture width for the target dataset and deployment resources through automated pruning. The proposed method can be deployed to modern CNNs, and experimental results on the CIFAR, ImageNet, and PASCAL VOC benchmark datasets demonstrate its effectiveness, significantly exceeding other schemes.
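DTS itself is automated and dynamic; its "feature sparsity" ingredient is reminiscent of classic network slimming, in which an L1 penalty on BatchNorm scale factors drives unimportant channels toward zero. A minimal sketch of that ingredient follows, with hypothetical `lam` and `threshold` values, not DTS's actual procedure.

```python
import torch
import torch.nn as nn

def bn_l1_penalty(model: nn.Module, lam: float = 1e-4) -> torch.Tensor:
    """L1 sparsity penalty on BatchNorm scale factors, added to the task loss
    during training so that unimportant channels shrink toward zero."""
    penalty = sum(m.weight.abs().sum()
                  for m in model.modules() if isinstance(m, nn.BatchNorm2d))
    return lam * penalty

def channels_to_keep(bn: nn.BatchNorm2d, threshold: float = 1e-2) -> torch.Tensor:
    """Indices of channels whose learned scale survives the prune threshold."""
    return (bn.weight.detach().abs() > threshold).nonzero(as_tuple=True)[0]
```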

Citations: 0
Real-time water surface target detection based on improved YOLOv7 for Chengdu Sand River
IF 3 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-08 | DOI: 10.1007/s11554-024-01510-z
Mei Yang, Huajun Wang

Obtaining accurate detection results in a timely manner remains a challenge for complex and changing water surface target detection. Detecting targets on water surfaces in real time is difficult because of their rapid movement, small size, and fragmented appearance. In addition, traditional detection methods are often labor-intensive and time-consuming, especially for large water bodies such as rivers and lakes. This paper presents an improved water surface target detection algorithm based on the YOLOv7 (you only look once) model. We enhance the accuracy and speed of surface target detection by improving three key structures: the network aggregation structure, the pyramid pooling structure, and the down-sampling structure. Furthermore, we deploy the model on mobile devices and design detection software that supports real-time detection on images and videos. The experimental results demonstrate that the improved model outperforms the original YOLOv7 model: a 6.4% boost in accuracy, a 4.2% improvement in recall, a 4.1% increase in mAP, a 14.3% reduction in parameter count, and 87 FPS. The software accurately recognizes 11 typical targets on the water surface and demonstrates excellent water surface target detection capability.
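Figures such as the reported 87 FPS are typically measured with a warmed-up timing loop; the sketch below shows one common way to do this, with the input resolution and iteration counts as assumptions rather than the paper's protocol.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, input_shape=(1, 3, 640, 640), warmup=10, iters=100):
    """Rough end-to-end FPS estimate for a detector on its current device."""
    device = next(model.parameters()).device
    x = torch.randn(*input_shape, device=device)
    for _ in range(warmup):                  # warm-up stabilizes clocks and caches
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()             # wait for queued GPU work
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)
```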

Citations: 0
An improved multi-scale and knowledge distillation method for efficient pedestrian detection in dense scenes
IF 3 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-06 | DOI: 10.1007/s11554-024-01507-8
Yanxiang Xu, Mi Wen, Wei He, Hongwei Wang, Yunsheng Xue

Pedestrian detection in densely populated scenes, particularly under occlusion, remains a challenging issue in computer vision. Existing approaches often address missed detections by enhancing model architectures or incorporating attention mechanisms; however, small-scale pedestrians have few features, overfit easily to the dataset, and remain hard to detect accurately. To tackle this issue, this research rethinks occluded regions from the standpoint of small-scale pedestrian detection and proposes YOLO-EPD, a You Only Look Once model for efficient pedestrian detection. First, finding that standard convolution and dilated convolution, with their single receptive field, fit poorly to pedestrian targets of different scales, we propose the Selective Content Aware Downsampling (SCAD) module, integrated into the backbone for enhanced feature extraction. In addition, to address missed detections caused by insufficient feature extraction on small-scale pedestrians, we propose the Crowded Multi-Head Attention (CMHA) module, which makes full use of multi-layer information. Finally, to balance the performance and efficiency of small-object detection, we design Unified Channel-Task Distillation (UCTD) with channel attention and a lightweight head (Lhead) that uses parameter sharing to stay lightweight. Experimental results validate the superiority of YOLO-EPD, which achieves a remarkable 91.1% average precision (AP) on the WiderPerson dataset while cutting parameters and computational overhead by 40%. The findings show that YOLO-EPD greatly accelerates training convergence and achieves better real-time performance in real-world dense scenarios.
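The abstract does not detail UCTD's channel-task formulation; for orientation, the classic soft-target knowledge distillation loss that such schemes build on is sketched below (the temperature `T` is a hypothetical choice, not a value from the paper).

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor,
            T: float = 4.0) -> torch.Tensor:
    """Soft-target distillation: KL divergence between temperature-softened
    teacher and student distributions, scaled by T^2 to keep gradient
    magnitudes comparable to the hard-label loss."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
```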

Citations: 0
Lightweight safety helmet detection algorithm using improved YOLOv5
IF 3 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-05 | DOI: 10.1007/s11554-024-01499-5
Hongge Ren, Anni Fan, Jian Zhao, Hairui Song, Xiuman Liang

Existing safety helmet detection algorithms applied to complex construction site scenarios face challenges such as poor accuracy, large parameter counts, heavy computation, and large model size. This paper proposes a lightweight safety helmet detection algorithm based on YOLOv5 that balances lightweight design and accuracy. First, the algorithm integrates the Distribution Shifting Convolution (DSConv) layer and the Squeeze-and-Excitation (SE) attention mechanism, effectively replacing the original partial convolutions and C3 modules; this integration significantly enhances feature extraction and representation learning. Second, multi-scale feature fusion is performed on the Ghost module using skip connections, replacing certain C3 modules to achieve lightness while maintaining accuracy. Finally, the Bottleneck Attention Module (BAM) is adjusted to suppress irrelevant information and enhance feature extraction in information-rich regions. Experimental results show that the improved model raises mean average precision (mAP) by 1.0% over the original algorithm, reduces the parameter count by 22.2%, decreases computation by 20.9%, and shrinks the model size by 20.1%, realizing a lightweight detection algorithm.
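The Squeeze-and-Excitation mechanism named in the abstract is a standard block; a minimal PyTorch version is sketched below (the reduction ratio of 16 is the common default, not a value given by the paper).

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global-average-pool each channel, then learn
    per-channel gates that rescale the feature map."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        gates = self.fc(x.mean(dim=(2, 3)))   # squeeze: (B, C) channel statistics
        return x * gates.view(b, c, 1, 1)     # excite: channel-wise rescaling
```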

Citations: 0