Pub Date: 2024-07-04 | DOI: 10.1007/s11554-024-01491-z
Alexandre Duarte, Francisco Fernandes, João M. Pereira, Catarina Moreira, Jacinto C. Nascimento, Joaquim Jorge
Depth maps produced by consumer-grade sensors suffer from inaccurate measurements and missing data caused by system- or scene-specific sources. Data-driven denoising algorithms can mitigate such problems; however, they require vast amounts of ground-truth depth data. Recent research has tackled this limitation using self-supervised learning techniques, but these require multiple RGB-D sensors. Moreover, most existing approaches focus on denoising single isolated depth maps or specific subjects of interest, highlighting the need for methods that can effectively denoise depth maps in real-time dynamic environments. This paper extends state-of-the-art approaches to depth denoising for commodity depth devices, proposing SelfReDepth, a self-supervised deep learning technique for depth restoration via denoising and inpainting-based hole filling of full depth maps captured with RGB-D sensors. The algorithm targets depth data in video streams, using multiple sequential depth frames coupled with color data to achieve high-quality depth videos with temporal coherence. Finally, SelfReDepth is designed to be compatible with various RGB-D sensors and usable in real-time scenarios as a pre-processing step before other depth-dependent algorithms are applied. Our results on real-world datasets demonstrate that the approach runs in real time, at over 30 fps on commercial depth cameras, while outperforming state-of-the-art methods in denoising and restoration quality, with potential benefits for augmented- and mixed-reality applications.
Title: SelfReDepth
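SelfReDepth's network itself is not reproduced here, but the core idea of exploiting sequential depth frames to fill holes can be illustrated with a minimal sketch. The function name, the zero-as-hole convention, and the fill-from-most-recent-frame strategy are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def temporal_fill(frames):
    """Fill depth holes (encoded as 0) in the newest frame using the
    most recent valid value from earlier frames.

    frames: list of (H, W) depth maps, oldest first.
    """
    out = frames[-1].astype(np.float64).copy()
    # Walk backwards through history, filling only what is still missing
    for prev in reversed(frames[:-1]):
        holes = out == 0
        if not holes.any():
            break
        out[holes] = prev[holes]
    return out
```

A learned approach such as SelfReDepth replaces this naive copy with a network that also denoises and stays temporally coherent, but the data flow (a buffer of sequential frames feeding one restored output) is the same.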
Pub Date: 2024-07-03 | DOI: 10.1007/s11554-024-01502-z
Xing Zhao, Minhao Zeng, Yanglin Dong, Gang Rao, Xianshan Huang, Xutao Mo
Belt conveyors are widely used in many industries, including coal, steel, ports, power, metallurgy, and chemicals. One major challenge faced by these industries is belt deviation, which can negatively impact production efficiency and safety. Despite previous research on improving belt edge detection accuracy, practical industrial applications still need to prioritize system efficiency and lightweight models. To meet this need, a new semantic segmentation network called FastBeltNet has been developed specifically for real-time, highly accurate conveyor belt edge line segmentation with a lightweight design. The network uses a dual-branch structure that combines a shallow spatial branch, which extracts high-resolution spatial information, with a context branch for deep contextual semantic information. It also incorporates Ghost blocks, Downsample blocks, and Input Injection blocks to reduce computational load, increase processing frame rate, and enhance feature representation. Experiments show that FastBeltNet performs better than several existing methods across different real-world production settings, achieving promising performance metrics. Specifically, FastBeltNet achieves 80.49% mIoU, a processing speed of 99.89 FPS, 895k parameters, 8.23 GFLOPs, and 430.95 MB of peak CUDA memory use, effectively balancing accuracy and speed for industrial production.
Title: FastBeltNet: a dual-branch light-weight network for real-time conveyor belt edge detection
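The mIoU figure quoted above is the standard mean intersection-over-union metric for semantic segmentation. A minimal reference implementation (generic, not FastBeltNet's evaluation code) might look like:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in prediction or ground truth.

    pred, target: integer label maps of the same shape.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```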
Pub Date: 2024-07-03 | DOI: 10.1007/s11554-024-01504-x
Changxin Qin, Zhongyu Zhou
With the rapid expansion of the electronics industry, demand for high-quality printed circuit boards (PCBs) has surged. However, existing PCB defect detection methods suffer from limitations such as slow speed, low accuracy, and restricted detection scope, often leading to false positives and false negatives. To overcome these challenges, this paper presents YOLO-FGD, a novel detection model. YOLO-FGD replaces YOLOv5's backbone network with FasterNet, significantly accelerating feature extraction. The neck adopts the Gather-and-Distribute mechanism, which enhances multi-scale feature fusion for small targets through convolution and self-attention. Integrating the C3_Faster feature extraction module effectively reduces the number of parameters and FLOPs, accelerating computation. Experiments on the PCB-DATASETS dataset show promising results: mAP@50 reaches 98.8%, mAP@50-95 reaches 57.2%, the computational load is reduced to 11.5 GFLOPs, and the model size is only 12.6 MB, meeting lightweight standards. These findings underscore the effectiveness of YOLO-FGD in efficiently detecting PCB defects, providing robust support for quality control in electronics manufacturing.
Title: YOLO-FGD: a fast lightweight PCB defect method based on FasterNet and the Gather-and-Distribute mechanism
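Parameter and GFLOPs figures like those quoted for YOLO-FGD come from summing per-layer counts. For a standard 2-D convolution the usual accounting is sketched below; counting one multiply-add as two FLOPs is a common convention, though some papers count MACs instead, so this is illustrative rather than the authors' exact method:

```python
def conv2d_cost(c_in, c_out, k, h_out, w_out, bias=True):
    """Parameter and FLOP count for one k x k 2-D convolution layer."""
    # Each output channel owns c_in * k * k weights (+1 bias term)
    params = c_out * (c_in * k * k + (1 if bias else 0))
    # One multiply-add per weight per output position, counted as 2 FLOPs
    flops = 2 * c_out * c_in * k * k * h_out * w_out
    return params, flops
```

Summing this over every layer (plus the usually negligible cost of activations and normalization) yields the headline parameter and GFLOPs numbers.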
Pub Date: 2024-06-26 | DOI: 10.1007/s11554-024-01500-1
Abdussalam Elhanashi, Pierpaolo Dini, Sergio Saponara, Qinghe Zheng
Stroke, a life-threatening medical condition, necessitates immediate intervention for optimal outcomes. Timely diagnosis and treatment play a crucial role in reducing mortality and minimizing the long-term disabilities associated with strokes. This study presents a real-time stroke detection system based on deep learning (DL) that uses federated learning (FL) to enhance accuracy and preserve privacy. The primary objective is to develop an efficient, accurate model capable of discerning between stroke and non-stroke cases in real time, helping healthcare professionals make well-informed decisions. Traditional stroke detection methods that rely on manual interpretation of medical images are time-consuming and prone to human error. DL techniques have shown promise in automating this process, yet challenges persist owing to the need for extensive and diverse datasets and to privacy concerns. To address these challenges, our methodology trains and assesses YOLOv8 models on comprehensive datasets of stroke and non-stroke cases, identified from facial paralysis visible in the images. This training empowers the model to grasp the intricate patterns and features associated with stroke, enhancing its diagnostic accuracy. In addition, federated learning, a decentralized training approach, is employed to bolster privacy while preserving model performance: the model learns from data distributed across multiple clients without compromising sensitive patient information.
The proposed methodology has been implemented on NVIDIA platforms, utilizing their advanced GPU capabilities to enable real-time processing and analysis. This optimized model has the potential to revolutionize stroke diagnosis and patient care, promising to save lives and elevate the quality of healthcare services in neurology.
Title: TeleStroke: real-time stroke detection with federated learning and YOLOv8 on edge devices
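Federated setups like the one described typically aggregate client updates with FedAvg-style weighted averaging: each client trains locally, and only parameters (never raw patient data) are sent to the server. A minimal sketch of generic FedAvg aggregation, not TeleStroke's exact pipeline:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of per-client model parameters.

    client_weights: one list of np.ndarrays per client (matching shapes).
    client_sizes: number of local training samples per client.
    """
    total = float(sum(client_sizes))
    aggregated = []
    # zip(*...) groups the i-th parameter tensor of every client together
    for layers in zip(*client_weights):
        aggregated.append(
            sum(w * (n / total) for w, n in zip(layers, client_sizes))
        )
    return aggregated
```

The server would broadcast the aggregated parameters back to clients for the next round; privacy comes from the fact that only these tensors, not images, cross the network.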
Pub Date: 2024-06-25 | DOI: 10.1007/s11554-024-01490-0
Shubhasree AV, Praveen Sankaran, Raghu C.V
The state of Kerala in India has seen multiple intense cyclones in recent years, resulting in heavy flooding. One of the biggest challenges faced by rescuers is access to flooded areas and buildings during rescue operations. In such scenarios, unmanned aerial vehicles (UAVs) can deliver reliable aerial visual data to aid planning and operations during rescue. Object detectors based on deep learning provide an effective way to automate the detection of relevant information in image and video data, but these models are complex and resource-hungry, leading to severe speed constraints during field operations. The pixel displacement algorithm (PDA), a portable and effective technique, is developed in this work to speed up object detection models on resource-limited devices, such as edge devices. The method can be integrated with any object detection model to reduce inference time, and it is combined with multiple detectors in this work to show its effectiveness. YOLOv4 combined with the proposed method outperformed the AP50 of YOLOv4-tiny by 6% while maintaining the same processing time, and gave an almost 10× speed improvement on Jetson Nano at an accuracy cost of 3% compared with YOLOv4. Further, a model is proposed to predict the maximum pixel shift as a function of frame skip, using parameters such as the altitude and velocity of the UAV and the tilt of the camera. Accurate prediction of the pixel shift reduces the search area and, in turn, the inference time. The effectiveness of the proposed model was tested against annotated locations, and the method predicted the search area for each test video segment with a high degree of accuracy.
Title: Towards real-time video analysis of flooded areas: redundancy-based accelerator for object detection models
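The abstract does not give the paper's exact shift model, but under a simple pinhole-camera assumption the maximum pixel shift between skipped frames can be sketched from the same parameters it names (altitude, velocity, tilt). All function and parameter names here are illustrative, and the tilt correction is a crude first-order stretch, not the authors' formulation:

```python
import math

def max_pixel_shift(altitude_m, speed_mps, tilt_rad, hfov_rad,
                    image_width_px, fps, frames_skipped):
    """Estimate how far (in pixels) the scene moves across skipped frames."""
    # Ground width seen by the sensor at nadir, from the horizontal FOV
    ground_width = 2.0 * altitude_m * math.tan(hfov_rad / 2.0)
    metres_per_pixel = ground_width / image_width_px
    # A tilted camera stretches the footprint along the flight axis
    metres_per_pixel /= math.cos(tilt_rad)
    # UAV displacement over the skipped interval
    displacement = speed_mps * frames_skipped / fps
    return displacement / metres_per_pixel
```

Bounding the shift this way is what lets the detector restrict its search to a small region around the previous detection instead of the whole frame.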
Pub Date: 2024-06-25 | DOI: 10.1007/s11554-024-01501-0
Shuqiang Wang, Peiyang Wu, Qingqing Wu
Safety helmets are vital protective gear for construction workers, effectively reducing head injuries and safeguarding lives. By identifying safety helmet usage, workers' unsafe behaviors can be detected and corrected in a timely manner, reducing the likelihood of accidents. Computer-vision-based target detection methods can quickly and accurately detect whether workers are wearing helmets. In this study, we propose a real-time construction-site helmet detection algorithm that improves YOLOv7-tiny. First, the Efficient Multi-scale Attention (EMA) module is introduced in the backbone to capture detailed information, making the model focus more on helmet-related target features during training. Second, the detection head is replaced with a self-attentive Dynamic Head (DyHead) for stronger feature representation. Finally, Wise-IoU (WIoU), with its dynamic non-monotonic focusing mechanism, is used as the loss function to improve the model's handling of mutual occlusion between workers and to enhance detection performance. Experimental results show that the improved YOLOv7-tiny yields improvements of 3.3%, 1.5%, and 5.6% in mAP@0.5, precision, and recall, respectively, while remaining lightweight; this enables more accurate detection at a suitable speed and better matches the needs of automated on-site detection.
Title: Safety helmet detection based on improved YOLOv7-tiny with multiple feature enhancement
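WIoU itself is not reproduced here, but every IoU-family loss builds on the plain intersection-over-union of two boxes, which WIoU then reweights with its focusing mechanism. A minimal sketch of that base quantity (generic, not the paper's code):

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (clamped to zero if the boxes are disjoint)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

An IoU loss is typically 1 − IoU; WIoU multiplies this by a dynamically computed, non-monotonic weight so that medium-quality boxes, rather than the easiest or hardest ones, dominate the gradient.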
Pub Date: 2024-06-23 | DOI: 10.1007/s11554-024-01498-6
Dat Ngo, Bongsoon Kang
In computer vision, entropy is a measure used to characterize the texture information of a grayscale image, and an entropy filter is a fundamental operation for calculating local entropy. However, this filter is computationally intensive and demands an efficient implementation. Additionally, with the foreseeable end of Moore's law, there is a growing trend towards hardware offloading to increase computing power. In line with this trend, we propose a novel method for calculating local entropy and introduce a corresponding pipelined architecture. Under the proposed method, a sliding window of pixels undergoes three steps: sorting, adjacent-difference calculation, and pipelined entropy calculation. Compared with a conventional design, implementation results on a Zynq UltraScale+ XCZU7EV-2FFVC1156 MPSoC device demonstrate that our pipelined architecture reaches a maximum throughput of 764.526 megapixels per second while achieving 2.4× and 2.9× reductions in resource utilization and a 1.1× reduction in power consumption.
Title: A novel pipelined architecture of entropy filter
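The three steps named in the abstract can be modeled in software as a brute-force reference: sorting the window puts equal pixel values into runs, adjacent differences locate the run boundaries (so run lengths serve as histogram counts), and Shannon entropy is computed from the resulting distribution. This is an illustrative model of the computation, not the hardware pipeline itself:

```python
import numpy as np

def window_entropy(window):
    """Local entropy of one pixel window via sort + adjacent differences."""
    v = np.sort(np.asarray(window).ravel())           # step 1: sort
    # step 2: nonzero adjacent differences mark boundaries between
    # runs of equal values; run lengths are the histogram counts
    boundaries = np.flatnonzero(np.diff(v)) + 1
    counts = np.diff(np.concatenate(([0], boundaries, [v.size])))
    # step 3: Shannon entropy of the empirical distribution
    p = counts / v.size
    return float(-(p * np.log2(p)).sum())

def entropy_filter(img, k=3):
    """Slide a k x k window over the image (edge-padded reference)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty(img.shape, dtype=np.float64)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = window_entropy(padded[i:i + k, j:j + k])
    return out
```

The sort-then-difference trick avoids materializing a full 256-bin histogram per window, which is what makes the approach attractive for a hardware pipeline.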
Pub Date: 2024-06-19 | DOI: 10.1007/s11554-024-01497-7
Qingtian Zeng, Daibai Wei, Minghao Zou
To address varying defect sizes, inconsistent data quality, and real-time detection challenges in steel defect detection, we propose a real-time, efficient steel defect detection network (RTSD). The model employs a multi-scale feature extraction module (MSC3) and a mid-sized object detector (MidObj) to comprehensively capture texture features of defects across scales. We incorporate a coordinate attention (CA) module and replace the spatial pyramid pooling structure (SPPF) to enhance defect localization, and we introduce the Wise-IoU (WIoU) loss function to balance attention across defects of varying quality. To meet real-time requirements, we use Taylor channel pruning to reduce model complexity and employ channel-wise knowledge distillation instead of fine-tuning to mitigate the negative impacts of pruning. Experimental results show that on the NEU-DET dataset, the average precision of RTSD reaches 83.5%. The parameter count, computational load, and model size are 5.9M, 7.9 GFLOPs, and 11.9 MB, respectively, with an inference speed of up to 247.6 FPS. This demonstrates that our method enhances performance while significantly reducing model complexity and computational overhead, offering a highly practical solution for industrial applications.
Title: RTSDS: a real-time and efficient method for detecting surface defects in strip steel
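Taylor channel pruning, as used above, ranks channels by a first-order estimate of how much the loss would change if the channel were removed: the magnitude of the accumulated product of each channel's activations and their gradients. A generic sketch of the criterion (not the authors' exact implementation, which also involves distillation):

```python
import numpy as np

def taylor_channel_scores(activations, gradients):
    """First-order Taylor importance per channel.

    activations, gradients: arrays of shape (batch, channels, h, w),
    the gradients being dLoss/dActivation from a backward pass.
    """
    contrib = activations * gradients
    # |sum over batch and spatial dims| approximates the loss change
    # incurred by zeroing the channel
    return np.abs(contrib.sum(axis=(0, 2, 3)))

def channels_to_prune(activations, gradients, ratio=0.5):
    """Indices of the least important channels to remove."""
    scores = taylor_channel_scores(activations, gradients)
    k = int(len(scores) * ratio)
    return np.argsort(scores)[:k]
```

In practice the scores are accumulated over many mini-batches before pruning, and the pruned network is then recovered with distillation rather than plain fine-tuning, as the abstract notes.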
Pub Date : 2024-06-18DOI: 10.1007/s11554-024-01496-8
Qi Li, Hengyi Li, Lin Meng
In the promising Artificial Intelligence of Things (AIoT) paradigm, deep learning algorithms are deployed on edge devices to process data locally. However, high-performance deep learning models come with increased computation and parameter-storage costs, making it difficult to deploy large models on memory- and power-constrained edge devices such as smartphones and drones. Various compression methods, such as channel pruning, have therefore been proposed. An analysis of low-level operations on edge devices shows, however, that existing channel-pruning methods have limited effect on latency: because of data-processing operations, pruned residual blocks still incur significant latency, which hinders real-time CNN inference on edge devices. Hence, we propose a generic deep learning architecture optimization method that achieves further acceleration on edge devices. The network is optimized in two stages, Global Constraint and Start-up Latency Reduction, pruning both channels and entire residual blocks. Optimized networks are evaluated on desktop CPU, FPGA, ARM CPU, and PULP platforms. The experimental results show that latency is reduced by up to 70.40%, 13.63% more than channel pruning alone, achieving real-time processing on the edge device.
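The key idea above is that pruning whole residual blocks, not just channels, removes the per-block data-processing overhead that dominates latency. The toy sketch below (NumPy, with a linear map standing in for each block's convolutions) illustrates scoring blocks by how little the network output changes when each is skipped, then collapsing the least important ones to identity; the saliency criterion shown is a generic illustration, not the paper's Global Constraint formulation.

```python
import numpy as np

class ResidualBlock:
    """Toy residual block: y = x + W @ x (W stands in for conv layers)."""
    def __init__(self, dim, rng):
        self.W = rng.normal(scale=0.1, size=(dim, dim))
        self.skipped = False

    def __call__(self, x):
        # A skipped block degenerates to the identity: its compute
        # (and start-up overhead) disappears, the skip path remains.
        return x if self.skipped else x + self.W @ x

def forward(blocks, x):
    for b in blocks:
        x = b(x)
    return x

def prune_blocks(blocks, x, n_drop):
    """Skip the n_drop blocks whose removal changes the output least."""
    ref = forward(blocks, x)
    scores = []
    for b in blocks:
        b.skipped = True
        scores.append(np.linalg.norm(ref - forward(blocks, x)))
        b.skipped = False
    for i in np.argsort(scores)[:n_drop]:
        blocks[i].skipped = True
    return blocks
```

Because an entire block vanishes rather than a few of its channels, both its arithmetic and its kernel-launch/data-movement cost are eliminated.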
{"title":"A generic deep learning architecture optimization method for edge device based on start-up latency reduction","authors":"Qi Li, Hengyi Li, Lin Meng","doi":"10.1007/s11554-024-01496-8","DOIUrl":"https://doi.org/10.1007/s11554-024-01496-8","url":null,"abstract":"<p>In the promising Artificial Intelligence of Things technology, deep learning algorithms are implemented on edge devices to process data locally. However, high-performance deep learning algorithms are accompanied by increased computation and parameter storage costs, leading to difficulties in implementing huge deep learning algorithms on memory and power constrained edge devices, such as smartphones and drones. Thus various compression methods are proposed, such as channel pruning. According to the analysis of low-level operations on edge devices, existing channel pruning methods have limited effect on latency optimization. Due to data processing operations, the pruned residual blocks still result in significant latency, which hinders real-time processing of CNNs on edge devices. Hence, we propose a generic deep learning architecture optimization method to achieve further acceleration on edge devices. The network is optimized in two stages, Global Constraint and Start-up Latency Reduction, and pruning of both channels and residual blocks is achieved. Optimized networks are evaluated on desktop CPU, FPGA, ARM CPU, and PULP platforms. 
The experimental results show that the latency is reduced by up to 70.40%, which is 13.63% higher than only applying channel pruning and achieving real-time processing in the edge device.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":null,"pages":null},"PeriodicalIF":3.0,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141529227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-18DOI: 10.1007/s11554-024-01495-9
Han Wang, Qing Yang, Binlin Zhang, Dexin Gao
To address the low accuracy of small-target insulator fault detection caused by the complex backgrounds of present-day transmission lines, a deep-learning-based insulator fault detection algorithm for transmission lines is proposed. First, aerial images of insulators are collected by UAVs in different scenarios to build insulator fault datasets. Then, to improve detection efficiency, several improvements are made on the basis of the YOLOv9 algorithm: a GAM attention mechanism is added, strengthening the feature-extraction capability for insulator faults at a small computational cost; the generalized efficient layer aggregation network (GELAN) module is improved into a new SC-GELAN module to boost detection of small fault targets; and the original loss function is replaced by the effective intersection-over-union (EIOU) loss, which minimizes the difference between the aspect ratios of the predicted and ground-truth boxes and thereby accelerates model convergence. Finally, the proposed algorithm is trained and tested alongside other target detection algorithms on the established insulator fault dataset.
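The EIoU loss mentioned above, as commonly defined in the literature, augments 1 − IoU with a center-distance penalty plus separate width and height penalties measured against the smallest enclosing box. A minimal NumPy-free sketch for boxes in (x1, y1, x2, y2) format follows; it illustrates the standard formulation, not this paper's exact implementation, and assumes non-degenerate boxes.

```python
def eiou_loss(pred, target):
    """EIoU loss: 1 - IoU + center-distance, width, and height penalties.
    pred, target: boxes as (x1, y1, x2, y2) with x2 > x1, y2 > y1."""
    # Intersection and union
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    iou = inter / (area_p + area_t - inter)
    # Smallest enclosing box
    cw = max(pred[2], target[2]) - min(pred[0], target[0])
    ch = max(pred[3], target[3]) - min(pred[1], target[1])
    # Normalized squared distance between box centers
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    dist = ((pcx - tcx) ** 2 + (pcy - tcy) ** 2) / (cw ** 2 + ch ** 2)
    # Separate width/height penalties: the EIoU addition over CIoU/DIoU
    w_pen = ((pred[2] - pred[0]) - (target[2] - target[0])) ** 2 / cw ** 2
    h_pen = ((pred[3] - pred[1]) - (target[3] - target[1])) ** 2 / ch ** 2
    return 1 - iou + dist + w_pen + h_pen
```

Penalizing width and height directly, rather than the combined aspect ratio, is what gives EIoU its faster convergence compared with CIoU.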
{"title":"Deep learning based insulator fault detection algorithm for power transmission lines","authors":"Han Wang, Qing Yang, Binlin Zhang, Dexin Gao","doi":"10.1007/s11554-024-01495-9","DOIUrl":"https://doi.org/10.1007/s11554-024-01495-9","url":null,"abstract":"<p>Aiming at the complex background of transmission lines at the present stage, which leads to the problem of low accuracy of insulator fault detection for small targets, a deep learning-based insulator fault detection algorithm for transmission lines is proposed. First, aerial images of insulators are collected using UAVs in different scenarios to establish insulator fault datasets. After that, in order to improve the detection efficiency of the target detection algorithm, certain improvements are made on the basis of the YOLOV9 algorithm. The improved algorithm enhances the feature extraction capability of the algorithm for insulator faults at a smaller computational cost by adding the GAM attention mechanism; at the same time, in order to realize the detection efficiency of small targets for insulator faults, the generalized efficient layer aggregation network (GELAN) module is improved and a new SC-GELAN module is proposed; the original loss function is replaced by the effective intersection-over-union (EIOU) loss function to minimize the difference between the aspect ratio of the predicted frame and the real frame, thereby accelerating the convergence speed of the model. Finally, the proposed algorithm is trained and tested with other target detection algorithms on the established insulator fault dataset. 
The experimental results and analysis show that the algorithm in this paper ensures a certain detection speed, while the algorithmic model has a higher detection accuracy, which is more suitable for UAV fault detection of insulators on transmission lines.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":null,"pages":null},"PeriodicalIF":3.0,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141529229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}