Advanced diagnosis of common rice leaf diseases using KERTL-BME ensemble approach
Pub Date: 2024-08-01 | DOI: 10.1007/s11554-024-01522-9
Chinna Gopi Simhadri, Hari Kishan Kondaveeti
Rice leaf diseases cause an annual decline in rice production, largely because of insufficient understanding of how to recognize and manage them. Moreover, no appropriate application has yet been designed to detect rice leaf diseases accurately. In this paper, we propose a novel method called Kushner Elman Recurrent Transfer Learning-based Boyer–Moore Ensemble (KERTL-BME) to detect rice leaf diseases and differentiate between healthy and diseased images. Using the KERTL-BME method, the four most common rice leaf diseases, namely Bacterial leaf blight, Brown spot, Leaf blast, and Leaf scald, are detected. First, the Kushner non-linear filter is applied to the sample images to remove noise, comparing measurements with expected pixel values in the neighborhood across time instances. This significantly improves the peak signal-to-noise ratio while preserving edges. Transfer learning in our work uses a pre-trained DenseNet169 model to extract relevant features via the Elman Recurrent Network, which improves accuracy on the five-class rice leaf disease dataset. Additionally, the ensemble of transfer learning models helps minimize generalization error, making the proposed method more robust. Finally, Boyer–Moore majority voting is applied to further reduce generalization error, thereby improving overall prediction accuracy and reducing prediction error. The five-class rice leaf disease dataset is used for training and testing the method. Performance measures such as prediction accuracy, prediction time, prediction error, and peak signal-to-noise ratio were calculated and monitored. The designed method predicts disease-affected rice leaves with greater accuracy.
{"title":"Advanced diagnosis of common rice leaf diseases using KERTL-BME ensemble approach","authors":"Chinna Gopi Simhadri, Hari Kishan Kondaveeti","doi":"10.1007/s11554-024-01522-9","DOIUrl":"https://doi.org/10.1007/s11554-024-01522-9","url":null,"abstract":"<p>The influence of rice leaf diseases has resulted in an annual decrease in rice mass production. This occurs mainly due to the need for more understanding in perceiving and managing rice leaf diseases. However, there has not yet been any appropriate application designed to accurately detect rice leaf diseases. This paper, we proposed a novel method called Kushner Elman Recurrent Transfer Learning-based Boyer Moore Ensemble (KERTL-BME) to detect rice leaf diseases and differentiate between healthy and diseased images. Using the KERTL-BME method, the four most common rice leaf diseases, namely Bacterial leaf blight, Brown spot, Leaf blast, and Leaf scald, are detected. First, the Kushner non-linear filter is applied to the sample images to remove noise and differentiate between measurements and expected values by pixels in the neighborhood according to time instances. This significantly improves the peak signal-to-noise ratio while preserving the edges. The transfer learning in our work uses DenseNet169 pre-trained models to extract relevant features via the Elman Recurrent Network, which improves accuracy for the rice leaf 5 disease dataset. Additionally, the ensemble of transfer learning helps to minimize generalization errors, making the proposed method more robust. Finally, Boyer–Moore majority voting is applied to minimize generalization significantly, thereby improving overall prediction accuracy and reducing prediction error promptly. The rice leaf 5 disease dataset is used for training and testing the method. Performance measures such as prediction accuracy, prediction time, prediction error, and peak signal-to-noise ratio were calculated and monitored. The designed method predicts disease-affected rice leaves with greater accuracy.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"76 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141864932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A real-time and energy-efficient SRAM with mixed-signal in-memory computing near CMOS sensors
Pub Date: 2024-07-31 | DOI: 10.1007/s11554-024-01520-x
Jose-Angel Diaz-Madrid, Gines Domenech-Asensi, Ramon Ruiz-Merino, Juan-Francisco Zapata-Perez
In-memory computing (IMC) represents a promising approach to reducing latency and enhancing the energy efficiency of operations required for calculating convolution products of images. This study proposes a fully differential current-mode architecture for computing image convolutions across all four quadrants, intended for deep learning applications within CMOS imagers utilizing IMC near the CMOS sensor. This architecture processes analog signals provided by a CMOS sensor without the need for analog-to-digital conversion. Furthermore, it eliminates the necessity for data transfer between memory and analog operators, as convolutions are computed within modified SRAM memory. The paper suggests modifying the structure of a CMOS SRAM cell by incorporating transistors capable of performing multiplications between binary (−1 or +1) weights and analog signals. Modified SRAM cells can be interconnected to sum the multiplication results obtained from individual cells. This approach facilitates connecting current inputs to different SRAM cells, offering highly scalable and parallelized calculations. For this study, a configurable module comprising nine modified SRAM cells with peripheral circuitry has been designed to calculate the convolution product on each pixel of an image using a 3 × 3 mask with binary values (−1 or +1). Subsequently, an IMC module has been designed to perform 16 convolution operations in parallel, with input currents shared among the 16 modules. This configuration enables the computation of 16 convolutions simultaneously, processing a column per cycle. A digital control circuit manages both the readout and memorization of digital weights, as well as the multiply and add operations in real time. The architecture underwent testing by performing convolutions between binary 3 × 3 masks and images of 32 × 32 pixels to assess accuracy and scalability when two IMC modules are vertically integrated. Convolution weights are stored locally as 1-bit digital values. The circuit was synthesized in 180 nm CMOS technology, and simulation results indicate its capability to perform a complete convolution in 3.2 ms, achieving an efficiency of 11,522 1-b TOPS/W (1-bit tera-operations per second per watt) with a 96% similarity to ideal processing.
{"title":"A real-time and energy-efficient SRAM with mixed-signal in-memory computing near CMOS sensors","authors":"Jose-Angel Diaz-Madrid, Gines Domenech-Asensi, Ramon Ruiz-Merino, Juan-Francisco Zapata-Perez","doi":"10.1007/s11554-024-01520-x","DOIUrl":"https://doi.org/10.1007/s11554-024-01520-x","url":null,"abstract":"<p>In-memory computing (IMC) represents a promising approach to reducing latency and enhancing the energy efficiency of operations required for calculating convolution products of images. This study proposes a fully differential current-mode architecture for computing image convolutions across all four quadrants, intended for deep learning applications within CMOS imagers utilizing IMC near the CMOS sensor. This architecture processes analog signals provided by a CMOS sensor without the need for analog-to-digital conversion. Furthermore, it eliminates the necessity for data transfer between memory and analog operators as convolutions are computed within modified SRAM memory. The paper suggests modifying the structure of a CMOS SRAM cell by incorporating transistors capable of performing multiplications between binary (−1 or +1) weights and analog signals. Modified SRAM cells can be interconnected to sum the multiplication results obtained from individual cells. This approach facilitates connecting current inputs to different SRAM cells, offering highly scalable and parallelized calculations. For this study, a configurable module comprising nine modified SRAM cells with peripheral circuitry has been designed to calculate the convolution product on each pixel of an image using a <span>(3 times 3)</span> mask with binary values (−1 or 1). Subsequently, an IMC module has been designed to perform 16 convolution operations in parallel, with input currents shared among the 16 modules. This configuration enables the computation of 16 convolutions simultaneously, processing a column per cycle. A digital control circuit manages both the readout or memorization of digital weights, as well as the multiply and add operations in real-time. The architecture underwent testing by performing convolutions between binary masks of 3 × 3 values and images of 32 × 32 pixels to assess accuracy and scalability when two IMC modules are vertically integrated. Convolution weights are stored locally as 1-bit digital values. The circuit was synthesized in 180 nm CMOS technology, and simulation results indicate its capability to perform a complete convolution in 3.2 ms, achieving an efficiency of 11,522 1-b TOPS/W (1-b tera-operations per second per watt) with a similarity to ideal processing of 96%.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"12 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141864805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yolo-tla: An Efficient and Lightweight Small Object Detection Model based on YOLOv5
Pub Date: 2024-07-29 | DOI: 10.1007/s11554-024-01519-4
Chun-Lin Ji, Tao Yu, Peng Gao, Fei Wang, Ru-Yue Yuan
Object detection, a crucial aspect of computer vision, has seen significant advancements in accuracy and robustness. Despite these advancements, practical applications still face notable challenges, primarily the inaccurate detection or missed detection of small objects. Moreover, the extensive parameter count and computational demands of the detection models impede their deployment on equipment with limited resources. In this paper, we propose YOLO-TLA, an advanced object detection model building on YOLOv5. We first introduce an additional detection layer for small objects in the neck network pyramid architecture, thereby producing a feature map of a larger scale to discern finer features of small objects. Further, we integrate the C3CrossCovn module into the backbone network. This module uses sliding window feature extraction, which effectively minimizes both computational demand and the number of parameters, rendering the model more compact. Additionally, we have incorporated a global attention mechanism into the backbone network. This mechanism combines the channel information with global information to create a weighted feature map. This feature map is tailored to highlight the attributes of the object of interest, while effectively ignoring irrelevant details. In comparison to the baseline YOLOv5s model, our newly developed YOLO-TLA model has shown considerable improvements on the MS COCO validation dataset, with increases of 4.6% in mAP@0.5 and 4% in mAP@0.5:0.95, all while keeping the model size compact at 9.49M parameters. Further extending these improvements to the YOLOv5m model, the enhanced version exhibited a 1.7% and 1.9% increase in mAP@0.5 and mAP@0.5:0.95, respectively, with a total of 27.53M parameters. These results validate the YOLO-TLA model’s efficient and effective performance in small object detection, achieving high accuracy with fewer parameters and computational demands.
{"title":"Yolo-tla: An Efficient and Lightweight Small Object Detection Model based on YOLOv5","authors":"Chun-Lin Ji, Tao Yu, Peng Gao, Fei Wang, Ru-Yue Yuan","doi":"10.1007/s11554-024-01519-4","DOIUrl":"https://doi.org/10.1007/s11554-024-01519-4","url":null,"abstract":"<p>Object detection, a crucial aspect of computer vision, has seen significant advancements in accuracy and robustness. Despite these advancements, practical applications still face notable challenges, primarily the inaccurate detection or missed detection of small objects. Moreover, the extensive parameter count and computational demands of the detection models impede their deployment on equipment with limited resources. In this paper, we propose YOLO-TLA, an advanced object detection model building on YOLOv5. We first introduce an additional detection layer for small objects in the neck network pyramid architecture, thereby producing a feature map of a larger scale to discern finer features of small objects. Further, we integrate the C3CrossCovn module into the backbone network. This module uses sliding window feature extraction, which effectively minimizes both computational demand and the number of parameters, rendering the model more compact. Additionally, we have incorporated a global attention mechanism into the backbone network. This mechanism combines the channel information with global information to create a weighted feature map. This feature map is tailored to highlight the attributes of the object of interest, while effectively ignoring irrelevant details. In comparison to the baseline YOLOv5s model, our newly developed YOLO-TLA model has shown considerable improvements on the MS COCO validation dataset, with increases of 4.6% in mAP@0.5 and 4% in mAP@0.5:0.95, all while keeping the model size compact at 9.49M parameters. Further extending these improvements to the YOLOv5m model, the enhanced version exhibited a 1.7% and 1.9% increase in mAP@0.5 and mAP@0.5:0.95, respectively, with a total of 27.53M parameters. These results validate the YOLO-TLA model’s efficient and effective performance in small object detection, achieving high accuracy with fewer parameters and computational demands.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"7 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141864809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FedsNet: the real-time network for pedestrian detection based on RT-DETR
Pub Date: 2024-07-29 | DOI: 10.1007/s11554-024-01523-8
Hao Peng, Shiqiang Chen
In response to the problems of complex model networks, low detection accuracy, and small targets prone to false and missed detections in pedestrian detection, this paper proposes FedsNet, a pedestrian detection network based on RT-DETR. By constructing a new lightweight backbone network, ResFastNet, the number of parameters and the computation of the model are reduced, accelerating pedestrian detection. Integrating the Efficient Multi-scale Attention (EMA) mechanism with the backbone network creates a new ResBlock module that improves the detection of small targets. The more effective DySample is adopted as the upsampling operator to improve the accuracy and robustness of pedestrian detection. SIoU is used as the loss function to improve the accuracy of pedestrian recognition and speed up model convergence. Experimental evaluations conducted on a self-built pedestrian detection dataset demonstrate that the average accuracy of the FedsNet model is 91%, a 1.7% improvement over the RT-DETR model. The parameters and model volume are reduced by 15.1% and 14.5%, respectively. When tested on the public dataset WiderPerson, FedsNet achieved an average accuracy of 71.3%, an improvement of 1.1% over the original model. In addition, the detection speed of FedsNet reaches 109.5 FPS and 100.3 FPS on the two datasets, respectively, meeting the real-time requirements of pedestrian detection.
{"title":"FedsNet: the real-time network for pedestrian detection based on RT-DETR","authors":"Hao Peng, Shiqiang Chen","doi":"10.1007/s11554-024-01523-8","DOIUrl":"https://doi.org/10.1007/s11554-024-01523-8","url":null,"abstract":"<p>In response to the problems of complex model networks, low detection accuracy, and the detection of small targets prone to false detections and omissions in pedestrian detection, this paper proposes FedsNet, a pedestrian detection network based on RT-DETR. By constructing a new lightweight backbone network, ResFastNet, the number of parameters and computation of the model are reduced to accelerate the detection speed of pedestrian detection. Integrating the Efficient Multi-scale Attention(EMA) mechanism with the backbone network creates a new ResBlock module for improved detection of small targets. The more effective DySample has been adopted as the upsampling operator to improve the accuracy and robustness of pedestrian detection. SIoU is used as the loss function to improve the accuracy of pedestrian recognition and speed up model convergence. Experimental evaluations conducted on a self-built pedestrian detection dataset demonstrate that the average accuracy value of the FedsNet model is 91<span>(%)</span>, which is a 1.7<span>(%)</span> improvement over the RT-DETR model. The parameters and model volume are reduced by 15.1<span>(%)</span> and 14.5<span>(%)</span>, respectively. When tested on the public dataset WiderPerson, FedsNet achieved the average accuracy value of 71.3<span>(%)</span>, an improvement of 1.1<span>(%)</span> over the original model. In addition, the detection speed of the FedsNet network reaches 109.5 FPS and 100.3 FPS, respectively, meeting the real-time requirements of pedestrian detection.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"198 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141864810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Csb-yolo: a rapid and efficient real-time algorithm for classroom student behavior detection
Pub Date: 2024-07-27 | DOI: 10.1007/s11554-024-01515-8
Wenqi Zhu, Zhijun Yang
In recent years, the integration of artificial intelligence in education has become key to enhancing the quality of teaching. This study addresses the real-time detection of student behavior in classroom environments by proposing the Classroom Student Behavior YOLO (CSB-YOLO) model. We enhance the model’s multi-scale feature fusion capability using the Bidirectional Feature Pyramid Network (BiFPN). Additionally, we have designed a novel Efficient Re-parameterized Detection Head (ERD Head) to accelerate the model’s inference speed and introduced Self-Calibrated Convolutions (SCConv) to compensate for any potential accuracy loss resulting from the lightweight design. To further optimize performance, model pruning and knowledge distillation are utilized to reduce the model size and computational demands while maintaining accuracy. This makes CSB-YOLO suitable for deployment on low-performance classroom devices while maintaining robust detection capabilities. Tested on the classroom student behavior dataset SCB-DATASET3, the distilled and pruned CSB-YOLO, with only 0.72M parameters and 4.3 GFLOPs (giga floating-point operations), maintains high accuracy and exhibits excellent real-time performance, making it particularly suitable for educational environments.
{"title":"Csb-yolo: a rapid and efficient real-time algorithm for classroom student behavior detection","authors":"Wenqi Zhu, Zhijun Yang","doi":"10.1007/s11554-024-01515-8","DOIUrl":"https://doi.org/10.1007/s11554-024-01515-8","url":null,"abstract":"<p>In recent years, the integration of artificial intelligence in education has become key to enhancing the quality of teaching. This study addresses the real-time detection of student behavior in classroom environments by proposing the Classroom Student Behavior YOLO (CSB-YOLO) model. We enhance the model’s multi-scale feature fusion capability using the Bidirectional Feature Pyramid Network (BiFPN). Additionally, we have designed a novel Efficient Re-parameterized Detection Head (ERD Head) to accelerate the model’s inference speed and introduced Self-Calibrated Convolutions (SCConv) to compensate for any potential accuracy loss resulting from lightweight design. To further optimize performance, model pruning and knowledge distillation are utilized to reduce the model size and computational demands while maintaining accuracy. This makes CSB-YOLO suitable for deployment on low-performance classroom devices while maintaining robust detection capabilities. Tested on the classroom student behavior dataset SCB-DATASET3, the distilled and pruned CSB-YOLO, with only 0.72M parameters and 4.3 Giga Floating-point Operations Per Second (GFLOPs), maintains high accuracy and exhibits excellent real-time performance, making it particularly suitable for educational environments.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"1 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141786065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-time detection and geometric analysis algorithm for concrete cracks based on the improved U-net model
Qian Zhang, Fan Zhang, Hongbo Liu, Longxuan Wang, Zhihua Chen, Liulu Guo
Pub Date: 2024-07-26 | DOI: 10.1007/s11554-024-01503-y
To address the complex operation, low precision and poor robustness of traditional concrete crack detection methods, a real-time concrete crack detection and geometric analysis algorithm based on an improved U-net model is proposed. First, the efficient channel attention (ECA) module is embedded in the U-net model to reduce the loss of target information, and the DenseNet network replaces the VGG16 network in the basic U-net architecture, making the transmission of features and gradients more effective. Then, concrete crack detection experiments are performed with the improved U-net model. The experimental results indicate that the improved U-net model achieves 91.56% pixel accuracy (PA), 80.12% mean intersection over union (mIoU), 84.89% recall and an 88.10% F1_score; its mIoU, PA, recall and F1_score increased by 17.39%, 7.82%, 2.62% and 5.10%, respectively, compared with the original model. Next, real-time detection experiments show that the improved model matches the frame rate of the original model, reaching 42 FPS. Finally, geometric analysis of concrete cracks is performed on the detection results of the improved U-net model, effectively extracting the area, density, length and average width of the cracks. The research results indicate that the model's detection of concrete cracks is considerably improved and that the model has good robustness. The proposed model achieves intelligent, real-time and accurate identification of concrete cracks, which has broad application prospects.
{"title":"Real-time detection and geometric analysis algorithm for concrete cracks based on the improved U-net model","authors":"Qian Zhang, Fan Zhang, Hongbo Liu, Longxuan Wang, Zhihua Chen, Liulu Guo","doi":"10.1007/s11554-024-01503-y","DOIUrl":"https://doi.org/10.1007/s11554-024-01503-y","url":null,"abstract":"<p>Aiming at complex operation problems, low precision and poor robustness of traditional concrete crack detection methods, a real-time concrete crack detection and geometric analysis algorithm based on the improved U-net model is proposed. First, the efficient channel attention (ECA) module is embedded in the U-net model to reduce the loss of target information. The DenseNet network is used instead of the VGG16 network in the U-net basic model architecture, making transmitting features and gradients more effective. Then, based on the improved U-net model, the concrete crack detection experiment is performed. The experimental results indicate that the improved U-net model has 91.56% pixel accuracy (PA), 80.12% mean intersection over union (mIoU), 84.89% recall and 88.10% F1_score. The mIoU, PA, recall and F1_score of the improved U-net model increased by 17.39%, 7.82%, 2.62% and 5.10%, respectively, compared with the original model. Next, the real-time detection experiment of concrete cracks is performed based on the improved U-net model. The FPS of the improved model is the same as that of the original model and reaches 42. Finally, the geometric analysis of concrete cracks is performed based on the detection results of the improved U-net model. The area, density, length and average width information of concrete cracks are effectively extracted. The research results indicate that the detection effect of this study’s model on concrete cracks is considerably improved and that the model has good robustness. The model proposed in this study can achieve intelligent real-time and accurate identification of concrete cracks, which has broad application prospects.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"1199 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141786064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An improved YOLOv8 algorithm for small object detection in autonomous driving
Pub Date: 2024-07-25 | DOI: 10.1007/s11554-024-01517-6
Jie Cao, Tong Zhang, Liang Hou, Ning Nan
In the task of visual object detection for autonomous driving, several challenges arise, such as detecting densely clustered targets, dealing with significant occlusion, and identifying small-sized targets. To address these challenges, an improved YOLOv8 algorithm for small object detection in autonomous driving (MSD-YOLO) is proposed. This algorithm incorporates several enhancements to improve the performance of detecting small and densely occluded targets. Firstly, the downsampling module is replaced with SPD-CBS (Space-to-Depth) to maintain the integrity of channel feature information. Subsequently, a multi-scale small object detection structure is designed to increase sensitivity for recognizing densely packed small objects. Additionally, DyHead (Dynamic Head) is introduced, equipped with simultaneous scale, spatial, and channel attention to ensure comprehensive perception of feature map information. In the post-processing stage, Soft-NMS (non-maximum suppression) is employed to effectively suppress redundant candidate boxes and reduce the missed detection rate of densely occluded targets. The effectiveness of these enhancements has been verified through various experiments conducted on the BDD100K autonomous driving public dataset. Experimental results indicate a significant improvement in the performance of the enhanced network. Compared to the YOLOv8n baseline model, MSD-YOLO shows a 13.7% increase in mAP50 and a 12.1% increase in mAP50:95, with only a slight increase in the number of parameters. Furthermore, the detection speed can reach 67.6 FPS, achieving a better balance between accuracy and speed.
{"title":"An improved YOLOv8 algorithm for small object detection in autonomous driving","authors":"Jie Cao, Tong Zhang, Liang Hou, Ning Nan","doi":"10.1007/s11554-024-01517-6","DOIUrl":"https://doi.org/10.1007/s11554-024-01517-6","url":null,"abstract":"<p>In the task of visual object detection for autonomous driving, several challenges arise, such as detecting densely clustered targets, dealing with significant occlusion, and identifying small-sized targets. To address these challenges, an improved YOLOv8 algorithm for small object detection in autonomous driving (MSD-YOLO) is proposed. This algorithm incorporates several enhancements to improve the performance of detecting small and densely occluded targets. Firstly, the downsampling module is replaced with SPD-CBS (Space-to-Depth) to maintain the integrity of channel feature information. Subsequently, a multi-scale small object detection structure is designed to increase sensitivity for recognizing densely packed small objects. Additionally, DyHead (Dynamic Head) is introduced, equipped with simultaneous scale, spatial, and channel attention to ensure comprehensive perception of feature map information. In the post-processing stage, Soft-NMS (non-maximum suppression) is employed to effectively suppress redundant candidate boxes and reduce the missed detection rate of densely occluded targets. The effectiveness of these enhancements has been verified through various experiments conducted on the BDD100K autonomous driving public dataset. Experimental results indicate a significant improvement in the performance of the enhanced network. Compared to the YOLOv8n baseline model, MSD-YOLO shows a 13.7% increase in mAP<sub>50</sub> and a 12.1% increase in mAP<sub>50:</sub><sub>95</sub>, with only a slight increase in the number of parameters. Furthermore, the detection speed can reach 67.6 FPS, achieving a better balance between accuracy and speed.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"187 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141786066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A lightweight YOLOv8 based on attention mechanism for mango pest and disease detection
Pub Date: 2024-07-23 | DOI: 10.1007/s11554-024-01505-w
Jiao Wang, Junping Wang
Mango growth is often affected by pests and diseases, and object detection technology can effectively address this problem. However, deploying object detection models on mobile devices is challenging due to resource constraints and high efficiency requirements. To address this issue, we reduced the number of parameters in the detection model, facilitating its deployment on mobile devices to detect mango pests and diseases. This study introduces the improved lightweight detection model GAS-YOLOv8, whose performance was improved through three modifications. First, the model backbone was replaced with GhostHGNetv2, significantly reducing the model parameters. Second, the lightweight detection head AsDDet was adopted to decrease the parameters further. Finally, to increase the detection accuracy of the lightweight model without significantly increasing parameters, the C2f module was replaced with the C2f-SE module. Validation on a publicly available dataset of mango pests and diseases showed that accuracy for insect pests increased from 97.1% to 98.6%, accuracy for diseases increased from 91.4% to 91.7%, and the model parameters decreased by 33%. This demonstrates that the GAS-YOLOv8 model effectively addresses the heavy computation and challenging deployment involved in detecting mango pests and diseases.
{"title":"A lightweight YOLOv8 based on attention mechanism for mango pest and disease detection","authors":"Jiao Wang, Junping Wang","doi":"10.1007/s11554-024-01505-w","DOIUrl":"https://doi.org/10.1007/s11554-024-01505-w","url":null,"abstract":"<p>Because the growth of mangoes is often affected by pests and diseases, the application of object detection technology can effectively solve this problem. However, deploying object detection models on mobile devices is challenging due to resource constraints and high-efficiency requirements. To address this issue, we reduced the parameters in the target detection model, facilitating its deployment on mobile devices to detect mango pests and diseases. This study introduced the improved lightweight target detection model GAS-YOLOv8. The model’s performance was improved through the following three modifications. First, the model backbone was replaced with GhostHGNetv2, significantly reducing the model parameters. Second, the lightweight detection head AsDDet was adopted to further decrease the parameters. Finally, to increase the detection accuracy of the lightweight model without significantly increasing parameters, the C2f module was replaced with the C2f-SE module. Validation with a publicly available dataset of mango pests and diseases showed that the accuracy for insect pests increased from 97.1 to 98.6%, the accuracy for diseases increased from 91.4 to 91.7%, and the model parameters decreased by 33%. This demonstrates that the GAS-YOLOv8 model effectively addresses the issues of large computational volume and challenging deployment for the detection of mango pests and diseases.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"171 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141784419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lightweight detection model for coal gangue identification based on improved YOLOv5s
Pub Date: 2024-07-23 | DOI: 10.1007/s11554-024-01518-5
Deyong Shang, Zhibin Lv, Zehua Gao, Yuntao Li
Focusing on the complex models, high computational cost, and low identification speed of existing object detection algorithms for coal gangue image identification, an optimized YOLOv5s lightweight detection model for coal gangue is proposed. ShuffleNetV2 is used as the backbone network, and a convolution pooling module replaces the original convolution module at the input end. Combining the re-parameterization idea of RepVGG and introducing depthwise separable convolution, a neck feature fusion network is constructed, and the WIoU function is used as the loss function. The experimental findings indicate that the improved model maintains the same accuracy while its number of parameters is only 5.1% of the original and its computational cost is reduced to 6.3% of the original; identification speed improves by 30.9% on GPU and by a factor of four on CPU. This method significantly reduces model complexity and improves detection speed while maintaining detection accuracy.
{"title":"Lightweight detection model for coal gangue identification based on improved YOLOv5s","authors":"Deyong Shang, Zhibin Lv, Zehua Gao, Yuntao Li","doi":"10.1007/s11554-024-01518-5","DOIUrl":"https://doi.org/10.1007/s11554-024-01518-5","url":null,"abstract":"<p>Focusing on the issues of complex models, high computational cost, and low identification speed of existing coal gangue image identification object detection algorithms, an optimized YOLOv5s lightweight detection model for coal gangue is proposed. Using ShuffleNetV2 as the backbone network, a convolution pooling module is used at the input end instead of the original convolution module. Combining the re-parameterization idea of RepVGG and introducing depthwise separable convolution, a neck feature fusion network is constructed. And using the WIoU function as the loss function. The experimental findings indicate that the improved model maintains the same accuracy, the number of parameters is only 5.1% of the original, the computational effort is reduced to 6.3 % of the original, and the identification speed is improved by 30.9% on GPU and 4 times on CPU. This method significantly reduces model complexity and improves detection speed while maintaining detection accuracy.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"9 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141784421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}