Advanced diagnosis of common rice leaf diseases using KERTL-BME ensemble approach
Pub Date: 2024-08-01 | DOI: 10.1007/s11554-024-01522-9
Chinna Gopi Simhadri, Hari Kishan Kondaveeti
Rice leaf diseases cause an annual decline in rice production, largely because of insufficient understanding of how to recognize and manage them. Moreover, no appropriate application has yet been designed to detect rice leaf diseases accurately. In this paper, we propose a novel method called Kushner Elman Recurrent Transfer Learning-based Boyer–Moore Ensemble (KERTL-BME) to detect rice leaf diseases and differentiate between healthy and diseased images. Using the KERTL-BME method, the four most common rice leaf diseases, namely Bacterial leaf blight, Brown spot, Leaf blast, and Leaf scald, are detected. First, the Kushner non-linear filter is applied to the sample images to remove noise, comparing measurements with expected pixel values in the neighborhood across time instances. This significantly improves the peak signal-to-noise ratio while preserving edges. Transfer learning in our work uses a pre-trained DenseNet169 model to extract relevant features via the Elman Recurrent Network, which improves accuracy on the five-class rice leaf disease dataset. Additionally, the ensemble of transfer learning models helps minimize generalization error, making the proposed method more robust. Finally, Boyer–Moore majority voting is applied to further reduce generalization error, thereby improving overall prediction accuracy and reducing prediction error. The five-class rice leaf disease dataset is used for training and testing the method. Performance measures such as prediction accuracy, prediction time, prediction error, and peak signal-to-noise ratio were calculated and monitored. The designed method predicts disease-affected rice leaves with greater accuracy.
{"title":"Advanced diagnosis of common rice leaf diseases using KERTL-BME ensemble approach","authors":"Chinna Gopi Simhadri, Hari Kishan Kondaveeti","doi":"10.1007/s11554-024-01522-9","DOIUrl":"https://doi.org/10.1007/s11554-024-01522-9","url":null,"abstract":"<p>The influence of rice leaf diseases has resulted in an annual decrease in rice mass production. This occurs mainly due to the need for more understanding in perceiving and managing rice leaf diseases. However, there has not yet been any appropriate application designed to accurately detect rice leaf diseases. This paper, we proposed a novel method called Kushner Elman Recurrent Transfer Learning-based Boyer Moore Ensemble (KERTL-BME) to detect rice leaf diseases and differentiate between healthy and diseased images. Using the KERTL-BME method, the four most common rice leaf diseases, namely Bacterial leaf blight, Brown spot, Leaf blast, and Leaf scald, are detected. First, the Kushner non-linear filter is applied to the sample images to remove noise and differentiate between measurements and expected values by pixels in the neighborhood according to time instances. This significantly improves the peak signal-to-noise ratio while preserving the edges. The transfer learning in our work uses DenseNet169 pre-trained models to extract relevant features via the Elman Recurrent Network, which improves accuracy for the rice leaf 5 disease dataset. Additionally, the ensemble of transfer learning helps to minimize generalization errors, making the proposed method more robust. Finally, Boyer–Moore majority voting is applied to minimize generalization significantly, thereby improving overall prediction accuracy and reducing prediction error promptly. The rice leaf 5 disease dataset is used for training and testing the method. Performance measures such as prediction accuracy, prediction time, prediction error, and peak signal-to-noise ratio were calculated and monitored. The designed method predicts disease-affected rice leaves with greater accuracy.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"76 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141864932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A real-time and energy-efficient SRAM with mixed-signal in-memory computing near CMOS sensors
Pub Date: 2024-07-31 | DOI: 10.1007/s11554-024-01520-x
Jose-Angel Diaz-Madrid, Gines Domenech-Asensi, Ramon Ruiz-Merino, Juan-Francisco Zapata-Perez
In-memory computing (IMC) represents a promising approach to reducing latency and enhancing the energy efficiency of operations required for calculating convolution products of images. This study proposes a fully differential current-mode architecture for computing image convolutions across all four quadrants, intended for deep learning applications within CMOS imagers utilizing IMC near the CMOS sensor. This architecture processes analog signals provided by a CMOS sensor without the need for analog-to-digital conversion. Furthermore, it eliminates the necessity for data transfer between memory and analog operators, as convolutions are computed within modified SRAM memory. The paper suggests modifying the structure of a CMOS SRAM cell by incorporating transistors capable of performing multiplications between binary (−1 or +1) weights and analog signals. Modified SRAM cells can be interconnected to sum the multiplication results obtained from individual cells. This approach facilitates connecting current inputs to different SRAM cells, offering highly scalable and parallelized calculations. For this study, a configurable module comprising nine modified SRAM cells with peripheral circuitry has been designed to calculate the convolution product on each pixel of an image using a 3 × 3 mask with binary values (−1 or +1). Subsequently, an IMC module has been designed to perform 16 convolution operations in parallel, with input currents shared among the 16 modules. This configuration enables the computation of 16 convolutions simultaneously, processing a column per cycle. A digital control circuit manages both the readout and memorization of digital weights, as well as the multiply and add operations in real time. The architecture underwent testing by performing convolutions between binary 3 × 3 masks and images of 32 × 32 pixels to assess accuracy and scalability when two IMC modules are vertically integrated. Convolution weights are stored locally as 1-bit digital values. The circuit was synthesized in 180 nm CMOS technology, and simulation results indicate its capability to perform a complete convolution in 3.2 ms, achieving an efficiency of 11,522 1-b TOPS/W (1-bit tera-operations per second per watt) with a 96% similarity to ideal processing.
{"title":"A real-time and energy-efficient SRAM with mixed-signal in-memory computing near CMOS sensors","authors":"Jose-Angel Diaz-Madrid, Gines Domenech-Asensi, Ramon Ruiz-Merino, Juan-Francisco Zapata-Perez","doi":"10.1007/s11554-024-01520-x","DOIUrl":"https://doi.org/10.1007/s11554-024-01520-x","url":null,"abstract":"<p>In-memory computing (IMC) represents a promising approach to reducing latency and enhancing the energy efficiency of operations required for calculating convolution products of images. This study proposes a fully differential current-mode architecture for computing image convolutions across all four quadrants, intended for deep learning applications within CMOS imagers utilizing IMC near the CMOS sensor. This architecture processes analog signals provided by a CMOS sensor without the need for analog-to-digital conversion. Furthermore, it eliminates the necessity for data transfer between memory and analog operators as convolutions are computed within modified SRAM memory. The paper suggests modifying the structure of a CMOS SRAM cell by incorporating transistors capable of performing multiplications between binary (−1 or +1) weights and analog signals. Modified SRAM cells can be interconnected to sum the multiplication results obtained from individual cells. This approach facilitates connecting current inputs to different SRAM cells, offering highly scalable and parallelized calculations. For this study, a configurable module comprising nine modified SRAM cells with peripheral circuitry has been designed to calculate the convolution product on each pixel of an image using a <span>(3 times 3)</span> mask with binary values (−1 or 1). Subsequently, an IMC module has been designed to perform 16 convolution operations in parallel, with input currents shared among the 16 modules. This configuration enables the computation of 16 convolutions simultaneously, processing a column per cycle. A digital control circuit manages both the readout or memorization of digital weights, as well as the multiply and add operations in real-time. The architecture underwent testing by performing convolutions between binary masks of 3 × 3 values and images of 32 × 32 pixels to assess accuracy and scalability when two IMC modules are vertically integrated. Convolution weights are stored locally as 1-bit digital values. The circuit was synthesized in 180 nm CMOS technology, and simulation results indicate its capability to perform a complete convolution in 3.2 ms, achieving an efficiency of 11,522 1-b TOPS/W (1-b tera-operations per second per watt) with a similarity to ideal processing of 96%.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"12 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141864805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yolo-tla: An Efficient and Lightweight Small Object Detection Model based on YOLOv5
Pub Date: 2024-07-29 | DOI: 10.1007/s11554-024-01519-4
Chun-Lin Ji, Tao Yu, Peng Gao, Fei Wang, Ru-Yue Yuan
Object detection, a crucial aspect of computer vision, has seen significant advancements in accuracy and robustness. Despite these advancements, practical applications still face notable challenges, primarily the inaccurate detection or missed detection of small objects. Moreover, the extensive parameter count and computational demands of the detection models impede their deployment on equipment with limited resources. In this paper, we propose YOLO-TLA, an advanced object detection model building on YOLOv5. We first introduce an additional detection layer for small objects in the neck network pyramid architecture, thereby producing a feature map of a larger scale to discern finer features of small objects. Further, we integrate the C3CrossCovn module into the backbone network. This module uses sliding window feature extraction, which effectively minimizes both computational demand and the number of parameters, rendering the model more compact. Additionally, we have incorporated a global attention mechanism into the backbone network. This mechanism combines the channel information with global information to create a weighted feature map. This feature map is tailored to highlight the attributes of the object of interest, while effectively ignoring irrelevant details. In comparison to the baseline YOLOv5s model, our newly developed YOLO-TLA model has shown considerable improvements on the MS COCO validation dataset, with increases of 4.6% in mAP@0.5 and 4% in mAP@0.5:0.95, all while keeping the model size compact at 9.49M parameters. Further extending these improvements to the YOLOv5m model, the enhanced version exhibited a 1.7% and 1.9% increase in mAP@0.5 and mAP@0.5:0.95, respectively, with a total of 27.53M parameters. These results validate the YOLO-TLA model’s efficient and effective performance in small object detection, achieving high accuracy with fewer parameters and computational demands.
{"title":"Yolo-tla: An Efficient and Lightweight Small Object Detection Model based on YOLOv5","authors":"Chun-Lin Ji, Tao Yu, Peng Gao, Fei Wang, Ru-Yue Yuan","doi":"10.1007/s11554-024-01519-4","DOIUrl":"https://doi.org/10.1007/s11554-024-01519-4","url":null,"abstract":"<p>Object detection, a crucial aspect of computer vision, has seen significant advancements in accuracy and robustness. Despite these advancements, practical applications still face notable challenges, primarily the inaccurate detection or missed detection of small objects. Moreover, the extensive parameter count and computational demands of the detection models impede their deployment on equipment with limited resources. In this paper, we propose YOLO-TLA, an advanced object detection model building on YOLOv5. We first introduce an additional detection layer for small objects in the neck network pyramid architecture, thereby producing a feature map of a larger scale to discern finer features of small objects. Further, we integrate the C3CrossCovn module into the backbone network. This module uses sliding window feature extraction, which effectively minimizes both computational demand and the number of parameters, rendering the model more compact. Additionally, we have incorporated a global attention mechanism into the backbone network. This mechanism combines the channel information with global information to create a weighted feature map. This feature map is tailored to highlight the attributes of the object of interest, while effectively ignoring irrelevant details. In comparison to the baseline YOLOv5s model, our newly developed YOLO-TLA model has shown considerable improvements on the MS COCO validation dataset, with increases of 4.6% in mAP@0.5 and 4% in mAP@0.5:0.95, all while keeping the model size compact at 9.49M parameters. Further extending these improvements to the YOLOv5m model, the enhanced version exhibited a 1.7% and 1.9% increase in mAP@0.5 and mAP@0.5:0.95, respectively, with a total of 27.53M parameters. These results validate the YOLO-TLA model’s efficient and effective performance in small object detection, achieving high accuracy with fewer parameters and computational demands.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"7 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141864809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FedsNet: the real-time network for pedestrian detection based on RT-DETR
Pub Date: 2024-07-29 | DOI: 10.1007/s11554-024-01523-8
Hao Peng, Shiqiang Chen
In response to the problems of complex model networks, low detection accuracy, and small targets prone to false and missed detections in pedestrian detection, this paper proposes FedsNet, a pedestrian detection network based on RT-DETR. By constructing a new lightweight backbone network, ResFastNet, the number of parameters and the computation of the model are reduced, accelerating pedestrian detection. Integrating the Efficient Multi-scale Attention (EMA) mechanism with the backbone network creates a new ResBlock module that improves the detection of small targets. The more effective DySample is adopted as the upsampling operator to improve the accuracy and robustness of pedestrian detection. SIoU is used as the loss function to improve the accuracy of pedestrian recognition and speed up model convergence. Experimental evaluations conducted on a self-built pedestrian detection dataset demonstrate that the average accuracy of the FedsNet model is 91%, a 1.7% improvement over the RT-DETR model. The parameters and model volume are reduced by 15.1% and 14.5%, respectively. When tested on the public dataset WiderPerson, FedsNet achieved an average accuracy of 71.3%, an improvement of 1.1% over the original model. In addition, the detection speed of FedsNet reaches 109.5 FPS and 100.3 FPS on the two datasets, respectively, meeting the real-time requirements of pedestrian detection.
{"title":"FedsNet: the real-time network for pedestrian detection based on RT-DETR","authors":"Hao Peng, Shiqiang Chen","doi":"10.1007/s11554-024-01523-8","DOIUrl":"https://doi.org/10.1007/s11554-024-01523-8","url":null,"abstract":"<p>In response to the problems of complex model networks, low detection accuracy, and the detection of small targets prone to false detections and omissions in pedestrian detection, this paper proposes FedsNet, a pedestrian detection network based on RT-DETR. By constructing a new lightweight backbone network, ResFastNet, the number of parameters and computation of the model are reduced to accelerate the detection speed of pedestrian detection. Integrating the Efficient Multi-scale Attention(EMA) mechanism with the backbone network creates a new ResBlock module for improved detection of small targets. The more effective DySample has been adopted as the upsampling operator to improve the accuracy and robustness of pedestrian detection. SIoU is used as the loss function to improve the accuracy of pedestrian recognition and speed up model convergence. Experimental evaluations conducted on a self-built pedestrian detection dataset demonstrate that the average accuracy value of the FedsNet model is 91<span>(%)</span>, which is a 1.7<span>(%)</span> improvement over the RT-DETR model. The parameters and model volume are reduced by 15.1<span>(%)</span> and 14.5<span>(%)</span>, respectively. When tested on the public dataset WiderPerson, FedsNet achieved the average accuracy value of 71.3<span>(%)</span>, an improvement of 1.1<span>(%)</span> over the original model. In addition, the detection speed of the FedsNet network reaches 109.5 FPS and 100.3 FPS, respectively, meeting the real-time requirements of pedestrian detection.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"198 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141864810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Csb-yolo: a rapid and efficient real-time algorithm for classroom student behavior detection
Pub Date: 2024-07-27 | DOI: 10.1007/s11554-024-01515-8
Wenqi Zhu, Zhijun Yang
In recent years, the integration of artificial intelligence in education has become key to enhancing the quality of teaching. This study addresses the real-time detection of student behavior in classroom environments by proposing the Classroom Student Behavior YOLO (CSB-YOLO) model. We enhance the model’s multi-scale feature fusion capability using the Bidirectional Feature Pyramid Network (BiFPN). Additionally, we have designed a novel Efficient Re-parameterized Detection Head (ERD Head) to accelerate the model’s inference speed and introduced Self-Calibrated Convolutions (SCConv) to compensate for any potential accuracy loss resulting from the lightweight design. To further optimize performance, model pruning and knowledge distillation are utilized to reduce the model size and computational demands while maintaining accuracy. This makes CSB-YOLO suitable for deployment on low-performance classroom devices while maintaining robust detection capabilities. Tested on the classroom student behavior dataset SCB-DATASET3, the distilled and pruned CSB-YOLO, with only 0.72M parameters and 4.3 GFLOPs (giga floating-point operations), maintains high accuracy and exhibits excellent real-time performance, making it particularly suitable for educational environments.
{"title":"Csb-yolo: a rapid and efficient real-time algorithm for classroom student behavior detection","authors":"Wenqi Zhu, Zhijun Yang","doi":"10.1007/s11554-024-01515-8","DOIUrl":"https://doi.org/10.1007/s11554-024-01515-8","url":null,"abstract":"<p>In recent years, the integration of artificial intelligence in education has become key to enhancing the quality of teaching. This study addresses the real-time detection of student behavior in classroom environments by proposing the Classroom Student Behavior YOLO (CSB-YOLO) model. We enhance the model’s multi-scale feature fusion capability using the Bidirectional Feature Pyramid Network (BiFPN). Additionally, we have designed a novel Efficient Re-parameterized Detection Head (ERD Head) to accelerate the model’s inference speed and introduced Self-Calibrated Convolutions (SCConv) to compensate for any potential accuracy loss resulting from lightweight design. To further optimize performance, model pruning and knowledge distillation are utilized to reduce the model size and computational demands while maintaining accuracy. This makes CSB-YOLO suitable for deployment on low-performance classroom devices while maintaining robust detection capabilities. Tested on the classroom student behavior dataset SCB-DATASET3, the distilled and pruned CSB-YOLO, with only 0.72M parameters and 4.3 Giga Floating-point Operations Per Second (GFLOPs), maintains high accuracy and exhibits excellent real-time performance, making it particularly suitable for educational environments.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"1 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141786065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-time detection and geometric analysis algorithm for concrete cracks based on the improved U-net model
Qian Zhang, Fan Zhang, Hongbo Liu, Longxuan Wang, Zhihua Chen, Liulu Guo
Pub Date: 2024-07-26 | DOI: 10.1007/s11554-024-01503-y
To address the complex operation, low precision and poor robustness of traditional concrete crack detection methods, a real-time concrete crack detection and geometric analysis algorithm based on an improved U-net model is proposed. First, the efficient channel attention (ECA) module is embedded in the U-net model to reduce the loss of target information, and the DenseNet network replaces the VGG16 network in the basic U-net architecture, making the transmission of features and gradients more effective. Then, concrete crack detection experiments are performed with the improved U-net model. The experimental results indicate that the improved U-net model achieves 91.56% pixel accuracy (PA), 80.12% mean intersection over union (mIoU), 84.89% recall and an 88.10% F1_score; its mIoU, PA, recall and F1_score increased by 17.39%, 7.82%, 2.62% and 5.10%, respectively, compared with the original model. Next, real-time detection experiments show that the improved model matches the frame rate of the original model, reaching 42 FPS. Finally, geometric analysis of concrete cracks is performed on the detection results of the improved U-net model, effectively extracting the area, density, length and average width of the cracks. The research results indicate that the model's detection of concrete cracks is considerably improved and that the model has good robustness. The proposed model achieves intelligent, real-time and accurate identification of concrete cracks, which has broad application prospects.
{"title":"Real-time detection and geometric analysis algorithm for concrete cracks based on the improved U-net model","authors":"Qian Zhang, Fan Zhang, Hongbo Liu, Longxuan Wang, Zhihua Chen, Liulu Guo","doi":"10.1007/s11554-024-01503-y","DOIUrl":"https://doi.org/10.1007/s11554-024-01503-y","url":null,"abstract":"<p>Aiming at complex operation problems, low precision and poor robustness of traditional concrete crack detection methods, a real-time concrete crack detection and geometric analysis algorithm based on the improved U-net model is proposed. First, the efficient channel attention (ECA) module is embedded in the U-net model to reduce the loss of target information. The DenseNet network is used instead of the VGG16 network in the U-net basic model architecture, making transmitting features and gradients more effective. Then, based on the improved U-net model, the concrete crack detection experiment is performed. The experimental results indicate that the improved U-net model has 91.56% pixel accuracy (PA), 80.12% mean intersection over union (mIoU), 84.89% recall and 88.10% F1_score. The mIoU, PA, recall and F1_score of the improved U-net model increased by 17.39%, 7.82%, 2.62% and 5.10%, respectively, compared with the original model. Next, the real-time detection experiment of concrete cracks is performed based on the improved U-net model. The FPS of the improved model is the same as that of the original model and reaches 42. Finally, the geometric analysis of concrete cracks is performed based on the detection results of the improved U-net model. The area, density, length and average width information of concrete cracks are effectively extracted. The research results indicate that the detection effect of this study’s model on concrete cracks is considerably improved and that the model has good robustness. The model proposed in this study can achieve intelligent real-time and accurate identification of concrete cracks, which has broad application prospects.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"1199 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141786064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An improved YOLOv8 algorithm for small object detection in autonomous driving
Pub Date: 2024-07-25 | DOI: 10.1007/s11554-024-01517-6
Jie Cao, Tong Zhang, Liang Hou, Ning Nan
In the task of visual object detection for autonomous driving, several challenges arise, such as detecting densely clustered targets, dealing with significant occlusion, and identifying small-sized targets. To address these challenges, an improved YOLOv8 algorithm for small object detection in autonomous driving (MSD-YOLO) is proposed. This algorithm incorporates several enhancements to improve the performance of detecting small and densely occluded targets. Firstly, the downsampling module is replaced with SPD-CBS (Space-to-Depth) to maintain the integrity of channel feature information. Subsequently, a multi-scale small object detection structure is designed to increase sensitivity for recognizing densely packed small objects. Additionally, DyHead (Dynamic Head) is introduced, equipped with simultaneous scale, spatial, and channel attention to ensure comprehensive perception of feature map information. In the post-processing stage, Soft-NMS (non-maximum suppression) is employed to effectively suppress redundant candidate boxes and reduce the missed detection rate of densely occluded targets. The effectiveness of these enhancements has been verified through various experiments conducted on the BDD100K autonomous driving public dataset. Experimental results indicate a significant improvement in the performance of the enhanced network. Compared to the YOLOv8n baseline model, MSD-YOLO shows a 13.7% increase in mAP50 and a 12.1% increase in mAP50:95, with only a slight increase in the number of parameters. Furthermore, the detection speed can reach 67.6 FPS, achieving a better balance between accuracy and speed.
{"title":"An improved YOLOv8 algorithm for small object detection in autonomous driving","authors":"Jie Cao, Tong Zhang, Liang Hou, Ning Nan","doi":"10.1007/s11554-024-01517-6","DOIUrl":"https://doi.org/10.1007/s11554-024-01517-6","url":null,"abstract":"<p>In the task of visual object detection for autonomous driving, several challenges arise, such as detecting densely clustered targets, dealing with significant occlusion, and identifying small-sized targets. To address these challenges, an improved YOLOv8 algorithm for small object detection in autonomous driving (MSD-YOLO) is proposed. This algorithm incorporates several enhancements to improve the performance of detecting small and densely occluded targets. Firstly, the downsampling module is replaced with SPD-CBS (Space-to-Depth) to maintain the integrity of channel feature information. Subsequently, a multi-scale small object detection structure is designed to increase sensitivity for recognizing densely packed small objects. Additionally, DyHead (Dynamic Head) is introduced, equipped with simultaneous scale, spatial, and channel attention to ensure comprehensive perception of feature map information. In the post-processing stage, Soft-NMS (non-maximum suppression) is employed to effectively suppress redundant candidate boxes and reduce the missed detection rate of densely occluded targets. The effectiveness of these enhancements has been verified through various experiments conducted on the BDD100K autonomous driving public dataset. Experimental results indicate a significant improvement in the performance of the enhanced network. Compared to the YOLOv8n baseline model, MSD-YOLO shows a 13.7% increase in mAP<sub>50</sub> and a 12.1% increase in mAP<sub>50:</sub><sub>95</sub>, with only a slight increase in the number of parameters. Furthermore, the detection speed can reach 67.6 FPS, achieving a better balance between accuracy and speed.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"187 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141786066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A lightweight YOLOv8 based on attention mechanism for mango pest and disease detection
Pub Date: 2024-07-23 | DOI: 10.1007/s11554-024-01505-w
Jiao Wang, Junping Wang
Mango growth is often affected by pests and diseases, and object detection technology can effectively address this problem. However, deploying object detection models on mobile devices is challenging due to resource constraints and high efficiency requirements. To address this issue, we reduced the number of parameters in the detection model, facilitating its deployment on mobile devices to detect mango pests and diseases. This study introduces the improved lightweight detection model GAS-YOLOv8, whose performance was improved through three modifications. First, the model backbone was replaced with GhostHGNetv2, significantly reducing the model parameters. Second, the lightweight detection head AsDDet was adopted to decrease the parameters further. Finally, to increase the detection accuracy of the lightweight model without significantly increasing parameters, the C2f module was replaced with the C2f-SE module. Validation on a publicly available dataset of mango pests and diseases showed that accuracy for insect pests increased from 97.1% to 98.6%, accuracy for diseases increased from 91.4% to 91.7%, and the model parameters decreased by 33%. This demonstrates that the GAS-YOLOv8 model effectively addresses the heavy computation and challenging deployment involved in detecting mango pests and diseases.
{"title":"A lightweight YOLOv8 based on attention mechanism for mango pest and disease detection","authors":"Jiao Wang, Junping Wang","doi":"10.1007/s11554-024-01505-w","DOIUrl":"https://doi.org/10.1007/s11554-024-01505-w","url":null,"abstract":"<p>Because the growth of mangoes is often affected by pests and diseases, the application of object detection technology can effectively solve this problem. However, deploying object detection models on mobile devices is challenging due to resource constraints and high-efficiency requirements. To address this issue, we reduced the parameters in the target detection model, facilitating its deployment on mobile devices to detect mango pests and diseases. This study introduced the improved lightweight target detection model GAS-YOLOv8. The model’s performance was improved through the following three modifications. First, the model backbone was replaced with GhostHGNetv2, significantly reducing the model parameters. Second, the lightweight detection head AsDDet was adopted to further decrease the parameters. Finally, to increase the detection accuracy of the lightweight model without significantly increasing parameters, the C2f module was replaced with the C2f-SE module. Validation with a publicly available dataset of mango pests and diseases showed that the accuracy for insect pests increased from 97.1 to 98.6%, the accuracy for diseases increased from 91.4 to 91.7%, and the model parameters decreased by 33%. This demonstrates that the GAS-YOLOv8 model effectively addresses the issues of large computational volume and challenging deployment for the detection of mango pests and diseases.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"171 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141784419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lightweight detection model for coal gangue identification based on improved YOLOv5s
Pub Date: 2024-07-23 | DOI: 10.1007/s11554-024-01518-5
Deyong Shang, Zhibin Lv, Zehua Gao, Yuntao Li
Focusing on the complex models, high computational cost, and low identification speed of existing object detection algorithms for coal gangue image identification, an optimized YOLOv5s lightweight detection model for coal gangue is proposed. ShuffleNetV2 is used as the backbone network, and a convolution pooling module replaces the original convolution module at the input end. Combining the re-parameterization idea of RepVGG and introducing depthwise separable convolution, a neck feature fusion network is constructed, and the WIoU function is used as the loss function. The experimental findings indicate that the improved model maintains the same accuracy while its number of parameters is only 5.1% of the original and its computational cost is reduced to 6.3% of the original; identification speed improves by 30.9% on GPU and by a factor of four on CPU. This method significantly reduces model complexity and improves detection speed while maintaining detection accuracy.
{"title":"Lightweight detection model for coal gangue identification based on improved YOLOv5s","authors":"Deyong Shang, Zhibin Lv, Zehua Gao, Yuntao Li","doi":"10.1007/s11554-024-01518-5","DOIUrl":"https://doi.org/10.1007/s11554-024-01518-5","url":null,"abstract":"<p>Focusing on the issues of complex models, high computational cost, and low identification speed of existing coal gangue image identification object detection algorithms, an optimized YOLOv5s lightweight detection model for coal gangue is proposed. Using ShuffleNetV2 as the backbone network, a convolution pooling module is used at the input end instead of the original convolution module. Combining the re-parameterization idea of RepVGG and introducing depthwise separable convolution, a neck feature fusion network is constructed. And using the WIoU function as the loss function. The experimental findings indicate that the improved model maintains the same accuracy, the number of parameters is only 5.1% of the original, the computational effort is reduced to 6.3 % of the original, and the identification speed is improved by 30.9% on GPU and 4 times on CPU. This method significantly reduces model complexity and improves detection speed while maintaining detection accuracy.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"9 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141784421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}