Pub Date: 2024-07-04 | DOI: 10.1007/s11554-024-01491-z
Alexandre Duarte, Francisco Fernandes, João M. Pereira, Catarina Moreira, Jacinto C. Nascimento, Joaquim Jorge
Depth maps produced by consumer-grade sensors suffer from inaccurate measurements and missing data caused by system- or scene-specific sources. Data-driven denoising algorithms can mitigate such problems; however, they require vast amounts of ground-truth depth data. Recent research has tackled this limitation using self-supervised learning techniques, but these require multiple RGB-D sensors. Moreover, most existing approaches focus on denoising single isolated depth maps or specific subjects of interest, highlighting the need for methods that can effectively denoise depth maps in real-time dynamic environments. This paper extends state-of-the-art approaches to depth denoising for commodity depth devices, proposing SelfReDepth, a self-supervised deep learning technique for depth restoration via denoising and inpainting-based hole filling of full depth maps captured with RGB-D sensors. The algorithm targets depth data in video streams, using multiple sequential depth frames coupled with color data to achieve high-quality depth videos with temporal coherence. Finally, SelfReDepth is designed to be compatible with various RGB-D sensors and usable in real-time scenarios as a pre-processing step before other depth-dependent algorithms are applied. Our results on real-world datasets demonstrate that the approach runs in real time, at over 30 fps on commercial depth cameras, while outperforming state-of-the-art methods in denoising and restoration quality, with potential benefits for augmented- and mixed-reality applications.
Title: SelfReDepth
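SelfReDepth's network itself is not reproduced here, but the core idea of exploiting sequential depth frames to fill holes can be illustrated with a minimal sketch. The function name, the zero-as-hole convention, and the fill-from-most-recent-frame strategy are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def temporal_fill(frames):
    """Fill depth holes (encoded as 0) in the newest frame using the
    most recent valid value from earlier frames.

    frames: list of (H, W) depth maps, oldest first.
    """
    out = frames[-1].astype(np.float64).copy()
    # Walk backwards through history, filling only what is still missing
    for prev in reversed(frames[:-1]):
        holes = out == 0
        if not holes.any():
            break
        out[holes] = prev[holes]
    return out
```

A learned approach such as SelfReDepth replaces this naive copy with a network that also denoises and stays temporally coherent, but the data flow (a buffer of sequential frames feeding one restored output) is the same.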
Pub Date: 2024-07-03 | DOI: 10.1007/s11554-024-01502-z
Xing Zhao, Minhao Zeng, Yanglin Dong, Gang Rao, Xianshan Huang, Xutao Mo
Belt conveyors are widely used in many industries, including coal, steel, ports, power, metallurgy, and chemicals. One major challenge faced by these industries is belt deviation, which can negatively impact production efficiency and safety. Despite previous research on improving belt edge detection accuracy, practical industrial applications still need to prioritize system efficiency and lightweight models. To meet this need, a new semantic segmentation network called FastBeltNet has been developed specifically for real-time, highly accurate conveyor belt edge line segmentation with a lightweight design. The network uses a dual-branch structure that combines a shallow spatial branch, which extracts high-resolution spatial information, with a context branch for deep contextual semantic information. It also incorporates Ghost blocks, Downsample blocks, and Input Injection blocks to reduce computational load, increase processing frame rate, and enhance feature representation. Experiments show that FastBeltNet performs better than several existing methods across different real-world production settings, achieving promising performance metrics. Specifically, FastBeltNet achieves 80.49% mIoU, a processing speed of 99.89 FPS, 895k parameters, 8.23 GFLOPs, and 430.95 MB of peak CUDA memory use, effectively balancing accuracy and speed for industrial production.
Title: FastBeltNet: a dual-branch light-weight network for real-time conveyor belt edge detection
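The mIoU figure quoted above is the standard mean intersection-over-union metric for semantic segmentation. A minimal reference implementation (generic, not FastBeltNet's evaluation code) might look like:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in prediction or ground truth.

    pred, target: integer label maps of the same shape.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```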
Pub Date: 2024-07-03 | DOI: 10.1007/s11554-024-01504-x
Changxin Qin, Zhongyu Zhou
With the rapid expansion of the electronics industry, demand for high-quality printed circuit boards (PCBs) has surged. However, existing PCB defect detection methods suffer from limitations such as slow speed, low accuracy, and restricted detection scope, often leading to false positives and false negatives. To overcome these challenges, this paper presents YOLO-FGD, a novel detection model. YOLO-FGD replaces YOLOv5's backbone network with FasterNet, significantly accelerating feature extraction. The neck adopts the Gather-and-Distribute mechanism, which enhances multi-scale feature fusion for small targets through convolution and self-attention. Integrating the C3_Faster feature extraction module effectively reduces the number of parameters and FLOPs, accelerating computation. Experiments on the PCB-DATASETS dataset show promising results: mAP@50 reaches 98.8%, mAP@50-95 reaches 57.2%, the computational load is reduced to 11.5 GFLOPs, and the model size is only 12.6 MB, meeting lightweight standards. These findings underscore the effectiveness of YOLO-FGD in efficiently detecting PCB defects, providing robust support for quality control in electronics manufacturing.
Title: YOLO-FGD: a fast lightweight PCB defect method based on FasterNet and the Gather-and-Distribute mechanism
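Parameter and GFLOPs figures like those quoted for YOLO-FGD come from summing per-layer counts. For a standard 2-D convolution the usual accounting is sketched below; counting one multiply-add as two FLOPs is a common convention, though some papers count MACs instead, so this is illustrative rather than the authors' exact method:

```python
def conv2d_cost(c_in, c_out, k, h_out, w_out, bias=True):
    """Parameter and FLOP count for one k x k 2-D convolution layer."""
    # Each output channel owns c_in * k * k weights (+1 bias term)
    params = c_out * (c_in * k * k + (1 if bias else 0))
    # One multiply-add per weight per output position, counted as 2 FLOPs
    flops = 2 * c_out * c_in * k * k * h_out * w_out
    return params, flops
```

Summing this over every layer (plus the usually negligible cost of activations and normalization) yields the headline parameter and GFLOPs numbers.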
Pub Date: 2024-06-26 | DOI: 10.1007/s11554-024-01500-1
Abdussalam Elhanashi, Pierpaolo Dini, Sergio Saponara, Qinghe Zheng
Stroke, a life-threatening medical condition, necessitates immediate intervention for optimal outcomes. Timely diagnosis and treatment play a crucial role in reducing mortality and minimizing the long-term disabilities associated with strokes. This study presents a real-time stroke detection system based on deep learning (DL) that uses federated learning (FL) to enhance accuracy and preserve privacy. The primary objective is to develop an efficient, accurate model capable of discerning between stroke and non-stroke cases in real time, helping healthcare professionals make well-informed decisions. Traditional stroke detection methods that rely on manual interpretation of medical images are time-consuming and prone to human error. DL techniques have shown promise in automating this process, yet challenges persist owing to the need for extensive and diverse datasets and to privacy concerns. To address these challenges, our methodology trains and assesses YOLOv8 models on comprehensive datasets of stroke and non-stroke cases, identified from facial paralysis visible in the images. This training empowers the model to grasp the intricate patterns and features associated with stroke, enhancing its diagnostic accuracy. In addition, federated learning, a decentralized training approach, is employed to bolster privacy while preserving model performance: the model learns from data distributed across multiple clients without compromising sensitive patient information.
The proposed methodology has been implemented on NVIDIA platforms, utilizing their advanced GPU capabilities to enable real-time processing and analysis. This optimized model has the potential to revolutionize stroke diagnosis and patient care, promising to save lives and elevate the quality of healthcare services in neurology.
Title: TeleStroke: real-time stroke detection with federated learning and YOLOv8 on edge devices
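Federated setups like the one described typically aggregate client updates with FedAvg-style weighted averaging: each client trains locally, and only parameters (never raw patient data) are sent to the server. A minimal sketch of generic FedAvg aggregation, not TeleStroke's exact pipeline:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of per-client model parameters.

    client_weights: one list of np.ndarrays per client (matching shapes).
    client_sizes: number of local training samples per client.
    """
    total = float(sum(client_sizes))
    aggregated = []
    # zip(*...) groups the i-th parameter tensor of every client together
    for layers in zip(*client_weights):
        aggregated.append(
            sum(w * (n / total) for w, n in zip(layers, client_sizes))
        )
    return aggregated
```

The server would broadcast the aggregated parameters back to clients for the next round; privacy comes from the fact that only these tensors, not images, cross the network.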
Pub Date: 2024-06-25 | DOI: 10.1007/s11554-024-01490-0
Shubhasree AV, Praveen Sankaran, Raghu C.V
The state of Kerala in India has seen multiple intense cyclones in recent years, resulting in heavy flooding. One of the biggest challenges faced by rescuers is access to flooded areas and buildings during rescue operations. In such scenarios, unmanned aerial vehicles (UAVs) can deliver reliable aerial visual data to aid planning and operations during rescue. Object detectors based on deep learning provide an effective way to automate the detection of relevant information in image and video data, but these models are complex and resource-hungry, leading to severe speed constraints during field operations. The pixel displacement algorithm (PDA), a portable and effective technique, is developed in this work to speed up object detection models on resource-limited devices, such as edge devices. The method can be integrated with any object detection model to reduce inference time, and it is combined with multiple detectors in this work to show its effectiveness. YOLOv4 combined with the proposed method outperformed the AP50 of YOLOv4-tiny by 6% while maintaining the same processing time, and gave an almost 10× speed improvement on Jetson Nano at an accuracy cost of 3% compared with YOLOv4. Further, a model is proposed to predict the maximum pixel shift as a function of frame skip, using parameters such as the altitude and velocity of the UAV and the tilt of the camera. Accurate prediction of the pixel shift reduces the search area and, in turn, the inference time. The effectiveness of the proposed model was tested against annotated locations, and the method predicted the search area for each test video segment with a high degree of accuracy.
Title: Towards real-time video analysis of flooded areas: redundancy-based accelerator for object detection models
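The abstract does not give the paper's exact shift model, but under a simple pinhole-camera assumption the maximum pixel shift between skipped frames can be sketched from the same parameters it names (altitude, velocity, tilt). All function and parameter names here are illustrative, and the tilt correction is a crude first-order stretch, not the authors' formulation:

```python
import math

def max_pixel_shift(altitude_m, speed_mps, tilt_rad, hfov_rad,
                    image_width_px, fps, frames_skipped):
    """Estimate how far (in pixels) the scene moves across skipped frames."""
    # Ground width seen by the sensor at nadir, from the horizontal FOV
    ground_width = 2.0 * altitude_m * math.tan(hfov_rad / 2.0)
    metres_per_pixel = ground_width / image_width_px
    # A tilted camera stretches the footprint along the flight axis
    metres_per_pixel /= math.cos(tilt_rad)
    # UAV displacement over the skipped interval
    displacement = speed_mps * frames_skipped / fps
    return displacement / metres_per_pixel
```

Bounding the shift this way is what lets the detector restrict its search to a small region around the previous detection instead of the whole frame.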
Pub Date: 2024-06-25 | DOI: 10.1007/s11554-024-01501-0
Shuqiang Wang, Peiyang Wu, Qingqing Wu
Safety helmets are vital protective gear for construction workers, effectively reducing head injuries and safeguarding lives. By identifying safety helmet usage, workers' unsafe behaviors can be detected and corrected in a timely manner, reducing the likelihood of accidents. Computer-vision-based target detection methods can quickly and accurately detect whether workers are wearing helmets. In this study, we propose a real-time construction-site helmet detection algorithm that improves YOLOv7-tiny. First, the Efficient Multi-scale Attention (EMA) module is introduced in the backbone to capture detailed information, making the model focus more on helmet-related target features during training. Second, the detection head is replaced with a self-attentive Dynamic Head (DyHead) for stronger feature representation. Finally, Wise-IoU (WIoU), with its dynamic non-monotonic focusing mechanism, is used as the loss function to improve the model's handling of mutual occlusion between workers and to enhance detection performance. Experimental results show that the improved YOLOv7-tiny yields improvements of 3.3%, 1.5%, and 5.6% in mAP@0.5, precision, and recall, respectively, while remaining lightweight; this enables more accurate detection at a suitable speed and better matches the needs of automated on-site detection.
Title: Safety helmet detection based on improved YOLOv7-tiny with multiple feature enhancement
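WIoU itself is not reproduced here, but every IoU-family loss builds on the plain intersection-over-union of two boxes, which WIoU then reweights with its focusing mechanism. A minimal sketch of that base quantity (generic, not the paper's code):

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (clamped to zero if the boxes are disjoint)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

An IoU loss is typically 1 − IoU; WIoU multiplies this by a dynamically computed, non-monotonic weight so that medium-quality boxes, rather than the easiest or hardest ones, dominate the gradient.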
Pub Date: 2024-06-23 | DOI: 10.1007/s11554-024-01498-6
Dat Ngo, Bongsoon Kang
In computer vision, entropy is a measure used to characterize the texture information of a grayscale image, and an entropy filter is a fundamental operation for calculating local entropy. However, this filter is computationally intensive and demands an efficient implementation. Additionally, with the foreseeable end of Moore's law, there is a growing trend towards hardware offloading to increase computing power. In line with this trend, we propose a novel method for calculating local entropy and introduce a corresponding pipelined architecture. Under the proposed method, a sliding window of pixels undergoes three steps: sorting, adjacent-difference calculation, and pipelined entropy calculation. Compared with a conventional design, implementation results on a Zynq UltraScale+ XCZU7EV-2FFVC1156 MPSoC device demonstrate that our pipelined architecture reaches a maximum throughput of 764.526 megapixels per second while achieving 2.4× and 2.9× reductions in resource utilization and a 1.1× reduction in power consumption.
Title: A novel pipelined architecture of entropy filter
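The three steps named in the abstract can be modeled in software as a brute-force reference: sorting the window puts equal pixel values into runs, adjacent differences locate the run boundaries (so run lengths serve as histogram counts), and Shannon entropy is computed from the resulting distribution. This is an illustrative model of the computation, not the hardware pipeline itself:

```python
import numpy as np

def window_entropy(window):
    """Local entropy of one pixel window via sort + adjacent differences."""
    v = np.sort(np.asarray(window).ravel())           # step 1: sort
    # step 2: nonzero adjacent differences mark boundaries between
    # runs of equal values; run lengths are the histogram counts
    boundaries = np.flatnonzero(np.diff(v)) + 1
    counts = np.diff(np.concatenate(([0], boundaries, [v.size])))
    # step 3: Shannon entropy of the empirical distribution
    p = counts / v.size
    return float(-(p * np.log2(p)).sum())

def entropy_filter(img, k=3):
    """Slide a k x k window over the image (edge-padded reference)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty(img.shape, dtype=np.float64)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = window_entropy(padded[i:i + k, j:j + k])
    return out
```

The sort-then-difference trick avoids materializing a full 256-bin histogram per window, which is what makes the approach attractive for a hardware pipeline.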
Pub Date: 2024-06-19 | DOI: 10.1007/s11554-024-01497-7
Qingtian Zeng, Daibai Wei, Minghao Zou
To address varying defect sizes, inconsistent data quality, and real-time detection challenges in steel defect detection, we propose a real-time, efficient steel defect detection network (RTSD). The model employs a multi-scale feature extraction module (MSC3) and a mid-sized object detector (MidObj) to comprehensively capture texture features of defects across scales. We incorporate a coordinate attention (CA) module and replace the spatial pyramid pooling structure (SPPF) to enhance defect localization, and we introduce the Wise-IoU (WIoU) loss function to balance attention across defects of varying quality. To meet real-time requirements, we use Taylor channel pruning to reduce model complexity and employ channel-wise knowledge distillation instead of fine-tuning to mitigate the negative impacts of pruning. Experimental results show that on the NEU-DET dataset, the average precision of RTSD reaches 83.5%. The parameter count, computational load, and model size are 5.9M, 7.9 GFLOPs, and 11.9 MB, respectively, with an inference speed of up to 247.6 FPS. This demonstrates that our method enhances performance while significantly reducing model complexity and computational overhead, offering a highly practical solution for industrial applications.
Title: RTSDS: a real-time and efficient method for detecting surface defects in strip steel
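Taylor channel pruning, as used above, ranks channels by a first-order estimate of how much the loss would change if the channel were removed: the magnitude of the accumulated product of each channel's activations and their gradients. A generic sketch of the criterion (not the authors' exact implementation, which also involves distillation):

```python
import numpy as np

def taylor_channel_scores(activations, gradients):
    """First-order Taylor importance per channel.

    activations, gradients: arrays of shape (batch, channels, h, w),
    the gradients being dLoss/dActivation from a backward pass.
    """
    contrib = activations * gradients
    # |sum over batch and spatial dims| approximates the loss change
    # incurred by zeroing the channel
    return np.abs(contrib.sum(axis=(0, 2, 3)))

def channels_to_prune(activations, gradients, ratio=0.5):
    """Indices of the least important channels to remove."""
    scores = taylor_channel_scores(activations, gradients)
    k = int(len(scores) * ratio)
    return np.argsort(scores)[:k]
```

In practice the scores are accumulated over many mini-batches before pruning, and the pruned network is then recovered with distillation rather than plain fine-tuning, as the abstract notes.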
Pub Date : 2024-06-18DOI: 10.1007/s11554-024-01496-8
Qi Li, Hengyi Li, Lin Meng
In the promising Artificial Intelligence of Things (AIoT) paradigm, deep learning algorithms are deployed on edge devices to process data locally. However, high-performance deep learning models come with increased computation and parameter-storage costs, making it difficult to deploy large models on memory- and power-constrained edge devices such as smartphones and drones. Various compression methods, such as channel pruning, have therefore been proposed. An analysis of low-level operations on edge devices shows, however, that existing channel-pruning methods have limited effect on latency: because of data-processing operations, pruned residual blocks still incur significant latency, which hinders real-time CNN inference on edge devices. Hence, we propose a generic deep learning architecture optimization method that achieves further acceleration on edge devices. The network is optimized in two stages, Global Constraint and Start-up Latency Reduction, pruning both channels and entire residual blocks. Optimized networks are evaluated on desktop CPU, FPGA, ARM CPU, and PULP platforms. The experimental results show that latency is reduced by up to 70.40%, 13.63% more than channel pruning alone, achieving real-time processing on the edge device.
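The key idea above is that pruning whole residual blocks, not just channels, removes the per-block data-processing overhead that dominates latency. The toy sketch below (NumPy, with a linear map standing in for each block's convolutions) illustrates scoring blocks by how little the network output changes when each is skipped, then collapsing the least important ones to identity; the saliency criterion shown is a generic illustration, not the paper's Global Constraint formulation.

```python
import numpy as np

class ResidualBlock:
    """Toy residual block: y = x + W @ x (W stands in for conv layers)."""
    def __init__(self, dim, rng):
        self.W = rng.normal(scale=0.1, size=(dim, dim))
        self.skipped = False

    def __call__(self, x):
        # A skipped block degenerates to the identity: its compute
        # (and start-up overhead) disappears, the skip path remains.
        return x if self.skipped else x + self.W @ x

def forward(blocks, x):
    for b in blocks:
        x = b(x)
    return x

def prune_blocks(blocks, x, n_drop):
    """Skip the n_drop blocks whose removal changes the output least."""
    ref = forward(blocks, x)
    scores = []
    for b in blocks:
        b.skipped = True
        scores.append(np.linalg.norm(ref - forward(blocks, x)))
        b.skipped = False
    for i in np.argsort(scores)[:n_drop]:
        blocks[i].skipped = True
    return blocks
```

Because an entire block vanishes rather than a few of its channels, both its arithmetic and its kernel-launch/data-movement cost are eliminated.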
{"title":"A generic deep learning architecture optimization method for edge device based on start-up latency reduction","authors":"Qi Li, Hengyi Li, Lin Meng","doi":"10.1007/s11554-024-01496-8","DOIUrl":"https://doi.org/10.1007/s11554-024-01496-8","url":null,"abstract":"<p>In the promising Artificial Intelligence of Things technology, deep learning algorithms are implemented on edge devices to process data locally. However, high-performance deep learning algorithms are accompanied by increased computation and parameter storage costs, leading to difficulties in implementing huge deep learning algorithms on memory and power constrained edge devices, such as smartphones and drones. Thus various compression methods are proposed, such as channel pruning. According to the analysis of low-level operations on edge devices, existing channel pruning methods have limited effect on latency optimization. Due to data processing operations, the pruned residual blocks still result in significant latency, which hinders real-time processing of CNNs on edge devices. Hence, we propose a generic deep learning architecture optimization method to achieve further acceleration on edge devices. The network is optimized in two stages, Global Constraint and Start-up Latency Reduction, and pruning of both channels and residual blocks is achieved. Optimized networks are evaluated on desktop CPU, FPGA, ARM CPU, and PULP platforms. 
The experimental results show that the latency is reduced by up to 70.40%, which is 13.63% higher than only applying channel pruning and achieving real-time processing in the edge device.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":null,"pages":null},"PeriodicalIF":3.0,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141529227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-18DOI: 10.1007/s11554-024-01495-9
Han Wang, Qing Yang, Binlin Zhang, Dexin Gao
To address the low accuracy of small-target insulator fault detection caused by the complex backgrounds of present-day transmission lines, a deep-learning-based insulator fault detection algorithm for transmission lines is proposed. First, aerial images of insulators are collected by UAVs in different scenarios to build insulator fault datasets. Then, to improve detection efficiency, several improvements are made on the basis of the YOLOv9 algorithm: a GAM attention mechanism is added, strengthening the feature-extraction capability for insulator faults at a small computational cost; the generalized efficient layer aggregation network (GELAN) module is improved into a new SC-GELAN module to boost detection of small fault targets; and the original loss function is replaced by the effective intersection-over-union (EIOU) loss, which minimizes the difference between the aspect ratios of the predicted and ground-truth boxes and thereby accelerates model convergence. Finally, the proposed algorithm is trained and tested alongside other target detection algorithms on the established insulator fault dataset.
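The EIoU loss mentioned above, as commonly defined in the literature, augments 1 − IoU with a center-distance penalty plus separate width and height penalties measured against the smallest enclosing box. A minimal NumPy-free sketch for boxes in (x1, y1, x2, y2) format follows; it illustrates the standard formulation, not this paper's exact implementation, and assumes non-degenerate boxes.

```python
def eiou_loss(pred, target):
    """EIoU loss: 1 - IoU + center-distance, width, and height penalties.
    pred, target: boxes as (x1, y1, x2, y2) with x2 > x1, y2 > y1."""
    # Intersection and union
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    iou = inter / (area_p + area_t - inter)
    # Smallest enclosing box
    cw = max(pred[2], target[2]) - min(pred[0], target[0])
    ch = max(pred[3], target[3]) - min(pred[1], target[1])
    # Normalized squared distance between box centers
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    dist = ((pcx - tcx) ** 2 + (pcy - tcy) ** 2) / (cw ** 2 + ch ** 2)
    # Separate width/height penalties: the EIoU addition over CIoU/DIoU
    w_pen = ((pred[2] - pred[0]) - (target[2] - target[0])) ** 2 / cw ** 2
    h_pen = ((pred[3] - pred[1]) - (target[3] - target[1])) ** 2 / ch ** 2
    return 1 - iou + dist + w_pen + h_pen
```

Penalizing width and height directly, rather than the combined aspect ratio, is what gives EIoU its faster convergence compared with CIoU.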
{"title":"Deep learning based insulator fault detection algorithm for power transmission lines","authors":"Han Wang, Qing Yang, Binlin Zhang, Dexin Gao","doi":"10.1007/s11554-024-01495-9","DOIUrl":"https://doi.org/10.1007/s11554-024-01495-9","url":null,"abstract":"<p>Aiming at the complex background of transmission lines at the present stage, which leads to the problem of low accuracy of insulator fault detection for small targets, a deep learning-based insulator fault detection algorithm for transmission lines is proposed. First, aerial images of insulators are collected using UAVs in different scenarios to establish insulator fault datasets. After that, in order to improve the detection efficiency of the target detection algorithm, certain improvements are made on the basis of the YOLOV9 algorithm. The improved algorithm enhances the feature extraction capability of the algorithm for insulator faults at a smaller computational cost by adding the GAM attention mechanism; at the same time, in order to realize the detection efficiency of small targets for insulator faults, the generalized efficient layer aggregation network (GELAN) module is improved and a new SC-GELAN module is proposed; the original loss function is replaced by the effective intersection-over-union (EIOU) loss function to minimize the difference between the aspect ratio of the predicted frame and the real frame, thereby accelerating the convergence speed of the model. Finally, the proposed algorithm is trained and tested with other target detection algorithms on the established insulator fault dataset. 
The experimental results and analysis show that the algorithm in this paper ensures a certain detection speed, while the algorithmic model has a higher detection accuracy, which is more suitable for UAV fault detection of insulators on transmission lines.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":null,"pages":null},"PeriodicalIF":3.0,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141529229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}