Real-time segmentation algorithm of unstructured road scenes based on improved BiSeNet
Pub Date: 2024-05-12 | DOI: 10.1007/s11554-024-01472-2
Chunhui Bai, Lilian Zhang, Lutao Gao, Lin Peng, Peishan Li, Linnan Yang
To address the fuzzy, complex boundaries of unstructured road scenes and the difficulty of segmenting them, this paper takes BiSeNet as the baseline model and proposes a real-time segmentation model based on partial convolution. FasterNet, built on partial convolution, is adopted as the backbone network and improved with operators that achieve higher floating-point operations per second, raising the model's inference speed. The model structure is optimized: the inefficient spatial path is removed and its role is taken over by the shallow features of the context path, reducing model complexity. A Residual Atrous Spatial Pyramid Pooling Module is proposed to replace the single context embedding module in the original model, extracting multi-scale context information more effectively and improving segmentation accuracy. The feature fusion module is also upgraded: the proposed Dual Attention Feature Fusion Module helps the model better understand image context through cross-level feature fusion. The resulting model reaches an inference speed of 78.81 f/s, meeting the real-time requirements of unstructured road scene segmentation. On accuracy metrics, it achieves a Mean Intersection over Union of 72.63% and a Macro F1 of 83.20%, a clear advantage over other state-of-the-art real-time segmentation models. The proposed partial-convolution-based model therefore meets both the accuracy and speed required for segmentation in complex, variable unstructured road scenes, and offers reference value for the development of autonomous driving in such scenes. Code is available at https://github.com/BaiChunhui2001/Real-time-segmentation.
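Partial convolution is the operator that FasterNet, and hence this model, builds on: only a fraction of the input channels are actually convolved, while the rest pass through untouched. A minimal PyTorch sketch of the operator (our illustration, not the authors' released code, which is linked above):

```python
import torch
import torch.nn as nn

class PartialConv(nn.Module):
    """Partial convolution as used in FasterNet: a regular k x k conv is
    applied to only the first 1/n_div of the channels; the remaining
    channels pass through untouched, cutting FLOPs and memory access."""

    def __init__(self, channels: int, n_div: int = 4, kernel_size: int = 3):
        super().__init__()
        self.conv_ch = channels // n_div  # channels that get convolved
        self.conv = nn.Conv2d(self.conv_ch, self.conv_ch, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        lead, rest = torch.split(
            x, [self.conv_ch, x.size(1) - self.conv_ch], dim=1)
        return torch.cat((self.conv(lead), rest), dim=1)

# toy check: shape is preserved while only 1/4 of the channels are convolved
feat = torch.randn(1, 64, 56, 56)
print(PartialConv(64)(feat).shape)  # torch.Size([1, 64, 56, 56])
```

Because memory access, not just raw FLOPs, dominates on many devices, touching only `channels // n_div` channels raises effective throughput, which is the speed argument the abstract makes.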
{"title":"Real-time segmentation algorithm of unstructured road scenes based on improved BiSeNet","authors":"Chunhui Bai, Lilian Zhang, Lutao Gao, Lin Peng, Peishan Li, Linnan Yang","doi":"10.1007/s11554-024-01472-2","DOIUrl":"https://doi.org/10.1007/s11554-024-01472-2","url":null,"abstract":"<p>In response to the fuzzy and complex boundaries of unstructured road scenes, as well as the high difficulty of segmentation, this paper uses BiSeNet as the benchmark model to improve the above situation and proposes a real-time segmentation model based on partial convolution. Using FasterNet based on partial convolution as the backbone network and improving it, adopting higher floating-point operations per second operators to improve the inference speed of the model; optimizing the model structure, removing inefficient spatial paths, and using shallow features of context paths to replace their roles, reducing model complexity; the Residual Atrous Spatial Pyramid Pooling Module is proposed to replace a single context embedding module in the original model, allowing better extraction of multi-scale context information and improving the accuracy of model segmentation; the feature fusion module is upgraded, the proposed Dual Attention Features Fusion Module is more helpful for the model to better understand image context through cross-level feature fusion. This paper proposes a model with a inference speed of 78.81 f/s, which meets the real-time requirements of unstructured road scene segmentation. Regarding accuracy metrics, the model in this paper excels with Mean Intersection over Union and Macro F1 at 72.63% and 83.20%, respectively, showing significant advantages over other advanced real-time segmentation models. Therefore, the real-time segmentation model based on partial convolution in this paper well meets the accuracy and speed required for segmentation tasks in complex and variable unstructured road scenes, and has reference value for the development of autonomous driving technology in unstructured road scenes. Code is available at https://github.com/BaiChunhui2001/Real-time-segmentation.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"30 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140941977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hardware architecture optimization for high-frequency zeroing and LFNST in H.266/VVC based on FPGA
Pub Date: 2024-05-11 | DOI: 10.1007/s11554-024-01470-4
Junxiang Zhang, Qinghua Sheng, Rui Pan, Jiawei Wang, Kuan Qin, Xiaofang Huang, Xiaoyan Niu
To reduce the hardware resource consumption of the two-dimensional transform component in H.266/VVC, a unified hardware structure is proposed that supports full-size Discrete Cosine Transform (DCT), Discrete Sine Transform (DST), and full-size Low-Frequency Non-Separable Transform (LFNST). This paper presents an area-efficient hardware architecture for two-dimensional transforms based on a general Regular Multiplier (RM), together with a high-throughput hardware design for LFNST in H.266/VVC. The first approach exploits the high-frequency zeroing characteristics of VVC and the symmetry of the DCT-II matrix, allowing the RM-based architecture to use only 256 general multipliers in a fully pipelined structure with a parallelism of 16. The second approach optimizes the transpose operation of the LFNST input matrix in a parallelism-16 architecture, saving storage and logic resources.
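The two ideas behind the RM-based design can be illustrated numerically: VVC zeroes the high-frequency half of large transforms, so only the low-frequency coefficients need computing, and the even/odd (anti)symmetry of DCT-II rows lets each retained coefficient be computed from half as many inputs. A NumPy sketch of the arithmetic (illustrative only, not the hardware description):

```python
import numpy as np

def dct2_matrix(N):
    # orthonormal DCT-II basis; row k holds the frequency-k cosines
    n = np.arange(N)
    k = n.reshape(-1, 1)
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0] /= np.sqrt(2.0)
    return C

def dct2_low_half(x):
    """Compute only the first N/2 DCT-II coefficients (high-frequency
    zeroing). Row k satisfies C[k, N-1-n] = (-1)^k * C[k, n], so even rows
    only need x[n] + x[N-1-n] and odd rows x[n] - x[N-1-n], halving the
    multiplications, as in the butterfly a hardware datapath would use."""
    N = len(x)
    keep = N // 2
    C = dct2_matrix(N)[:keep, :N // 2]
    e = x[:N // 2] + x[N - 1:N // 2 - 1:-1]   # even-symmetric part
    o = x[:N // 2] - x[N - 1:N // 2 - 1:-1]   # odd-symmetric part
    y = np.empty(keep)
    y[0::2] = C[0::2] @ e
    y[1::2] = C[1::2] @ o
    return y

# sanity check against the full 64-point transform
x = np.random.randn(64)
assert np.allclose(dct2_low_half(x), (dct2_matrix(64) @ x)[:32])
```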
{"title":"Hardware architecture optimization for high-frequency zeroing and LFNST in H.266/VVC based on FPGA","authors":"Junxiang Zhang, Qinghua Sheng, Rui Pan, Jiawei Wang, Kuan Qin, Xiaofang Huang, Xiaoyan Niu","doi":"10.1007/s11554-024-01470-4","DOIUrl":"https://doi.org/10.1007/s11554-024-01470-4","url":null,"abstract":"<p>To reduce the hardware implementation resource consumption of the two-dimensional transform component in H.266 VVC, a unified hardware structure is proposed that supports full-size Discrete Cosine Transform (DCT), Discrete Sine Transform (DST), and full-size Low-Frequency Non-Separable Transform (LFNST). This paper presents an area-efficient hardware architecture for two-dimensional transforms based on a general Regular Multiplier (RM) and a high-throughput hardware design for LFNST in the context of H.266/VVC. The first approach utilizes the high-frequency zeroing characteristics of VVC and the symmetric properties of the DCT-II matrix, allowing the RM-based architecture to use only 256 general multipliers in a fully pipelined structure with a parallelism of 16. The second approach optimizes the transpose operation of the input matrix for LFNST in a parallelism of 16 architecture, aiming to save storage and logic resources.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"156 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140930174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RCSLFNet: a novel real-time pedestrian detection network based on re-parameterized convolution and channel-spatial location fusion attention for low-resolution infrared image
Pub Date: 2024-05-11 | DOI: 10.1007/s11554-024-01469-x
Shuai Hao, Zhengqi Liu, Xu Ma, Yingqi Wu, Tian He, Jiahao Li
A novel real-time infrared pedestrian detection algorithm is introduced in this study. The proposed approach leverages re-parameterized convolution and channel-spatial location fusion attention to tackle the difficulties posed by low resolution, partial occlusion, and environmental interference in infrared pedestrian images, factors that have historically hindered accurate pedestrian detection with traditional algorithms. First, to address the weak feature representation of infrared pedestrian targets caused by low resolution and partial occlusion, a new attention module integrating channel and spatial information is devised and introduced into CSPDarkNet53 to build a new backbone, CSLF-DarkNet53. The designed attention module strengthens the feature expression of pedestrian targets and makes them more prominent against complex backgrounds. Second, to improve detection efficiency and accelerate convergence, a multi-branch decoupled detector head is designed to handle the classification and localization of infrared pedestrians separately. Finally, to improve real-time performance without losing precision, we introduce re-parameterized convolution (RepConv), which uses a parameter identity transformation to decouple the training and inference structures; during training, a multi-branch structure with convolution kernels of different scales strengthens the fitting ability of small kernels. Compared with nine classical detection algorithms, the experimental results show that the proposed RCSLFNet not only accurately detects partially occluded infrared pedestrians in complex environments but also achieves better real-time performance on the KAIST dataset: mAP@0.5 reaches 86%, 2.9% higher than the baseline, with a detection time of 0.0081 s.
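Structural re-parameterization, which the abstract uses to decouple training from inference, merges a multi-branch training-time block into a single convolution for deployment. The paper's RepConv uses branches with kernels of several scales; the sketch below shows the canonical 3x3 + 1x1 + identity merge as an illustration of the principle, not the exact RCSLFNet block:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def merge_rep_branches(conv3x3: nn.Conv2d, conv1x1: nn.Conv2d) -> nn.Conv2d:
    # Fold parallel 3x3 + 1x1 + identity branches into one 3x3 conv.
    # Assumes equal in/out channels, stride 1, bias-free branches.
    out_c, in_c = conv3x3.weight.shape[:2]
    w = conv3x3.weight.clone()
    w += F.pad(conv1x1.weight, [1, 1, 1, 1])   # 1x1 -> centre tap of a 3x3
    idx = torch.arange(out_c)
    w[idx, idx, 1, 1] += 1.0                   # identity branch as a delta kernel
    merged = nn.Conv2d(in_c, out_c, 3, padding=1, bias=False)
    merged.weight.copy_(w)
    return merged

# the merged conv reproduces the training-time multi-branch output
x = torch.randn(1, 32, 40, 40)
c3 = nn.Conv2d(32, 32, 3, padding=1, bias=False)
c1 = nn.Conv2d(32, 32, 1, bias=False)
train_out = c3(x) + c1(x) + x
fused_out = merge_rep_branches(c3, c1)(x)
print(torch.allclose(train_out, fused_out, atol=1e-5))  # True
```

At inference the network pays for a single convolution per block, which is where the real-time gain claimed in the abstract comes from.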
{"title":"RCSLFNet: a novel real-time pedestrian detection network based on re-parameterized convolution and channel-spatial location fusion attention for low-resolution infrared image","authors":"Shuai Hao, Zhengqi Liu, Xu Ma, Yingqi Wu, Tian He, Jiahao Li","doi":"10.1007/s11554-024-01469-x","DOIUrl":"https://doi.org/10.1007/s11554-024-01469-x","url":null,"abstract":"<p>A novel real-time infrared pedestrian detection algorithm is introduced in this study. The proposed approach leverages re-parameterized convolution and channel-spatial location fusion attention to tackle the difficulties presented by low-resolution, partial occlusion, and environmental interference in infrared pedestrian images. These factors have historically hindered the accurate detection of pedestrians using traditional algorithms. First, to tackle the problem of weak feature representation of infrared pedestrian targets caused by low resolution and partial occlusion, a new attention module that integrates channel and spatial is devised and introduced to CSPDarkNet53 to design a new backbone CSLF-DarkNet53. The designed attention model can enhance the feature expression ability of pedestrian targets and make pedestrian targets more prominent in complex backgrounds. Second, to enhance the efficiency of detection and accelerate convergence, a multi-branch decoupled detector head is designed to operate the classification and location of infrared pedestrians separately. Finally, to improve poor real-time without losing precision, we introduce the re-parameterized convolution (Repconv) using parameter identity transformation to decouple the training process and detection process. During the training procedure, to enhance the fitting ability of small convolution kernels, a multi-branch structure with convolution kernels of different scales is designed. Compared with the nice classical detection algorithms, the results of the experiment show that the proposed RCSLFNet not only detects partial occlusion infrared pedestrians in complex environments accurately but also has better real-time performance on the KAIST dataset. The mAP@0.5 reaches 86% and the detection time is 0.0081 s, 2.9% higher than the baseline.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"43 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140930056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
YOLOv5s-BC: an improved YOLOv5s-based method for real-time apple detection
Pub Date: 2024-05-10 | DOI: 10.1007/s11554-024-01473-1
Jingfan Liu, Zhaobing Liu
Current apple detection algorithms fail to accurately differentiate obscured apples from pickable ones, leading to low accuracy in apple harvesting and a high rate of mispicked or missed apples. To address these issues, this study proposes an improved YOLOv5s-based method, named YOLOv5s-BC, for real-time apple detection, introducing a series of modifications. First, a coordinate attention block is incorporated into the backbone module to construct a new backbone network. Second, the original concatenation operation is replaced with a bi-directional feature pyramid network in the neck. Finally, a new detection head is added to the head module, enabling detection of smaller and more distant targets within the robot's field of view. The proposed YOLOv5s-BC model was compared with several target detection algorithms, including YOLOv5s, YOLOv4, YOLOv3, SSD, Faster R-CNN (ResNet50), and Faster R-CNN (VGG), with significant mAP improvements of 4.6%, 3.6%, 20.48%, 23.22%, 15.27%, and 15.59%, respectively; detection accuracy is thus greatly enhanced over the original YOLOv5s model. The model achieves an average detection speed of 0.018 s per image, and its weights occupy only 16.7 MB, 4.7 MB smaller than YOLOv8s, meeting the real-time requirements of the picking robot. Furthermore, heat maps show that the proposed model focuses more on the high-level features of target apples and recognizes smaller target apples better than the original YOLOv5s. In tests in other apple orchards, the model detected pickable apples correctly and in real time, demonstrating decent generalization ability. The model can thus provide technical support for apple harvesting robots in real-time target detection and harvesting sequence planning.
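The coordinate attention block named in the abstract factors global pooling into two directional poolings, so the attention weights retain position along one axis while attending along the other. A compact PyTorch sketch of the standard block (Hou et al., 2021), shown for orientation rather than as the paper's exact implementation:

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate attention: pool along width and along height separately,
    share a 1x1 transform, then emit per-row and per-column gates."""

    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        xh = x.mean(dim=3, keepdim=True)                      # (n, c, h, 1)
        xw = x.mean(dim=2, keepdim=True).transpose(2, 3)      # (n, c, w, 1)
        y = self.act(self.conv1(torch.cat([xh, xw], dim=2)))  # shared transform
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))                   # row gate (n, c, h, 1)
        aw = torch.sigmoid(self.conv_w(yw.transpose(2, 3)))   # column gate (n, c, 1, w)
        return x * ah * aw

feat = torch.randn(1, 64, 40, 40)
print(CoordinateAttention(64)(feat).shape)  # torch.Size([1, 64, 40, 40])
```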
{"title":"YOLOv5s-BC: an improved YOLOv5s-based method for real-time apple detection","authors":"Jingfan Liu, Zhaobing Liu","doi":"10.1007/s11554-024-01473-1","DOIUrl":"https://doi.org/10.1007/s11554-024-01473-1","url":null,"abstract":"<p>The current apple detection algorithms fail to accurately differentiate obscured apples from pickable ones, thus leading to low accuracy in apple harvesting and a high rate of instances where apples are either mispicked or missed altogether. To address the issues associated with the existing algorithms, this study proposes an improved YOLOv5s-based method, named YOLOv5s-BC, for real-time apple detection, in which a series of modifications have been introduced. First, a coordinate attention block has been incorporated into the backbone module to construct a new backbone network. Second, the original concatenation operation has been replaced with a bi-directional feature pyramid network in the neck network. Finally, a new detection head has been added to the head module, enabling the detection of smaller and more distant targets within the field of view of the robot. The proposed YOLOv5s-BC model was compared to several target detection algorithms, including YOLOv5s, YOLOv4, YOLOv3, SSD, Faster R-CNN (ResNet50), and Faster R-CNN (VGG), with significant improvements of 4.6%, 3.6%, 20.48%, 23.22%, 15.27%, and 15.59% in mAP, respectively. The detection accuracy of the proposed model is also greatly enhanced over the original YOLOv5s model. The model boasts an average detection speed of 0.018 s per image, and the weight size is only 16.7 Mb with 4.7 Mb smaller than that of YOLOv8s, meeting the real-time requirements for the picking robot. Furthermore, according to the heat map, our proposed model can focus more on and learn the high-level features of the target apples, and recognize the smaller target apples better than the original YOLOv5s model. Then, in other apple orchard tests, the model can detect the pickable apples in real time and correctly, illustrating a decent generalization ability. It is noted that our model can provide technical support for the apple harvesting robot in terms of real-time target detection and harvesting sequence planning.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"128 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140929976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
High-speed hardware accelerator based on brightness improved by Light-DehazeNet
Pub Date: 2024-05-09 | DOI: 10.1007/s11554-024-01464-2
Peiyi Teng, Gaoming Du, Zhenmin Li, Xiaolei Wang, Yongsheng Yin
Due to the increasing demand for artificial intelligence technology in today's society, the entire industrial production system is undergoing a transformation toward automation, reliability, and robustness in pursuit of higher productivity and product competitiveness. At the same time, many hardware platforms cannot deploy complex algorithms because of limited resources. To address these challenges, this paper proposes a computationally efficient lightweight convolutional neural network, called Brightness Improved by Light-DehazeNet, which removes the effects of fog and haze to reconstruct clear images, together with an efficient hardware accelerator architecture for deploying the network on low-resource platforms. A brightness visibility restoration method is also presented to prevent brightness loss in dehazed images. To evaluate performance, extensive experiments compare the method with a range of traditional and deep learning-based methods on both synthetically hazed and naturally hazy images. The results show that the proposed method excels in dehazing ability, outperforming the other methods in comprehensive comparisons, while processing at up to 105 frames per second, meeting real-time requirements.
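Dehazing networks of this family are grounded in the atmospheric scattering model I(x) = J(x)t(x) + A(1 - t(x)), where t is the transmission map and A the airlight. The sketch below inverts that model and adds a simple global gain as a stand-in for the brightness visibility restoration step; both functions are illustrative assumptions, since the paper's exact restoration formula is not reproduced here:

```python
import numpy as np

def recover_radiance(hazy, transmission, airlight, t_min=0.1):
    """Invert I = J*t + A*(1 - t): J = (I - A) / t + A.
    `hazy` is HxWx3 in [0, 1], `transmission` HxW, `airlight` a 3-vector."""
    t = np.clip(transmission, t_min, 1.0)[..., None]  # avoid division blow-up
    J = (hazy - airlight) / t + airlight
    return np.clip(J, 0.0, 1.0)

def restore_brightness(dehazed, hazy, strength=1.0):
    # simple global gain matching mean luminance to the input's; a hedged
    # stand-in for the paper's brightness visibility restoration method
    gain = 1.0 + strength * (hazy.mean() - dehazed.mean())
    return np.clip(dehazed * max(gain, 1.0), 0.0, 1.0)
```

The brightness step exists because inverting the scattering model tends to darken the scene: transmission is underestimated in bright regions, so dehazed outputs need their luminance lifted back toward the original.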
{"title":"High-speed hardware accelerator based on brightness improved by Light-DehazeNet","authors":"Peiyi Teng, Gaoming Du, Zhenmin Li, Xiaolei Wang, Yongsheng Yin","doi":"10.1007/s11554-024-01464-2","DOIUrl":"https://doi.org/10.1007/s11554-024-01464-2","url":null,"abstract":"<p>Due to the increasing demand for artificial intelligence technology in today’s society, the entire industrial production system is undergoing a transformative process related to automation, reliability, and robustness, seeking higher productivity and product competitiveness. Additionally, many hardware platforms are unable to deploy complex algorithms due to limited resources. To address these challenges, this paper proposes a computationally efficient lightweight convolutional neural network called Brightness Improved by Light-DehazeNet, which removes the impact of fog and haze to reconstruct clear images. Additionally, we introduce an efficient hardware accelerator architecture based on this network for deployment on low-resource platforms. Furthermore, we present a brightness visibility restoration method to prevent brightness loss in dehazed images. To evaluate the performance of our method, extensive experiments were conducted, comparing it with various traditional and deep learning-based methods, including images with artificial synthesis and natural blur. The experimental results demonstrate that our proposed method excels in dehazing ability, outperforming other methods in comprehensive comparisons. Moreover, it achieves rapid processing speeds, with a maximum frame rate of 105 frames per second, meeting the requirements of real-time processing.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"43 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140930072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DRI-Net: a model for insulator defect detection on transmission lines in rainy backgrounds
Pub Date: 2024-05-09 | DOI: 10.1007/s11554-024-01461-5
Chao Ji, Mingjiang Gao, Siyuan Zhou, Junpeng Liu, Yongcan Zhu, Xinbo Huang
Transmission line insulators often operate in challenging weather, particularly on rainy days. Continuous exposure to humidity and rain accelerates insulator aging, degrading the insulating material and causing cracks and deformation, which poses a significant risk to power system operation. Scene images collected on rainy days are frequently obstructed by rain streaks, and the resulting blurred backgrounds significantly degrade detection models. To improve the accuracy of insulator defect detection in rainy environments, this paper proposes the DRI-Net (Derain-Insulator-Net) detection model. First, a dataset of insulator defects in rainy weather environments is constructed. Second, the de-raining model DRGAN is designed and integrated as an end-to-end de-raining layer at the input of DRI-Net, significantly enhancing the clarity and quality of rain-affected images and reducing the blurring and occlusion caused by rainwater. Finally, to keep the model lightweight, partial convolution (PConv) and the lightweight upsampling operator CARAFE are used in the detection network to reduce computational complexity, and the Wise-IoU bounding-box regression loss is applied for faster convergence and improved detector accuracy. Experimental results demonstrate the effectiveness of DRI-Net for rainy-day insulator defect detection, achieving a mean average precision (mAP) of 82.65% on the established dataset. An online detection system for rainy-day insulator defects, built around the detection model, further demonstrates practical engineering value.
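Wise-IoU, adopted here as the bounding-box regression loss, scales the plain IoU loss by a distance-based focusing factor computed on the smallest enclosing box and detached from the gradient graph. A PyTorch sketch of the v1 formulation (Tong et al., 2023); the abstract does not state which WIoU version is used, so treat this as representative:

```python
import torch

def wise_iou_v1(pred, target):
    """Wise-IoU v1 for boxes in (x1, y1, x2, y2) format, shape (N, 4):
    loss = exp(d^2 / (Wg^2 + Hg^2).detach()) * (1 - IoU), where d is the
    centre distance and Wg, Hg the smallest enclosing box's sides."""
    # intersection / union
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-7)
    # smallest enclosing box
    elt = torch.min(pred[:, :2], target[:, :2])
    erb = torch.max(pred[:, 2:], target[:, 2:])
    ew, eh = (erb - elt).unbind(dim=1)
    # squared centre distance, normalised by the enclosing box diagonal;
    # the denominator is detached so it focuses rather than back-propagates
    cp = (pred[:, :2] + pred[:, 2:]) / 2
    ct = (target[:, :2] + target[:, 2:]) / 2
    d2 = ((cp - ct) ** 2).sum(dim=1)
    r_wiou = torch.exp(d2 / (ew ** 2 + eh ** 2 + 1e-7).detach())
    return (r_wiou * (1 - iou)).mean()
```

The detached denominator is the key design choice: hard examples far from their target get a larger gradient without the normaliser itself fighting the update.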
{"title":"DRI-Net: a model for insulator defect detection on transmission lines in rainy backgrounds","authors":"Chao Ji, Mingjiang Gao, Siyuan Zhou, Junpeng Liu, Yongcan Zhu, Xinbo Huang","doi":"10.1007/s11554-024-01461-5","DOIUrl":"https://doi.org/10.1007/s11554-024-01461-5","url":null,"abstract":"<p>Transmission line insulators often operate in challenging weather conditions, particularly on rainy days. Continuous exposure to humidity and rain accelerates the aging process of insulators, leading to a decline in insulating material performance, the occurrence of cracks, and deformation. This situation poses a significant risk to the operation of the power system. Scene images collected on rainy days are frequently obstructed by rain lines, resulting in blurred backgrounds that significantly impact the performance of detection models. To improve the accuracy of insulator defect detection in rainy day environments, this paper proposes the DRI-Net (Derain-Insulator-net) detection model. Firstly, a dataset of insulator defects in rainy weather environments is constructed. Second, designing the de-raining model DRGAN and integrating it as an end-to-end DRGAN de-raining structural layer into the input end of the DRI-Net detection model, we significantly enhance the clarity and quality of images affected by rain, thereby reducing adverse effects such as image blurring and occlusion caused by rainwater. Finally, to enhance the lightweight performance of the model, partial convolution (PConv) and the lightweight upsampling operator CARAFE are utilized in the detection network to reduce the computational complexity of the model. The Wise-IoU bounding box regression loss function is applied to achieve faster convergence and improved detector accuracy. Experimental results demonstrate the effectiveness of the DRI-Net model in the task of rainy-day insulator defect detection, achieving an average precision MAP value of 82.65% in the established dataset. Additionally, an online detection system for rainy day insulator defects is designed in conjunction with the detection model, demonstrating practical engineering applications value.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"14 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140929979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
YOLO-Global: a real-time target detector for mineral particles
Pub Date: 2024-05-08 | DOI: 10.1007/s11554-024-01468-y
Zihao Wang, Dong Zhou, Chengjun Guo, Ruihao Zhou
Recently, deep learning methodologies have achieved significant advances in automatic mineral sorting and anomaly detection. However, minerals transported in the form of small particles offer limited features, which poses significant challenges to accurate detection. To address this, we propose an enhanced mineral particle detection algorithm based on the YOLOv8s model. First, a C2f-SRU block is introduced so that the feature extraction network processes spatially redundant information more effectively. Second, we design the GFF module to improve information propagation between non-adjacent scale features, enabling deep layers to make fuller use of the spatial positional information held in shallower ones. Finally, we adopt the Wise-IoU loss function to optimize detection performance and re-design the positions of the prediction heads to detect small-scale targets precisely. The experimental results substantiate the effectiveness of the algorithm: YOLO-Global achieves a mAP@0.5 of 95.8%, a 2.5% mAP improvement over the original YOLOv8s, at an inference speed of 81 fps, meeting the requirements for real-time, accurate processing.
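The GFF module is not specified in the abstract beyond its goal of propagating information between non-adjacent scales. As a loose illustration of that idea only (our assumption, not the paper's design), a shallow high-resolution map can be injected directly into a deeper level through a learned gate instead of hopping through intermediate pyramid levels:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossScaleFusion(nn.Module):
    """Illustrative non-adjacent cross-scale fusion: a shallow feature map
    is resized to a deep level's resolution, projected to its width, and
    added under a content-dependent gate. Not the published GFF block."""

    def __init__(self, shallow_ch: int, deep_ch: int):
        super().__init__()
        self.proj = nn.Conv2d(shallow_ch, deep_ch, 1)  # align channel widths
        self.gate = nn.Conv2d(deep_ch, 1, 1)           # how much shallow detail to admit

    def forward(self, shallow, deep):
        s = self.proj(F.adaptive_avg_pool2d(shallow, deep.shape[-2:]))
        g = torch.sigmoid(self.gate(deep))
        return deep + g * s  # deep features keep shallow positional cues

s = torch.randn(1, 64, 80, 80)   # shallow, high-resolution level
d = torch.randn(1, 256, 20, 20)  # deep, non-adjacent level
print(CrossScaleFusion(64, 256)(s, d).shape)  # torch.Size([1, 256, 20, 20])
```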
{"title":"Yolo-global: a real-time target detector for mineral particles","authors":"Zihao Wang, Dong Zhou, Chengjun Guo, Ruihao Zhou","doi":"10.1007/s11554-024-01468-y","DOIUrl":"https://doi.org/10.1007/s11554-024-01468-y","url":null,"abstract":"<p>Recently, deep learning methodologies have achieved significant advancements in mineral automatic sorting and anomaly detection. However, the limited features of minerals transported in the form of small particles pose significant challenges to accurate detection. To address this challenge, we propose a enhanced mineral particle detection algorithm based on the YOLOv8s model. Initially, a C2f-SRU block is introduced to enable the feature extraction network to more effectively process spatial redundant information. Additionally, we designed the GFF module with the aim of enhancing information propagation between non-adjacent scale features, thereby enabling deep networks to more fully leverage spatial positional information from shallower networks. Finally, we adopted the Wise-IoU loss function to optimize the detection performance of the model. We also re-designed the position of the prediction heads to achieve precise detection of small-scale targets. The experimental results substantiate the effectiveness of the algorithm, with YOLO-Global achieving a mAP@.5 of 95.8%. In comparison to the original YOLOv8s, the improved model exhibits a 2.5% increase in mAP, achieving a model inference speed of 81 fps, meeting the requirements for real-time processing and accuracy.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"59 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140930022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-time lossless image compression by dynamic Huffman coding hardware implementation
Pub Date: 2024-05-07 | DOI: 10.1007/s11554-024-01467-z
Duc Khai Lam
Over the decades, information technology (IT) has become increasingly pervasive, and the amount of data that must be stored has grown with it, creating a massive data storage challenge. Simply provisioning more storage can accommodate large files, but it is costly in both capacity and bandwidth. An alternative is data compression, which significantly reduces file size. With the development of IT and growing computing capacity, data compression is becoming ever more widespread in fields such as broadcast television, avionics, computer transmission, and medical imaging. In this work, we introduce an image compression algorithm based on Huffman coding and use linear techniques to increase compression efficiency. In addition, instead of compressing pixels as whole 8-bit symbols, each pixel is divided into two 4-bit halves, which saves hardware resources (each input is only 4 bits) and shortens run time (there are fewer distinct input symbols). The goals are to reduce the image's complexity, increase the repetition rate of the data, reduce compression time, and increase compression efficiency. A hardware accelerator is designed and implemented on the Virtex-7 VC707 FPGA to run in real time. The achieved average compression ratio is 3.467, and the hardware design reaches a maximum frequency of 125 MHz.
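The core of the scheme, splitting each 8-bit pixel into two 4-bit halves before Huffman coding, is easy to demonstrate. The sketch below builds a static Huffman table for brevity, whereas the paper's hardware uses dynamic (adaptive) Huffman coding; the point it illustrates is that nibble splitting shrinks the symbol alphabet to 16, which keeps the code table and the hardware small:

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Build a Huffman code table {symbol: bitstring} for `data`."""
    heap = [[freq, [sym, ""]] for sym, freq in Counter(data).items()]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate one-symbol input
        return {heap[0][1][0]: "0"}
    while len(heap) > 1:
        lo = heapq.heappop(heap)            # two least frequent subtrees
        hi = heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]         # extend codes down the 0-branch
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]         # extend codes down the 1-branch
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return dict(heap[0][1:])

def compress_nibbles(raw: bytes) -> str:
    # split each 8-bit pixel into its high and low 4-bit halves: only 16
    # distinct input symbols, so repetition rates rise and the table is tiny
    nibbles = [n for b in raw for n in (b >> 4, b & 0x0F)]
    table = huffman_code(nibbles)
    return "".join(table[n] for n in nibbles)

pixels = bytes([200, 200, 201, 17, 17, 17, 16, 200])
bits = compress_nibbles(pixels)
print(len(bits), "bits vs", 8 * len(pixels), "uncompressed")
```

A dynamic variant would update the tree as symbols arrive, so encoder and decoder stay synchronized without transmitting the table; the static version above ignores that overhead.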
{"title":"Real-time lossless image compression by dynamic Huffman coding hardware implementation","authors":"Duc Khai Lam","doi":"10.1007/s11554-024-01467-z","DOIUrl":"https://doi.org/10.1007/s11554-024-01467-z","url":null,"abstract":"<p>Over the decades, implementing information technology (IT) has become increasingly common, equating to an increasing amount of data that needs to be stored, creating a massive challenge in data storage. Using a large storage capacity can solve the problem of the file size. However, this method is costly in terms of both capacity and bandwidth. One possible method is data compression, which significantly reduces the file size. With the development of IT and increasing computing capacity, data compression is becoming more and more widespread in many fields, such as broadcast television, aircraft, computer transmission, and medical imaging. In this work, we introduce an image compression algorithm based on the Huffman coding algorithm and use linear techniques to increase image compression efficiency. Besides, we replace 8-bit pixel-by-pixel compression by dividing one pixel into two 4-bit halves to save hardware capacity (because only 4-bit for each input) and optimize run time (because the number of different inputs is less). The goal is to reduce the image’s complexity, increase the data’s repetition rate, reduce the compression time, and increase the image compression efficiency. A hardware accelerator is designed and implemented on the Virtex-7 VC707 FPGA to make it work in real-time. The achieved average compression ratio is 3,467. Hardware design achieves a maximum frequency of 125 MHz.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"20 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140882428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Software and hardware realizations for different designs of chaos-based secret image sharing systems
Pub Date: 2024-05-06 | DOI: 10.1007/s11554-024-01450-8
Bishoy K. Sharobim, Muhammad Hosam, Salwa K. Abd-El-Hafiz, Wafaa S. Sayed, Lobna A. Said, Ahmed G. Radwan
Secret image sharing (SIS) conveys a secret image to mutually suspicious receivers by distributing meaningless shares to the participants; all shares must be present to recover the secret. This paper proposes and compares three secret sharing systems. A visual cryptography system with a fast recovery scheme is designed as the backbone of all three. An SIS system is then introduced for sharing any type of image, improving security by using the Lorenz chaotic system as the source of randomness and the generalized Arnold transform as a permutation module. A second SIS system further enhances security and robustness by employing SHA-256 and the RSA cryptosystem. The presented architectures are implemented on a field-programmable gate array (FPGA) to enhance computational efficiency and facilitate real-time processing. Detailed experimental results and comparisons between the software and hardware realizations are presented. Security analysis and comparisons with related literature also show good results, including statistical tests, differential attack measures, robustness tests against noise and crop attacks, key sensitivity tests, and performance analysis.
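The two building blocks named in the abstract, a Lorenz-driven keystream and a generalized Arnold permutation, can be sketched in a few lines of NumPy. Parameter choices (integration step, quantization rule, map coefficients) are our assumptions for illustration; the actual share-generation arithmetic of the SIS systems is not reproduced:

```python
import numpy as np

def lorenz_keystream(n, x0=0.1, y0=0.2, z0=0.3, dt=0.01,
                     sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Byte keystream from the Lorenz system via simple Euler integration;
    the initial state and parameters act as part of the secret key."""
    x, y, z = x0, y0, z0
    out = np.empty(n, dtype=np.uint8)
    for i in range(n):
        x, y, z = (x + dt * sigma * (y - x),
                   y + dt * (x * (rho - z) - y),
                   z + dt * (x * y - beta * z))
        out[i] = int(abs(x) * 1e6) % 256  # quantise one state variable
    return out

def arnold_permute(img, a=1, b=1, iterations=1):
    """Generalised Arnold cat map on an N x N image:
    (x, y) -> ((x + a*y) mod N, (b*x + (a*b + 1)*y) mod N).
    The matrix has determinant 1, so the receiver can invert it exactly."""
    N = img.shape[0]
    out = img
    for _ in range(iterations):
        xs, ys = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
        nx = (xs + a * ys) % N
        ny = (b * xs + (a * b + 1) * ys) % N
        scrambled = np.empty_like(out)
        scrambled[nx, ny] = out[xs, ys]
        out = scrambled
    return out

# diffusion (XOR with the chaotic stream) followed by permutation; a real
# share generator would add the secret-sharing arithmetic on top of this
img = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
stream = lorenz_keystream(img.size).reshape(img.shape)
cipher = arnold_permute(img ^ stream, a=3, b=5, iterations=2)
```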
{"title":"Software and hardware realizations for different designs of chaos-based secret image sharing systems","authors":"Bishoy K. Sharobim, Muhammad Hosam, Salwa K. Abd-El-Hafiz, Wafaa S. Sayed, Lobna A. Said, Ahmed G. Radwan","doi":"10.1007/s11554-024-01450-8","DOIUrl":"https://doi.org/10.1007/s11554-024-01450-8","url":null,"abstract":"<p>Secret image sharing (SIS) conveys a secret image to mutually suspicious receivers by sending meaningless shares to the participants, and all shares must be present to recover the secret. This paper proposes and compares three systems for secret sharing, where a visual cryptography system is designed with a fast recovery scheme as the backbone for all systems. Then, an SIS system is introduced for sharing any type of image, where it improves security using the Lorenz chaotic system as the source of randomness and the generalized Arnold transform as a permutation module. The second SIS system further enhances security and robustness by utilizing SHA-256 and RSA cryptosystem. The presented architectures are implemented on a field programmable gate array (FPGA) to enhance computational efficiency and facilitate real-time processing. Detailed experimental results and comparisons between the software and hardware realizations are presented. Security analysis and comparisons with related literature are also introduced with good results, including statistical tests, differential attack measures, robustness tests against noise and crop attacks, key sensitivity tests, and performance analysis.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"2012 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140882445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T-PSD: T-shape parking slot detection with self-calibrated convolution network
Pub Date: 2024-05-04 | DOI: 10.1007/s11554-024-01460-6
Ruitao Zheng, Haifei Zhu, Xinghua Wu, Wei Meng
This paper deals with a challenging autonomous parking problem in which parking slots appear at a variety of angles. We transform parking slot detection into center keypoint detection, representing each parking slot as a T-shape to keep the formulation robust and simple. For diverse types of parking slots, we propose a T-shape parking slot detection method, called T-PSD, that extracts the T-shape center information with a self-calibrated convolution network (SCCN). The method concurrently obtains the entrance center confidence, the relative offsets of the paired junctions, the direction of the middle line, the occupancy, and the inferred slot type. Final detections are produced using Half-Heatmap, MultiBins, and Midline-Grid to extract the center keypoint, direction, and occupancy more accurately, respectively. To verify the performance of our method, we conduct experiments on the public PS2.0 dataset. The results show that our method outperforms state-of-the-art competitors with a recall of 99.86% and a precision of 99.82%. It runs at 65 frames per second (FPS), satisfying real-time detection requirements. In contrast to detecting global and local information simultaneously, our SCCN detector concentrates exclusively on the T-shape center information, achieving comparable performance while significantly accelerating inference by dispensing with non-maximum suppression (NMS).
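Decoding a center-keypoint head of this kind typically means picking local maxima of the entrance-center heatmap and reading the direction from a MultiBins-style bin-plus-residual head. The sketch below is our hedged reconstruction with assumed tensor shapes and bin count, not the released T-PSD code:

```python
import math
import torch
import torch.nn.functional as F

def decode_slots(center_heatmap, bin_logits, bin_residuals, k=10, n_bins=12):
    """Illustrative decoding for a center-keypoint parking-slot head.
    Assumed shapes: center_heatmap (1, 1, H, W), bin_logits and
    bin_residuals (1, n_bins, H, W). Returns the top-k candidate centres
    with scores and entrance directions."""
    heat = center_heatmap.sigmoid()
    # keep only 3x3 local maxima: cheap peak picking replaces box-level NMS
    peaks = heat * (F.max_pool2d(heat, 3, stride=1, padding=1) == heat)
    scores, idx = peaks.flatten().topk(k)
    H, W = heat.shape[-2:]
    ys, xs = idx // W, idx % W
    # direction: winning angle bin's centre plus its regressed residual
    bins = bin_logits[0, :, ys, xs].argmax(dim=0)        # (k,)
    resid = bin_residuals[0, bins, ys, xs]               # (k,)
    angle = bins.float() * (2 * math.pi / n_bins) + resid
    return xs, ys, scores, angle
```

Because each slot is a single center keypoint rather than an anchor box, overlapping-box suppression never arises, which is consistent with the abstract's claim of faster inference without NMS.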
{"title":"T-psd: T-shape parking slot detection with self-calibrated convolution network","authors":"Ruitao Zheng, Haifei Zhu, Xinghua Wu, Wei Meng","doi":"10.1007/s11554-024-01460-6","DOIUrl":"https://doi.org/10.1007/s11554-024-01460-6","url":null,"abstract":"<p>This paper deals with a challenging autonomous parking problem in which the parking slots are with various different angles. We transform the problem of parking slot detection into center keypoint detection, representing the parking slot as a T-shape to make it robust and simple. For diverse types of parking slots, we propose a T-shape parking slot detection method, called T-PSD, to extract the T-shape center information based on a self-calibrated convolution network (SCCN). This method can concurrently obtain the entrance center confidence, the relative offsets of the paired junctions, the direction of the middle line, the occupancy and the inferred type in the parking slots. Final detection results are produced by utilizing Half-Heatmap, MultiBins and Midline-Grid to more accurately extract the center keypoint, direction and occupancy, respectively. To verify the performance of our method, we conduct experiments on the public PS2.0 dataset. The results have shown that our method outperforms state-of-the-art competitors by showing recall rate of 99.86% and precision rate of 99.82%. It is capable of achieving 65 frames per second (FPS) and satisfying a real-time detection performance. In contrast to the simultaneous detection of global and local information, our SCCN detector exclusively concentrates on the T-shape center information, which achieves comparable performance and significantly accelerates the inference time without non-maximum suppression (NMS).</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"12 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140882380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}