Real-time medical lesion screening: accurate and rapid detectors
Pub Date: 2024-07-17 | DOI: 10.1007/s11554-024-01512-x
Dangguo Shao, Jie Jiang, Lei Ma, Hua Lai, Sanli Yi
Brain tumors are highly lethal, representing 85–90% of all primary central nervous system (CNS) tumors. Magnetic resonance imaging (MRI) is employed to identify and assess brain tumors. However, this process has historically relied heavily on the expertise of medical professionals and necessitated the involvement of a substantial number of personnel. To optimize the allocation of medical resources and improve diagnostic efficiency, this work proposes RPC–DETR, a Transformer-based detector built on DETR. We conducted comparative experiments between RPC–DETR and other traditional detectors on the same equipment to test the model's performance under equal computational resources. Tested on the Br35H brain tumor dataset of 701 MRI images (500 training and 201 test images), RPC–DETR surpasses the YOLO models in accuracy while using fewer parameters, supporting more reliable and faster diagnosis. RPC–DETR achieves 96% mAP with just 14M parameters, offering high accuracy in brain tumor detection with a lighter model that is easier to deploy in various medical settings.
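For reference on how a figure like the reported 96% mAP is typically computed, the sketch below shows the two core ingredients, IoU matching and per-class average precision. The function names and the step-wise integration are simplifications for illustration, not RPC–DETR's actual evaluation code.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def average_precision(scores, is_true_positive, n_ground_truth):
    """AP for one class: sort detections by score, accumulate precision and
    recall, then integrate the PR curve (simplified, not the COCO 101-point rule)."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(1.0 - tp)
    recall = tp_cum / max(n_ground_truth, 1)
    precision = tp_cum / (tp_cum + fp_cum)
    recall = np.concatenate([[0.0], recall])
    return float(np.sum((recall[1:] - recall[:-1]) * precision))
```

mAP is then the mean of these per-class AP values, often averaged over several IoU thresholds.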
{"title":"Real-time medical lesion screening: accurate and rapid detectors","authors":"Dangguo Shao, Jie Jiang, Lei Ma, Hua Lai, Sanli Yi","doi":"10.1007/s11554-024-01512-x","DOIUrl":"https://doi.org/10.1007/s11554-024-01512-x","url":null,"abstract":"<p>Brain tumors are highly lethal, representing 85–90% of all primary central nervous system (CNS) tumors. Magnetic resonance imaging (MRI) images are employed to identify and assess brain tumors. However, this process has historically relied heavily on the expertise of medical professionals and necessitated the involvement of a substantial number of personnel. To optimize the allocation of medical resources and improve diagnostic efficiency, this work proposes a DETR-based RPC–DETR model that utilizes the Transformer. We conducted comparative experiments using RPC–DETR with other traditional detectors on the same equipment to test the performance exhibited by the model with equal computational resources. Tested on the Br35H brain tumor dataset with 701 MRI images (500 training sets and 201 test sets), RPC–DETR surpasses the YOLO models in accuracy while utilizing fewer parameters. This advancement ensures more reliable and faster diagnosis. RPC–DETR achieves a 96% mAP with just 14M parameters, offering high accuracy in brain tumor detection with a lighter model, making it easier to implement in various medical settings.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"30 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141722197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A full-detection association tracker with confidence optimization for real-time multi-object tracking
Pub Date: 2024-07-11 | DOI: 10.1007/s11554-024-01513-w
Youyu Liu, Xiangxiang Zhou, Zhendong Zhang, Yi Li, Wanbao Tao
Multi-object tracking (MOT) aims to obtain trajectories with unique identifiers for multiple objects in a video stream. Current approaches frequently use confidence thresholds to perform multi-stage data association, but these thresholds can introduce instability when the algorithm confronts diverse scenarios. This article proposes the confidence-optimization tracker (COTracker), a full-detection association tracker based on confidence optimization. COTracker incorporates detection confidence and matching cost as covariates and models tracklet confidence using an exponential moving average (EMA). It introduces confidence cues into data association by generating a weighting matrix containing detection and tracklet confidence. Experimental results show that COTracker achieves 63.0 HOTA and 77.1 IDF1 on the MOT17 test set; on the more crowded MOT20, it achieves 62.4 HOTA and 76.1 IDF1. Compared with threshold-based methods, COTracker handles a variety of complex scenarios without adjusting the confidence threshold. Furthermore, its tracking speed meets the requirements of real-time tracking, giving it potential value in applications such as autonomous driving and drone tracking. The source code is available at https://github.com/LiYi199983/CWTracker.
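The abstract names two concrete mechanisms: EMA-smoothed tracklet confidence and a confidence weighting matrix over the association cost. A minimal sketch of both follows; the smoothing factor and the specific weighting rule are assumptions, since the exact formulas are not given here.

```python
import numpy as np

def ema_confidence(prev_conf: float, det_conf: float, alpha: float = 0.9) -> float:
    """EMA update of a tracklet's confidence from its latest matched detection
    (alpha is an assumed smoothing factor)."""
    return alpha * prev_conf + (1.0 - alpha) * det_conf

def weighted_cost_matrix(cost: np.ndarray,
                         trk_conf: np.ndarray,
                         det_conf: np.ndarray) -> np.ndarray:
    """Illustrative confidence weighting: instead of dropping low-confidence
    detections at a hard threshold, scale each (tracklet, detection) cost so
    confident pairs are preferred by the assignment step."""
    w = np.outer(trk_conf, det_conf)   # pairwise confidence in [0, 1]
    return cost * (2.0 - w)            # confident pairs keep a lower cost
```

The weighted matrix would then feed a Hungarian-style assignment, e.g. scipy.optimize.linear_sum_assignment, in place of the usual two-stage thresholded matching.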
{"title":"A full-detection association tracker with confidence optimization for real-time multi-object tracking","authors":"Youyu Liu, Xiangxiang Zhou, Zhendong Zhang, Yi Li, Wanbao Tao","doi":"10.1007/s11554-024-01513-w","DOIUrl":"https://doi.org/10.1007/s11554-024-01513-w","url":null,"abstract":"<p>Multi-object tracking (MOT) aims to obtain trajectories with unique identifiers for multiple objects in a video stream. In current approaches, confidence thresholds were frequently used to perform multi-stage data association. However, these thresholds could introduce instability into the algorithm when confronted with diverse scenarios. This article proposed confidence-optimization tracker (COTracker), a full-detection association tracker based on confidence optimization. COTracker incorporated detection confidence and matching cost as covariates and modeled tracklet confidence using exponential moving average (EMA). It introduced confidence cues in data association by generating a weighting matrix containing detection and tracklet confidence. Experimental results showed that COTracker achieved 63.0 HOTA and 77.1 IDF1 on MOT17 test set. On the more crowded MOT20, it achieves 62.4 HOTA and 76.1 IDF1. Compared with threshold-based methods, COTracker showcased the ability to handle various complex scenarios without adjusting the confidence threshold. Furthermore, its outstanding tracking speed, meeting the requirements of real-time tracking, positions it with potential value in applications such as unmanned driving and drone tracking. The source codes are available at https://github.com/LiYi199983/CWTracker.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"32 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141610334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-time and secure identity authentication transmission mechanism for artificial intelligence generated image content
Pub Date: 2024-07-10 | DOI: 10.1007/s11554-024-01508-7
Xiao Feng, Zheng Yuan
The rapid development of generative artificial intelligence technology and large-scale pre-training models has led to the emergence of artificial intelligence generated image content (AIGIC) as an important application of natural language processing models, resulting in a significant shift and advancement in the way image content is created. Because AIGIC requires substantial image datasets acquired from user devices for training, the data transmission link is highly complex, and the datasets are susceptible to illegal attacks from multiple parties during transmission. Such attacks compromise the integrity and real-time nature of the training data and degrade the accuracy of the AIGIC model's training results. Consequently, this paper proposes a real-time authentication mechanism to guarantee the secure transmission of AIGIC image datasets. The mechanism achieves anonymous identity protection for the user device providing the image dataset by introducing a certificateless encryption system. An aggregate signature scheme with a key negotiation algorithm is then introduced to authenticate the user devices supplying legitimate image datasets. A performance analysis indicates that the proposed mechanism outperforms other related methods in the security and accuracy of AIGIC model training results while guaranteeing real-time transmission of AIGIC image datasets; its time complexity is also lower, effectively ensuring the timeliness of the algorithm.
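The key negotiation step mentioned above is a standard building block. As a rough illustration of two parties deriving a shared session secret, here is a toy Diffie–Hellman exchange; the parameters are deliberately small and insecure, and the paper's certificateless aggregate-signature construction is considerably more involved.

```python
import secrets

# Toy parameters (a Mersenne prime); real deployments use vetted groups.
P = 2**127 - 1
G = 3

def keypair():
    priv = secrets.randbelow(P - 2) + 1   # private exponent
    return priv, pow(G, priv, P)          # (private, public) pair

a_priv, a_pub = keypair()  # user device supplying the image dataset
b_priv, b_pub = keypair()  # authenticating server
shared_a = pow(b_pub, a_priv, P)
shared_b = pow(a_pub, b_priv, P)
assert shared_a == shared_b  # both sides hold the same session secret
```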
{"title":"Real-time and secure identity authentication transmission mechanism for artificial intelligence generated image content","authors":"Xiao Feng, Zheng Yuan","doi":"10.1007/s11554-024-01508-7","DOIUrl":"https://doi.org/10.1007/s11554-024-01508-7","url":null,"abstract":"<p>The rapid development of generative artificial intelligence technology and large-scale pre-training models has led to the emergence of artificial intelligence generated image content (AIGIC) as an important application of natural language processing models. This has resulted in a significant shift and advancement in the way image content is created. As AIGIC requires the acquisition of substantial image datasets from user devices for training purposes, the data transmission link is highly complex, and the datasets are susceptible to illegal attacks from multiple parties during transmission, which has a detrimental impact on the integrity and real-time nature of the training data and affects the accuracy of the training results of the AIGIC model. Consequently, this paper proposed a real-time authentication mechanism to guarantee the secure transmission of AIGIC image datasets. The mechanism achieves anonymous identity protection for the user device providing the image dataset by introducing a certificate-less encryption system. In turn, an aggregated signature scheme with key negotiation algorithm is introduced to authenticate the user devices of legitimate image datasets. A performance analysis indicates that the mechanism proposed in this paper outperforms other related methods in terms of security and accuracy of AIGIC image model training results, while guaranteeing real-time transmission of AIGIC image datasets, at the same time, the time complexity is also lower, which can effectively ensure the timeliness of the algorithm.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"181 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141567461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel single kernel parallel image encryption scheme based on a chaotic map
Pub Date: 2024-07-09 | DOI: 10.1007/s11554-024-01506-9
Joao Inacio Moreira Bezerra, Alexandre Molter, Gustavo Machado, Rafael Iankowski Soares, Vinícius Valduga de Almeida Camargo
The development of communication technologies has increased concerns about data security, raising the prominence of cryptography. Images are among the most widely shared data, and chaotic ciphers attract significant interest from researchers because traditional ciphers are not optimized for image encryption. Chaotic encryption schemes perform well for low-quality images but need to be faster for real-time encryption of Full Ultra HD images. In this context, a novel parallel image cipher scheme is proposed for GPU architectures, where the encryption procedure consists of a single kernel, distinguishing it from previous chaotic ciphers and the AES. This contribution enables our work to achieve a throughput of 130.8 GB/s on a GeForce RTX3070 and 251.6 GB/s on a Tesla V100 GPU, a 37% increase over the AES and 43 times the previous best for chaotic ciphers. The cipher's security is also verified against multiple forms of attack.
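In its simplest serial form, a chaotic stream cipher iterates a chaotic map to generate a keystream and XORs it with the image bytes, as sketched below with the logistic map (the map choice and parameters are assumptions). The paper's contribution, fusing the whole procedure into one GPU kernel, is not reproduced here.

```python
import numpy as np

def logistic_keystream(n: int, x0: float = 0.712, r: float = 3.99) -> np.ndarray:
    """n keystream bytes from the logistic map x -> r*x*(1-x); (x0, r) act as the key."""
    x = x0
    out = np.empty(n, dtype=np.uint8)
    for i in range(n):
        x = r * x * (1.0 - x)
        out[i] = int(x * 256) & 0xFF
    return out

def xor_cipher(data: np.ndarray) -> np.ndarray:
    """Encrypt or decrypt (XOR is involutive) a flat uint8 array."""
    return data ^ logistic_keystream(data.size)

img = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)
enc = xor_cipher(img.ravel()).reshape(img.shape)
assert np.array_equal(xor_cipher(enc.ravel()).reshape(img.shape), img)
```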
{"title":"A novel single kernel parallel image encryption scheme based on a chaotic map","authors":"Joao Inacio Moreira Bezerra, Alexandre Molter, Gustavo Machado, Rafael Iankowski Soares, Vinícius Valduga de Almeida Camargo","doi":"10.1007/s11554-024-01506-9","DOIUrl":"https://doi.org/10.1007/s11554-024-01506-9","url":null,"abstract":"<p>The development of communication technologies has increased concerns about data security, increasing the prominence of cryptography. Images are one of the most widely shared data, and chaotic ciphers arouse significant interest from researchers, as traditional ciphers are not optimized for image encryption. Chaotic encryption schemes perform well for low-quality images but should be faster for real-time encryption of Full Ultra HD images. In this context, a novel parallel image cipher scheme is proposed to execute in GPU architectures, where the encryption procedure consists of a single kernel, making it different from previous chaotic ciphers and the AES. This new contribution enables our work to achieve a throughput of 130.8 GB/s on a GeForce RTX3070 and 251.6 GB/s on a Tesla V100 GPU, an increase of 37% compared to the AES and 43 times higher than the previously high for chaotic ciphers. The cipher’s security is also verified regarding multiple forms of attacks.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"86 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141567462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DTS: dynamic training slimming with feature sparsity for efficient convolutional neural network
Pub Date: 2024-07-08 | DOI: 10.1007/s11554-024-01511-y
Jia Yin, Wei Wang, Zhonghua Guo, Yangchun Ji
Deep convolutional neural networks have achieved remarkable progress on computer vision tasks in recent years. In this paper, we propose DTS, a dynamic training slimming method with feature sparsity based on structured pruning, for efficient and automatic channel pruning. Unlike other existing pruning methods, which require manual intervention in the pruning settings for each layer, DTS can determine a suitable architecture width for the target dataset and deployment resources through automated pruning. The proposed method can be applied to modern CNNs, and experimental results on the CIFAR, ImageNet, and PASCAL VOC benchmark datasets demonstrate its effectiveness, significantly exceeding other schemes.
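DTS's exact sparsity criterion is not spelled out in the abstract, but slimming-style channel pruning commonly scores channels by the magnitude of their BatchNorm scale factors and applies one global threshold, which removes the per-layer manual settings the authors criticize. A PyTorch sketch of that generic recipe:

```python
import torch
import torch.nn as nn

def bn_channel_scores(model: nn.Module) -> torch.Tensor:
    """|gamma| of every BatchNorm channel, a common feature-sparsity signal."""
    return torch.cat([m.weight.detach().abs()
                      for m in model.modules() if isinstance(m, nn.BatchNorm2d)])

def global_prune_masks(model: nn.Module, keep_ratio: float = 0.6) -> dict:
    """One network-wide threshold decides which channels survive, so the pruned
    width of each layer emerges automatically rather than being hand-set."""
    threshold = torch.quantile(bn_channel_scores(model), 1.0 - keep_ratio)
    return {name: m.weight.detach().abs() > threshold
            for name, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)}
```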
{"title":"DTS: dynamic training slimming with feature sparsity for efficient convolutional neural network","authors":"Jia Yin, Wei Wang, Zhonghua Guo, Yangchun Ji","doi":"10.1007/s11554-024-01511-y","DOIUrl":"https://doi.org/10.1007/s11554-024-01511-y","url":null,"abstract":"<p>Deep convolutional neural networks have achieved remarkable progress on computer vision tasks over last years. In this paper, we proposed a dynamic training slimming with feature sparsity based on structured pruning, named DTS, for efficient and automatic channel pruning. Unlike other existing pruning methods, which require manual intervention for pruning settings for each layer, DTS can design suitable architecture width for target datasets and deployment resources by automated pruning. The proposed method can be deployed to modern CNNs and the experimental results on CIFAR, ImageNet and PASCAL VOC benchmark datasets demonstrate the effectiveness of the proposed method, which significantly exceeds the other schemes.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"20 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141567464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-time water surface target detection based on improved YOLOv7 for Chengdu Sand River
Pub Date: 2024-07-08 | DOI: 10.1007/s11554-024-01510-z
Mei Yang, Huajun Wang
Obtaining accurate detection results in a timely manner remains a challenge in complex and changing water surface scenes. Detecting targets on water surfaces in real time is difficult because of their rapid movement, small size, and fragmented appearance. In addition, traditional detection methods are often labor-intensive and time-consuming, especially for large water bodies such as rivers and lakes. This paper presents an improved water surface target detection algorithm based on the YOLOv7 (you only look once) model. We enhance the accuracy and speed of surface target detection by improving three key structures: the network aggregation structure, the pyramid pooling structure, and the down-sampling structure. Furthermore, we implemented the model on mobile devices and designed detection software that enables real-time detection on images and videos. The experimental results demonstrate that the improved model outperforms the original YOLOv7 model: it exhibits a 6.4% boost in accuracy, a 4.2% improvement in recall, a 4.1% increase in mAP, a 14.3% reduction in parameter count, and achieves 87 FPS. The software accurately recognizes 11 typical targets on the water surface and demonstrates excellent water surface target detection capability.
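Of the three modified structures, the pyramid pooling block is the most self-contained to illustrate. The SPPF-style layout below is the common YOLO-family form and stands in for the paper's variant; channel sizes and kernel choice are assumptions.

```python
import torch
import torch.nn as nn

class SPPFBlock(nn.Module):
    """Pyramid pooling via three chained max-pools: each reuse of the same
    5x5 pool enlarges the receptive field, and the concatenated maps fuse
    multi-scale context at almost no extra cost."""
    def __init__(self, channels: int, k: int = 5):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
        self.fuse = nn.Conv2d(channels * 4, channels, kernel_size=1)

    def forward(self, x):
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        return self.fuse(torch.cat([x, y1, y2, y3], dim=1))
```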
{"title":"Real-time water surface target detection based on improved YOLOv7 for Chengdu Sand River","authors":"Mei Yang, Huajun Wang","doi":"10.1007/s11554-024-01510-z","DOIUrl":"https://doi.org/10.1007/s11554-024-01510-z","url":null,"abstract":"<p>It has been a challenge to obtain accurate detection results in a timely manner when faced with complex and changing surface target detection. Detecting targets on water surfaces in real-time can be challenging due to their rapid movement, small size, and fragmented appearance. In addition, traditional detection methods are often labor-intensive and time-consuming, especially when dealing with large water bodies such as rivers and lakes. This paper presents an improved water surface target detection algorithm that is based on the YOLOv7 (you only look once) model to enhance the performance of water surface target detection. We have enhanced the accuracy and speed of detecting surface targets by making improvements to three key structures: the network aggregation structure, the pyramid pooling structure, and the down-sampling structure. Furthermore, we implemented the model on mobile devices and designed a detection software. The software enables real-time detection through images and videos. The experimental results demonstrate that the improved model outperforms the original YOLOv7 model. It exhibits a 6.4% boost in accuracy, a 4.2% improvement in recall, a 4.1% increase in mAP, a 14.3% reduction in parameter counts, and archives the FPS of 87. The software has the ability to accurately recognize 11 typical targets on the water surface and demonstrates excellent water surface target detection capability.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"78 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141567463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An improved multi-scale and knowledge distillation method for efficient pedestrian detection in dense scenes
Pub Date: 2024-07-06 | DOI: 10.1007/s11554-024-01507-8
Yanxiang Xu, Mi Wen, Wei He, Hongwei Wang, Yunsheng Xue
Pedestrian detection in densely populated scenes, particularly in the presence of occlusions, remains a challenging issue in computer vision. Existing approaches often address missed detections by enhancing model architectures or incorporating attention mechanisms; however, small-scale pedestrians have fewer features and are easily overfitted to the dataset, so these approaches still struggle to accurately detect pedestrians with small target sizes. To tackle this issue, this research rethinks occluded regions through small-scale pedestrian detection and proposes the You Only Look Once model for efficient pedestrian detection (YOLO-EPD). First, we find that standard convolution and dilated convolution do not fit pedestrian targets of different scales well because of their single receptive field, so we propose the Selective Content Aware Downsampling (SCAD) module, integrated into the backbone for enhanced feature extraction. In addition, to address missed detections resulting from insufficient feature extraction for small-scale pedestrians, we propose the Crowded Multi-Head Attention (CMHA) module, which makes full use of multi-layer information. Finally, to optimize the performance and efficiency of small-object detection, we design Unified Channel-Task Distillation (UCTD) with channel attention and a lightweight head (Lhead) that uses parameter sharing to stay light. Experimental results validate the superiority of YOLO-EPD, which achieves a remarkable 91.1% Average Precision (AP) on the WiderPerson dataset while reducing parameters and computational overhead by 40%. The experimental findings demonstrate that YOLO-EPD greatly accelerates the convergence of model training and achieves better real-time performance in real-world dense scenarios.
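The distillation part of UCTD builds on the standard teacher-student logit loss; the temperature-scaled KL form below is that generic baseline, with the channel-attention weighting the authors add omitted because the abstract does not specify it.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T: float = 2.0):
    """Soften both logit sets with temperature T, then match the student's
    distribution to the teacher's; the T*T factor keeps gradient scale stable."""
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
```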
{"title":"An improved multi-scale and knowledge distillation method for efficient pedestrian detection in dense scenes","authors":"Yanxiang Xu, Mi Wen, Wei He, Hongwei Wang, Yunsheng Xue","doi":"10.1007/s11554-024-01507-8","DOIUrl":"https://doi.org/10.1007/s11554-024-01507-8","url":null,"abstract":"<p>Pedestrian detection in densely populated scenes, particularly in the presence of occlusions, remains a challenging issue in computer vision. Existing approaches often address detection leakage by enhancing model architectures or incorporating attention mechanisms; However, small-scale pedestrians have fewer features and are easily overfitted to the dataset and these approaches still face challenges in accurately detecting pedestrians with small target sizes. To tackle this issue, this research rethinks the occlusion region through small-scale pedestrian detection and proposes the You Only Look Once model for efficient pedestrian detection(YOLO-EPD). Firstly, we find that Standard Convolution and Dilated Convolution do not fit well with pedestrian targets with different scales due to a single receptive field, and we propose the Selective Content Aware Downsampling (SCAD) module, which is integrated into the backbone to attain enhanced feature extraction. In addition, to address the issue of missed detections resulting from insufficient feature extraction for small-scale pedestrian detection, we propose the Crowded Multi-Head Attention (CMHA) module, which makes full use of multi-layer information. Finally, for the challenge of optimizing the performance and effectiveness of small-object detection, we design Unified Channel-Task Distillation (UCTD) with channel attention and a Lightweight head (Lhead) using parameter sharing to keep it lightweight. Experimental results validate the superiority of YOLO-EPD, achieving a remarkable 91.1% Average Precision (AP) on the Widerperson dataset, while concurrently reducing parameters and computational overhead by 40%. The experimental findings demonstrate that YOLO-EPD greatly accelerates the convergence of model training and achieves better real-time performance in real-world dense scenarios.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"54 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141567465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lightweight safety helmet detection algorithm using improved YOLOv5
Pub Date: 2024-07-05 | DOI: 10.1007/s11554-024-01499-5
Hongge Ren, Anni Fan, Jian Zhao, Hairui Song, Xiuman Liang
In response to the challenges faced by existing safety helmet detection algorithms in complex construction site scenarios, such as poor accuracy, large parameter counts, heavy computation, and large model size, this paper proposes a lightweight safety helmet detection algorithm based on YOLOv5 that balances light weight and accuracy. First, the algorithm integrates the Distribution Shifting Convolution (DSConv) layer and the Squeeze-and-Excitation (SE) attention mechanism, effectively replacing part of the original convolution and C3 modules; this integration significantly enhances feature extraction and representation learning. Second, multi-scale feature fusion is performed on the Ghost module using skip connections, replacing certain C3 modules, to stay lightweight while maintaining accuracy. Finally, the Bottleneck Attention Module (BAM) is adjusted to suppress irrelevant information and enhance feature extraction in information-rich regions. The experimental results show that the improved model raises the mean average precision (mAP) by 1.0% over the original algorithm, reduces the number of parameters by 22.2%, decreases computation by 20.9%, and shrinks the model size by 20.1%, realizing a lightweight detection algorithm.
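The SE attention mechanism integrated here has a canonical form: squeeze the spatial dimensions into per-channel statistics, then re-weight the channels through a small gating MLP. The block below is that standard recipe; the reduction ratio is an assumption.

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention in its standard form."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.gate(x.mean(dim=(2, 3))).view(b, c, 1, 1)  # squeeze, then excite
        return x * w                                        # channel re-weighting
```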
{"title":"Lightweight safety helmet detection algorithm using improved YOLOv5","authors":"Hongge Ren, Anni Fan, Jian Zhao, Hairui Song, Xiuman Liang","doi":"10.1007/s11554-024-01499-5","DOIUrl":"https://doi.org/10.1007/s11554-024-01499-5","url":null,"abstract":"<p>In response to the challenges faced by existing safety helmet detection algorithms when applied to complex construction site scenarios, such as poor accuracy, large number of parameters, large amount of computation and large model size, this paper proposes a lightweight safety helmet detection algorithm based on YOLOv5, which achieves a balance between lightweight and accuracy. First, the algorithm integrates the Distribution Shifting Convolution (DSConv) layer and the Squeeze-and-Excitation (SE) attention mechanism, effectively replacing the original partial convolution and C3 modules, this integration significantly enhances the capabilities of feature extraction and representation learning. Second, multi-scale feature fusion is performed on the Ghost module using skip connections, replacing certain C3 module, to achieve lightweight and maintain accuracy. Finally, adjustments have been made to the Bottleneck Attention Mechanism (BAM) to suppress irrelevant information and enhance the extraction of features in rich regions. The experimental results show that improved model improves the mean average precision (mAP) by 1.0% compared to the original algorithm, reduces the number of parameters by 22.2%, decreases the computation by 20.9%, and the model size is reduced by 20.1%, which realizes the lightweight of the detection algorithm.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"37 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141549047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SelfReDepth
Pub Date: 2024-07-04 | DOI: 10.1007/s11554-024-01491-z
Alexandre Duarte, Francisco Fernandes, João M. Pereira, Catarina Moreira, Jacinto C. Nascimento, Joaquim Jorge
Depth maps produced by consumer-grade sensors suffer from inaccurate measurements and missing data from either system or scene-specific sources. Data-driven denoising algorithms can mitigate such problems; however, they require vast amounts of ground-truth depth data. Recent research has tackled this limitation using self-supervised learning techniques, but these require multiple RGB-D sensors. Moreover, most existing approaches focus on denoising single isolated depth maps or specific subjects of interest, highlighting a need for methods that can effectively denoise depth maps in real-time dynamic environments. This paper extends state-of-the-art approaches to depth denoising for commodity depth devices, proposing SelfReDepth, a self-supervised deep learning technique for depth restoration via denoising and hole-filling by inpainting of full depth maps captured with RGB-D sensors. The algorithm targets depth data in video streams, utilizing multiple sequential depth frames coupled with color data to achieve high-quality depth videos with temporal coherence. Finally, SelfReDepth is designed to be compatible with various RGB-D sensors and usable in real-time scenarios as a pre-processing step before other depth-dependent algorithms. Our results on real-world datasets demonstrate the approach's real-time performance, outperforming state-of-the-art methods in denoising and restoration quality at over 30 fps on commercial depth cameras, with potential benefits for augmented- and mixed-reality applications.
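SelfReDepth learns its restoration from data, but the multi-frame intuition can be shown directly: valid depth samples from sequential frames are combined so holes in one frame are filled from its temporal neighbors. A deliberately simple NumPy stand-in, not the trained model:

```python
import numpy as np

def fuse_depth_frames(frames: list) -> np.ndarray:
    """Average valid (non-zero) depth across T sequential (H, W) frames;
    pixels that are holes in every frame stay zero. Assumes the sensor
    encodes a missing sample as 0, as consumer RGB-D devices commonly do."""
    stack = np.stack(frames).astype(np.float32)  # (T, H, W)
    valid = (stack > 0).sum(axis=0)              # per-pixel count of real samples
    fused = stack.sum(axis=0) / np.maximum(valid, 1)
    return np.where(valid > 0, fused, 0.0)
```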
{"title":"Selfredepth","authors":"Alexandre Duarte, Francisco Fernandes, João M. Pereira, Catarina Moreira, Jacinto C. Nascimento, Joaquim Jorge","doi":"10.1007/s11554-024-01491-z","DOIUrl":"https://doi.org/10.1007/s11554-024-01491-z","url":null,"abstract":"<p>Depth maps produced by consumer-grade sensors suffer from inaccurate measurements and missing data from either system or scene-specific sources. Data-driven denoising algorithms can mitigate such problems; however, they require vast amounts of ground truth depth data. Recent research has tackled this limitation using self-supervised learning techniques, but it requires multiple RGB-D sensors. Moreover, most existing approaches focus on denoising single isolated depth maps or specific subjects of interest highlighting a need for methods that can effectively denoise depth maps in real-time dynamic environments. This paper extends state-of-the-art approaches for depth-denoising commodity depth devices, proposing SelfReDepth, a self-supervised deep learning technique for depth restoration, via denoising and hole-filling by inpainting of full-depth maps captured with RGB-D sensors. The algorithm targets depth data in video streams, utilizing multiple sequential depth frames coupled with color data to achieve high-quality depth videos with temporal coherence. Finally, SelfReDepth is designed to be compatible with various RGB-D sensors and usable in real-time scenarios as a pre-processing step before applying other depth-dependent algorithms. Our results demonstrate our approach’s real-time performance on real-world datasets shows that it outperforms state-of-the-art methods in denoising and restoration performance at over 30 fps on Commercial Depth Cameras, with potential benefits for augmented and mixed-reality applications.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"62 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141549049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FastBeltNet: a dual-branch light-weight network for real-time conveyor belt edge detection
Pub Date: 2024-07-03 | DOI: 10.1007/s11554-024-01502-z
Xing Zhao, Minhao Zeng, Yanglin Dong, Gang Rao, Xianshan Huang, Xutao Mo
Belt conveyors are widely used in multiple industries, including coal, steel, ports, power, metallurgy, and chemicals. One major challenge faced by these industries is belt deviation, which can negatively impact production efficiency and safety. Despite previous research on improving belt edge detection accuracy, practical industrial applications still need to prioritize system efficiency and light-weight models. To meet this need, a new semantic segmentation network called FastBeltNet has been developed specifically for real-time, highly accurate conveyor belt edge line segmentation with a light-weight design. The network uses a dual-branch structure that combines a shallow spatial branch, which extracts high-resolution spatial information, with a context branch for deep contextual semantic information. It also incorporates Ghost blocks, Downsample blocks, and Input Injection blocks to reduce computational load, increase the processing frame rate, and enhance feature representation. Experimental results show that FastBeltNet performs better than some existing methods across different real-world production settings. Specifically, FastBeltNet achieves 80.49% mIoU accuracy, 99.89 FPS processing speed, 895k parameters, 8.23 GFLOPs, and 430.95 MB peak CUDA memory use, effectively balancing accuracy and speed for industrial production.
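The Ghost blocks mentioned follow the GhostNet recipe: produce half of the output channels with an ordinary convolution and derive the rest with cheap depthwise operations, then concatenate. A sketch with assumed layer sizes (out_ch taken to be even):

```python
import torch
import torch.nn as nn

class GhostBlock(nn.Module):
    """Half the output comes from a thin 1x1 'primary' conv; the other half
    are cheap depthwise 'ghost' features computed from the primary ones."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        half = out_ch // 2
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, half, kernel_size=1, bias=False),
            nn.BatchNorm2d(half), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(half, half, kernel_size=3, padding=1,
                      groups=half, bias=False),       # depthwise: cheap to run
            nn.BatchNorm2d(half), nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)   # width restored to out_ch
```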
{"title":"FastBeltNet: a dual-branch light-weight network for real-time conveyor belt edge detection","authors":"Xing Zhao, Minhao Zeng, Yanglin Dong, Gang Rao, Xianshan Huang, Xutao Mo","doi":"10.1007/s11554-024-01502-z","DOIUrl":"https://doi.org/10.1007/s11554-024-01502-z","url":null,"abstract":"<p>Belt conveyors are widely used in multiple industries, including coal, steel, port, power, metallurgy, and chemical, etc. One major challenge faced by these industries is belt deviation, which can negatively impact production efficiency and safety. Despite previous research on improving belt edge detection accuracy, there is still a need to prioritize system efficiency and light-weight models for practical industrial applications. To meet this need, a new semantic segmentation network called FastBeltNet has been developed specifically for real-time and highly accurate conveyor belt edge line segmentation while maintaining a light-weight design. This network uses a dual-branch structure that combines a shallow spatial branch for extracting high-resolution spatial information with a context branch for deep contextual semantic information. It also incorporates the Ghost blocks, Downsample blocks, and Input Injection blocks to reduce computational load, increase processing frame rate, and enhance feature representation. Experimental results have shown that FastBeltNet has performed comparatively better than some existing methods in different real-world production settings, achieving promising performance metrics. Specifically, FastBeltNet achieves 80.49% mIoU accuracy, 99.89 FPS processing speed, 895 k parameters, 8.23 GFLOPs, and 430.95 MB peak CUDA memory use, effectively balancing accuracy and speed for industrial production.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"29 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141549048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}