Pub Date: 2024-06-02 | DOI: 10.1007/s11554-024-01488-8
IoT-based real-time object detection system for crop protection and agriculture field security
Priya Singh, Rajalakshmi Krishnamurthi
In farming, conflicts between humans and animals create significant challenges, risking crop yields, human well-being, and resource depletion. Farmers use traditional methods such as electric fences to protect their fields, but these can harm essential animals that maintain a balanced ecosystem. To address these challenges, our research presents a novel solution harnessing the Internet of Things (IoT) and deep learning. In this paper, we develop a monitoring system that combines an ESP32-CAM and a Raspberry Pi with an optimised YOLOv8 model. Our objective is to detect and classify objects such as animals or humans roaming around the field and to provide real-time notifications to farmers through Firebase Cloud Messaging (FCM). Ultrasonic sensors first detect any intruder movement, triggering the camera to capture an image. The captured image is then transmitted to a server equipped with the object detection model. The processed image is forwarded to FCM, which manages the image and sends a notification to the farmer through an Android application. Our optimised YOLOv8 model attains a precision of 97%, a recall of 96%, and an accuracy of 96%. Once this result was achieved, we integrated the model with our IoT infrastructure. This study emphasizes the effectiveness of low-power IoT devices, LoRa devices, and object detection techniques in delivering strong security solutions for the agriculture industry. These technologies hold the potential to significantly decrease crop damage, enhance safety within agricultural fields, and contribute to wildlife conservation.
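As an illustration of the notification path the abstract describes, here is a minimal server-side sketch assuming the ultralytics YOLO API and the firebase_admin SDK; the weights file, topic name, and thresholding are illustrative assumptions, not details from the paper.

```python
# Hypothetical server-side pipeline: YOLOv8 inference + FCM push notification.
import firebase_admin
from firebase_admin import credentials, messaging
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # stand-in for the authors' optimised YOLOv8 weights

def process_capture(image_path: str) -> None:
    results = model(image_path)[0]           # run detection on the uploaded frame
    labels = {results.names[int(c)] for c in results.boxes.cls}
    if labels:                               # notify only when something is detected
        messaging.send(messaging.Message(
            notification=messaging.Notification(
                title="Field intrusion alert",
                body=f"Detected: {', '.join(sorted(labels))}",
            ),
            topic="farm-alerts",             # the Android app subscribes to this topic
        ))

if __name__ == "__main__":
    firebase_admin.initialize_app(credentials.Certificate("service-account.json"))
    process_capture("capture.jpg")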
{"title":"IoT-based real-time object detection system for crop protection and agriculture field security","authors":"Priya Singh, Rajalakshmi Krishnamurthi","doi":"10.1007/s11554-024-01488-8","DOIUrl":"https://doi.org/10.1007/s11554-024-01488-8","url":null,"abstract":"<p>In farming, clashes between humans and animals create significant challenges, risking crop yields, human well-being, and resource depletion. Farmers use traditional methods like electric fences to protect their fields but these can harm essential animals that maintain a balanced ecosystem. To address these fundamental challenges, our research presents a fresh solution harnessing the power of the Internet of Things (IoT) and deep learning. In this paper, we developed a monitoring system that takes advantage of ESP32-CAM and Raspberry Pi in collaboration with optimised YOLOv8 model. Our objective is to detect and classify objects such as animals or humans that roam around the field, providing real-time notification to the farmers by incorporating firebase cloud messaging (FCM). Initially, we have employed ultrasonic sensors that will detect any intruder movement, triggering the camera to capture an image. Further, the captured image is transmitted to a server equipped with an object detection model. Afterwards, the processed image is forwarded to FCM, responsible for managing the image and sending notifications to the farmer through an Android application. Our optimised YOLOv8 model attains an exceptional precision of 97%, recall of 96%, and accuracy of 96%. Once we achieved this optimal outcome, we integrated the model with our IoT infrastructure. This study emphasizes the effectiveness of low-power IoT devices, LoRa devices, and object detection techniques in delivering strong security solutions to the agriculture industry. These technologies hold the potential to significantly decrease crop damage while enhancing safety within the agricultural field and contribute towards wildlife conservation.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"102 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141252232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-30 | DOI: 10.1007/s11554-024-01482-0
A power-aware vision-based virtual sensor for real-time edge computing
Chiara Contoli, Lorenzo Calisti, Giacomo Di Fabrizio, Nicholas Kania, Alessandro Bogliolo, Emanuele Lattanzi
Graphics processing units and tensor processing units coupled with tiny machine learning models deployed on edge devices are revolutionizing computer vision and real-time tracking systems. However, edge devices pose tight resource and power constraints. This paper proposes a real-time vision-based virtual sensor paradigm to provide power-aware multi-object tracking at the edge while preserving tracking accuracy and enhancing privacy. We thoroughly describe the proposed system architecture, focusing on the Dynamic Inference Power Manager (DIPM), which adapts the frame rate to provide energy savings. We implement and deploy the virtual sensor and the DIPM on the NVIDIA Jetson Nano edge platform to demonstrate the effectiveness and efficiency of the proposed solution. Extensive experiments show that the proposed virtual sensor reduces energy consumption by about 36% on videos with relatively low dynamicity and by about 21% on more dynamic video content, while keeping tracking accuracy within 1.2% of the baseline.
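A minimal sketch of the adaptive-frame-rate idea behind the DIPM: estimate scene dynamicity from inter-frame differences and back off the capture rate when little changes. The thresholds and rate bounds are assumptions for illustration, not the paper's values.

```python
import numpy as np

class AdaptiveFrameRate:
    def __init__(self, min_fps=5.0, max_fps=30.0, threshold=8.0):
        self.min_fps, self.max_fps, self.threshold = min_fps, max_fps, threshold
        self.prev = None
        self.fps = max_fps

    def update(self, frame: np.ndarray) -> float:
        """Return the capture rate to use for the next frame."""
        gray = (frame.mean(axis=2) if frame.ndim == 3 else frame).astype(np.float32)
        if self.prev is not None:
            dynamicity = np.abs(gray - self.prev).mean()  # mean absolute pixel change
            if dynamicity < self.threshold:
                self.fps = max(self.min_fps, self.fps * 0.9)  # static scene: back off
            else:
                self.fps = self.max_fps                       # motion: full rate
        self.prev = gray
        return self.fps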
{"title":"A power-aware vision-based virtual sensor for real-time edge computing","authors":"Chiara Contoli, Lorenzo Calisti, Giacomo Di Fabrizio, Nicholas Kania, Alessandro Bogliolo, Emanuele Lattanzi","doi":"10.1007/s11554-024-01482-0","DOIUrl":"https://doi.org/10.1007/s11554-024-01482-0","url":null,"abstract":"<p>Graphics processing units and tensor processing units coupled with tiny machine learning models deployed on edge devices are revolutionizing computer vision and real-time tracking systems. However, edge devices pose tight resource and power constraints. This paper proposes a real-time vision-based virtual sensors paradigm to provide power-aware multi-object tracking at the edge while preserving tracking accuracy and enhancing privacy. We thoroughly describe our proposed system architecture, focusing on the Dynamic Inference Power Manager (DIPM). Our proposed DIPM is based on an adaptive frame rate to provide energy savings. We implement and deploy the virtual sensor and the DIPM on the NVIDIA Jetson Nano edge platform to prove the effectiveness and efficiency of the proposed solution. The results of extensive experiments demonstrate that the proposed virtual sensor can achieve a reduction in energy consumption of about 36% in videos with relatively low dynamicity and about 21% in more dynamic video content while simultaneously maintaining tracking accuracy within a range of less than 1.2%.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"13 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141188555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-29 | DOI: 10.1007/s11554-024-01480-2
SSE-YOLOv5: a real-time fault line selection method based on lightweight modules and attention models
Shuai Hao, Wei Li, Xu Ma, Zhuo Tian
To address the low precision and poor noise robustness of standard line selection methods for small-current grounding faults, we propose a fault line selection approach based on a YOLOv5 network that integrates attention modules and lightweight models. First, the zero-sequence current of the grounding system fault is used as the basis for fault discrimination: a wavelet transform translates the zero-sequence current into a two-dimensional time-frequency map to create a dataset. Because the limited training set restricts line selection accuracy, we constructed a simulation model of small-current grounding faults based on actual faults and, by varying the fault location, fault angle, and grounding resistance, generated a simulation dataset to expand the training set. Second, to reduce the impact of noise on fault features during line selection, the SE channel attention module is fused into the backbone of the YOLOv5 detection network, significantly improving the network's accuracy in detecting fault regions. Finally, to combine high line selection accuracy with good real-time performance, the lightweight ShuffleNetV2 model is introduced into the network; its depthwise separable convolutions reduce the number of model parameters and improve real-time performance. The proposed algorithm was compared with four other algorithms to verify its advantages. Experimental results show that the proposed method reaches a line selection accuracy of 93.6% with only a small amount of real data samples, while maintaining over 90% accuracy in the presence of noise. At an image resolution of 640 × 640, its detection speed is 122 fps, indicating good real-time performance.
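To illustrate the dataset-construction step, here is a sketch of converting a zero-sequence current trace into a two-dimensional time-frequency map with a continuous wavelet transform, assuming the PyWavelets package; the Morlet wavelet, scale range, and synthetic signal are assumptions.

```python
import numpy as np
import pywt

def current_to_scalogram(i0: np.ndarray, scales=np.arange(1, 65)) -> np.ndarray:
    coef, _ = pywt.cwt(i0, scales, "morl")           # CWT coefficients: (scales, time)
    mag = np.abs(coef)
    return (mag / mag.max() * 255).astype(np.uint8)  # normalise to an 8-bit image

# Example: a synthetic fault-like transient superposed on a 50 Hz component
t = np.linspace(0, 0.2, 2000)
i0 = np.sin(2 * np.pi * 50 * t) + 0.5 * np.exp(-50 * t) * np.sin(2 * np.pi * 400 * t)
img = current_to_scalogram(i0)  # images like this form the detector's training set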
{"title":"SSE-YOLOv5: a real-time fault line selection method based on lightweight modules and attention models","authors":"Shuai Hao, Wei Li, Xu Ma, Zhuo Tian","doi":"10.1007/s11554-024-01480-2","DOIUrl":"https://doi.org/10.1007/s11554-024-01480-2","url":null,"abstract":"<p>To address the problems of low precision and poor anti-noise performance of the standard route selection method for the small current grounding faults, a fault line selection approach based on YOLOv5 network that integrates attention modules and lightweight models is proposed. First, grounding system fault’s zero sequence current is utilized as the basis for fault discrimination. A wavelet transform is employed to translate the zero sequence current to a two-dimensional time–frequency map to create a dataset. However, due to the impact of the lack of training sets on the accuracy of line selection, we constructed a simulation model for small current grounding faults based on actual faults. By modifying the fault location, fault angle, and grounding resistance, we generated a simulation dataset to expand the training set. Second, to reduce the impact of noise on fault features during line selection, the SE channel attention model is used to fuse it into the backbone of the YOLOv5 detection network, significantly improving the network's accuracy in detecting fault areas. Finally, to achieve high line selection accuracy and good real-time performance in the detection network, the lightweight network model ShuffleNetV2 is introduced into the constructed network. ShuffleNetV2 reduces the number of network model parameters through its deep separable convolution, improving the real-time performance of line selection. The proposed algorithm in this study was compared with four other algorithms to verify its advantages. The experimental results reveal that the proposed method reached a line selection accuracy of 93.6% under the condition of a small amount of real data samples, while maintaining a line selection accuracy of over 90% in the presence of noise. When the image resolution is 640 × 640, its detection speed is 122fps, indicating good real-time performance.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"24 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141188561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-28 | DOI: 10.1007/s11554-024-01479-9
AM YOLO: adaptive multi-scale YOLO for ship instance segmentation
Ming Yuan, Hao Meng, Junbao Wu
Instance segmentation has seen widespread development and significant progress across various fields. However, ship instance segmentation in marine environments faces challenges, including complex sea surface backgrounds, indistinct target features, and large scale variations, which prevent existing methods from achieving desirable results. To overcome these challenges, this paper presents an adaptive multi-scale YOLO (AM YOLO) algorithm to improve instance segmentation performance for multi-scale ship targets in marine environments. The algorithm first proposes a multi-grained adaptive feature enhancement module (MAEM), which uses grouped weighting and multiple adaptive mechanisms to enhance detail extraction and improve the accuracy of multi-scale and global information. It then proposes a refined bidirectional feature pyramid network (RBiFPN), which employs a cross-channel adaptive attention mechanism to fully integrate feature information and contextual details across scales. Experiments on the challenging MS COCO, COCO-boat, and OVSD datasets show that, compared to the YOLOv5s baseline, AM YOLO increases instance segmentation precision by 4.0%, 1.4%, and 2.3%, respectively. These improvements enhance the model's generalization capabilities and achieve a favourable balance between accuracy and speed while maintaining real-time performance, broadening the model's applicability in dynamic marine environments.
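A hedged PyTorch sketch of cross-channel attention-weighted fusion of two feature maps at different scales, in the spirit of the RBiFPN described above; the exact module layout is an assumption for illustration, not the paper's design.

```python
import torch
import torch.nn as nn

class CrossChannelFusion(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # squeeze: global context
            nn.Conv2d(2 * channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 2 * channels, 1),
            nn.Sigmoid(),                                   # per-channel weights
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        b = nn.functional.interpolate(b, size=a.shape[2:], mode="nearest")
        x = torch.cat([a, b], dim=1)
        w = self.attn(x)                                    # excitation over both inputs
        wa, wb = w.chunk(2, dim=1)
        return a * wa + b * wb                              # adaptively weighted sum

# fuse a 1/8-scale map with an upsampled 1/16-scale map
fused = CrossChannelFusion(256)(torch.randn(1, 256, 80, 80), torch.randn(1, 256, 40, 40))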
{"title":"AM YOLO: adaptive multi-scale YOLO for ship instance segmentation","authors":"Ming Yuan, Hao Meng, Junbao Wu","doi":"10.1007/s11554-024-01479-9","DOIUrl":"https://doi.org/10.1007/s11554-024-01479-9","url":null,"abstract":"<p>Instance segmentation has seen widespread development and significant progress across various fields. However, ship instance segmentation in marine environments faces challenges, including complex sea surface backgrounds, indistinct target features, and large-scale variations, making it incapable of achieving the desirable results. To overcome these challenges, this paper presents an adaptive multi-scale YOLO (AM YOLO) algorithm to improve instance segmentation performance for multi-scale ship targets in marine environments. Initially, the algorithm proposes a multi-grained adaptive feature enhancement module (MAEM) that utilizes grouped weighting and multiple adaptive mechanisms to enhance the extraction of details and improve the accuracy of multi-scale and global information. Subsequently, this study proposes a refine bidirectional feature pyramid network (RBiFPN) structure, which employs a cross-channel attention adaptive mechanism to integrate feature information and contextual details across different scales fully. Experiments on the challenging MS COCO dataset, COCO-boat dataset, and OVSD dataset show that compared to the baseline YOLOv5s, the AM YOLO model increases instance segmentation precision by 4.0%, 1.4%, and 2.3%, respectively. This improvement enhances the model’s generalization capabilities and achieves an optimal balance between accuracy and speed while maintaining real-time performance, thus broadening the model’s applicability in dynamic marine environments</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"48 3 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141169923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-28 | DOI: 10.1007/s11554-024-01474-0
ResLMFFNet: a real-time semantic segmentation network for precision agriculture
Irem Ulku
Lightweight multiscale-feature-fusion network (LMFFNet), a proficient real-time CNN architecture, strikes a balance between inference time and accuracy. Capturing the intricate details of precision agriculture targets in remote sensing images requires deep SEM-B blocks in the LMFFNet design. However, stacking many SEM-B units leads to instability during backward gradient flow. This work proposes the novel residual LMFFNet (ResLMFFNet) model, which ensures smooth gradient flow within SEM-B blocks. By incorporating residual connections, ResLMFFNet achieves improved accuracy without affecting the inference speed or the number of trainable parameters. Experiments demonstrate that the architecture outperforms other real-time architectures across diverse precision agriculture applications involving UAV and satellite images. Compared to LMFFNet, ResLMFFNet improves the Jaccard index by 2.1% for tree detection, 1.4% for crop detection, and 11.2% for wheat yellow-rust detection, while maintaining almost identical inference time and computational complexity. The source code is available on GitHub: https://github.com/iremulku/Semantic-Segmentation-in-Precision-Agriculture.
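The residual-connection idea is simple enough to show directly: wrap a block in an identity shortcut so gradients flow past it unchanged. A minimal PyTorch sketch, with a placeholder standing in for the actual SEM-B block.

```python
import torch
import torch.nn as nn

class ResidualWrapper(nn.Module):
    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.block(x)  # identity path keeps backward gradients stable

sem_b_standin = nn.Sequential(    # placeholder with matching in/out channels
    nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
)
y = ResidualWrapper(sem_b_standin)(torch.randn(1, 64, 32, 32))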
{"title":"ResLMFFNet: a real-time semantic segmentation network for precision agriculture","authors":"Irem Ulku","doi":"10.1007/s11554-024-01474-0","DOIUrl":"https://doi.org/10.1007/s11554-024-01474-0","url":null,"abstract":"<p>Lightweight multiscale-feature-fusion network (LMFFNet), a proficient real-time CNN architecture, adeptly achieves a balance between inference time and accuracy. Capturing the intricate details of precision agriculture target objects in remote sensing images requires deep SEM-B blocks in the LMFFNet model design. However, employing numerous SEM-B units leads to instability during backward gradient flow. This work proposes the novel residual-LMFFNet (ResLMFFNet) model for ensuring smooth gradient flow within SEM-B blocks. By incorporating residual connections, ResLMFFNet achieves improved accuracy without affecting the inference speed and the number of trainable parameters. The results of the experiments demonstrate that this architecture has achieved superior performance compared to other real-time architectures across diverse precision agriculture applications involving UAV and satellite images. Compared to LMFFNet, the ResLMFFNet architecture enhances the Jaccard Index values by 2.1% for tree detection, 1.4% for crop detection, and 11.2% for wheat-yellow rust detection. Achieving these remarkable accuracy levels involves maintaining almost identical inference time and computational complexity as the LMFFNet model. The source code is available on GitHub: https://github.com/iremulku/Semantic-Segmentation-in-Precision-Agriculture.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"63 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141169755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-25 | DOI: 10.1007/s11554-024-01475-z
HSMF: hardware-efficient single-stage feedback mean filter for high-density salt-and-pepper noise removal
Midde Venkata Siva, E. P. Jayakumar
Noise is an unwanted element that degrades digital image quality. Salt-and-pepper noise can appear at any point during image acquisition or transmission, so proper restoration procedures are essential to suppress it. This paper proposes a hardware-efficient VLSI architecture for a feedback decision-based trimmed mean filter that removes high-density salt-and-pepper noise from images. Noisy pixels are identified and corrected from the neighbouring pixels in the 3 × 3 window centred on the noisy pixel: either the mean of the horizontal and vertical noisy pixels or the mean of the noise-free pixels in the window is computed. This mean value is fed back and the noisy centre pixel is updated immediately, so the updated pixel value is used henceforth when correcting the remaining corrupted pixels. This feedback removes noisy pixels effectively even at high noise densities. The designed VLSI architecture is also efficient, since the algorithm requires no sorting and uses fewer computing resources than other state-of-the-art algorithms.
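A NumPy sketch of the filtering scheme as described: pixels valued 0 or 255 are treated as noisy and corrected in raster order, and each corrected value is written back immediately so later corrections can reuse it (the feedback). Border handling and the fallback rule are implementation assumptions.

```python
import numpy as np

def feedback_trimmed_mean(img: np.ndarray) -> np.ndarray:
    out = img.astype(np.float32).copy()
    h, w = out.shape
    noisy = (img == 0) | (img == 255)
    for r in range(h):
        for c in range(w):
            if not noisy[r, c]:
                continue
            r0, r1 = max(r - 1, 0), min(r + 2, h)
            c0, c1 = max(c - 1, 0), min(c + 2, w)
            win = out[r0:r1, c0:c1]
            mask = (win != 0) & (win != 255)       # noise-free (or already corrected)
            if mask.any():
                out[r, c] = win[mask].mean()       # trimmed mean over clean pixels
            else:                                  # fall back to the H/V neighbours,
                hv = [out[r, max(c - 1, 0)], out[r, min(c + 1, w - 1)],
                      out[max(r - 1, 0), c], out[min(r + 1, h - 1), c]]
                out[r, c] = np.mean(hv)            # which feedback may have corrected
    return out.astype(np.uint8)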
{"title":"HSMF: hardware-efficient single-stage feedback mean filter for high-density salt-and-pepper noise removal","authors":"Midde Venkata Siva, E. P. Jayakumar","doi":"10.1007/s11554-024-01475-z","DOIUrl":"https://doi.org/10.1007/s11554-024-01475-z","url":null,"abstract":"<p>Noise is an unwanted element that has a negative impact on digital image quality. Salt-and-pepper noise is a type of noise that can appear at any point during the acquisition or transmission of images. It is essential to utilize proper restoration procedures to lessen the noise. This paper proposes a hardware-efficient VLSI architecture for the feedback decision-based trimmed mean filter that eliminates high-density salt-and-pepper noise in the images. The noisy pixels are identified and corrected by considering the neighbouring pixels in a 3 <span>(times)</span> 3 window corresponding to this noisy centre pixel. Either the mean of the horizontal and vertical noisy pixels or the mean of noise-free pixels in the window is computed. This mean value is fed back and the noisy centre pixel is updated immediately, such that this updated pixel value is used henceforth for correcting the remaining corrupted pixels. It is observed that this procedure helps in removing the noisy pixels effectively even if the noise density is high. Additionally, the designed VLSI architecture is efficient, since the algorithm does not require a sorting process and the computing resources required are less when compared to other state-of-the-art algorithms.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"14 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141152421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-19 | DOI: 10.1007/s11554-024-01463-3
Adaptive complexity control for AV1 video encoder using machine learning
Isis Bender, Gustavo Rehbein, Guilherme Correa, Luciano Agostini, Marcelo Porto
Digital videos are widely used on various platforms, including smartphones and other battery-powered mobile devices, which suffer from energy consumption and performance constraints. Video encoders compress video data, enabling the use of this type of media by reducing the data rate while maintaining image quality. To promote the use of digital videos, the continuous improvement of video encoding standards is crucial. In this context, the Alliance for Open Media (AOM) developed the AV1 (AOMedia Video 1) format. However, the advanced tools and enhancements provided by AV1 come with a high computational cost. To address this issue, this paper presents the learning-based AV1 complexity controller (LACCO). LACCO dynamically optimizes the encoding time of the AV1 encoder for HD 1080 and UHD 4K videos by predicting the encoding time of future frames and classifying input videos according to their characteristics with trained machine learning models. LACCO was integrated into the AV1 reference software encoder; it achieves encoding time reductions ranging from 10% to 70%, with average error ranging from 0.11 to 1.88 percentage points at HD 1080 resolution and from 0.14 to 3.33 percentage points at UHD 4K resolution.
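A simplified sketch of the control idea: maintain a running prediction of per-frame encoding time and move along a ladder of encoder configurations to stay on budget. The EMA predictor and the preset ladder are stand-ins for the paper's trained machine learning models and encoder-internal tools.

```python
class ComplexityController:
    def __init__(self, budget_ms: float, presets=(0, 1, 2, 3)):
        self.budget_ms = budget_ms  # target encoding time per frame
        self.presets = presets      # 0 = slowest/best quality ... 3 = fastest
        self.level = 0
        self.pred_ms = budget_ms    # running estimate of frame encoding time

    def after_frame(self, measured_ms: float) -> int:
        # exponential moving average as a minimal encoding-time predictor
        self.pred_ms = 0.8 * self.pred_ms + 0.2 * measured_ms
        if self.pred_ms > self.budget_ms and self.level < len(self.presets) - 1:
            self.level += 1         # over budget: switch to a faster preset
        elif self.pred_ms < 0.7 * self.budget_ms and self.level > 0:
            self.level -= 1         # headroom: recover compression efficiency
        return self.presets[self.level]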
{"title":"Adaptive complexity control for AV1 video encoder using machine learning","authors":"Isis Bender, Gustavo Rehbein, Guilherme Correa, Luciano Agostini, Marcelo Porto","doi":"10.1007/s11554-024-01463-3","DOIUrl":"https://doi.org/10.1007/s11554-024-01463-3","url":null,"abstract":"<p>Digital videos are widely used on various platforms, including smartphones and other battery-powered mobile devices, which can suffer from energy consumption and performance constraints. Video encoders are responsible for compressing video data, enabling the use of this type of media by reducing the data rate while maintaining image quality. To promote the use of digital videos, the continuous improvement of digital video encoding standards is crucial. In this context, the Alliance for Open Media (AOM) developed the AV1 (AOMedia Video 1) format. However, the advanced tools and enhancements provided by AV1 come with a high computational cost. To address this issue, this paper presents the learning-based AV1 complexity controller (LACCO). The goal of LACCO is to dynamically optimize the encoding time of the AV1 encoder for HD 1080 and UHD 4K resolution videos. The controller achieves this goal by predicting the encoding time of future frames and classifying input videos according to their characteristics through the use of trained machine learning models. LACCO was integrated into the reference software of the AV1 encoder and its encoding time reduction ranges from 10 to 70%, with average error results ranging from 0.11 to 1.88 percentage points for HD 1080 resolution and from 0.14 to 3.33 percentage points for UHD 4K resolution.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"55 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141062474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-19 | DOI: 10.1007/s11554-024-01476-y
Fast detection of face masks in public places using QARepVGG-YOLOv7
Chuying Guan, Jiaxuan Jiang, Zhong Wang
The COVID-19 pandemic has resulted in substantial global losses. In the post-epidemic era, public health guidance still advocates the correct use of medical masks in confined spaces such as hospitals and other indoor settings, which effectively blocks droplet transmission of infectious diseases, protects personal and public health, and improves the environmental sustainability and social resilience of cities. Detecting whether masks are worn correctly is therefore crucial. This study proposes an innovative three-class mask detection model based on the QARepVGG-YOLOv7 algorithm. The model replaces the convolution modules in the backbone network with QARepVGG modules, exploiting their quantization-friendly structure and re-parameterization to achieve high-precision, high-efficiency detection. To validate the proposed method, we created a mask dataset of 5095 images covering three categories: masks worn correctly, masks worn incorrectly, and no mask; data augmentation was employed to further balance the categories. We tested the YOLOv5s, YOLOv6, YOLOv7, and YOLOv8s models on this self-built dataset. The results show that QARepVGG-YOLOv7 achieves the best accuracy among these state-of-the-art YOLO models, with an mAP of 0.946 (a 0.5% increase over YOLOv7) at 263.2 fps (90.8 fps faster than YOLOv7), making it a high-precision, high-efficiency mask detection model.
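For context, a hypothetical training call for a three-class mask detector using the ultralytics API; the dataset file masks.yaml and the augmentation settings are assumptions, and the paper's actual model modifies YOLOv7 rather than YOLOv8.

```python
from ultralytics import YOLO

model = YOLO("yolov8s.pt")
model.train(
    data="masks.yaml",  # classes: mask_correct, mask_incorrect, no_mask
    epochs=100,
    imgsz=640,
    fliplr=0.5,         # simple augmentations help balance rarer classes
    mosaic=1.0,
)
metrics = model.val()   # reports precision, recall, and mAP per class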
{"title":"Fast detection of face masks in public places using QARepVGG-YOLOv7","authors":"Chuying Guan, Jiaxuan Jiang, Zhong Wang","doi":"10.1007/s11554-024-01476-y","DOIUrl":"https://doi.org/10.1007/s11554-024-01476-y","url":null,"abstract":"<p>The COVID-19 pandemic has resulted in substantial global losses. In the post-epidemic era, public health needs still advocate the correct use of medical masks in confined spaces such as hospitals and indoors. This can effectively block the spread of infectious diseases through droplets, protect personal and public health, and improve the environmental sustainability and social resilience of cities. Therefore, detecting the correct wearing of masks is crucial. This study proposes an innovative three-class mask detection model based on the QARepVGG-YOLOv7 algorithm. The model replaces the convolution module in the backbone network with the QARepVGG module and uses the quantitative friendly structure and re-parameterization characteristics of the QARepVGG module to achieve high-precision and high-efficiency target detection. To validate the effectiveness of our proposed method, we created a mask dataset of 5095 pictures, including three categories: correct use of masks, incorrect use of masks, and individuals who do not wear masks. We also employed data augmentation techniques to further balance the dataset categories. We tested YOLOv5s, YOLOv6, YOLOv7, and YOLOv8s models on self-made datasets. The results show that the QARepVGG-YOLOv7 model has the best accuracy compared with the most advanced YOLO model. Our model achieves a significantly improved mAP value of 0.946 and a faster fps of 263.2, which is 90.8 fps higher than the YOLOv7 model and a 0.5% increase in map value over the YOLOv7 model. It is a high-precision and high-efficiency mask detection model.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"55 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141062466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-18 | DOI: 10.1007/s11554-024-01471-3
FPGA-based implementation of the VVC low-frequency non-separable transform
Fatma Belghith, Sonda Ben Jdidia, Bouthaina Abdallah, Nouri Masmoudi
The Versatile Video Coding (VVC) standard, released in July 2020, delivers better coding performance than High-Efficiency Video Coding (HEVC) thanks to the introduction of new coding tools. The transform module in VVC incorporates the Multiple Transform Selection (MTS) concept, which relies on separable Discrete Cosine Transform (DCT)/Discrete Sine Transform (DST) kernels, and the recently introduced Low-Frequency Non-Separable Transform (LFNST). The latter serves as a secondary transform that enhances coding efficiency by further decorrelating residual samples, but it introduces heightened computational complexity and substantial resource demands, complicating its hardware implementation. This paper introduces an effective and cost-efficient hardware architecture for the LFNST. The proposed design employs additions and bit-shifting operations, conserving hardware logic. Synthesis results for an Arria 10 10AX115N1F45E1SG FPGA device show that the logic cost is only 26% of the available hardware resources. The design operates at 204 MHz and can process Ultra High Definition (UHD) 4K video at up to 60 frames per second (fps).
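The shift-add idea is easy to illustrate: a constant multiplication is decomposed into additions and bit shifts. A small Python example for the constant 23; the actual LFNST kernel coefficients are defined by the VVC specification and are not reproduced here.

```python
def mul_const_23(x: int) -> int:
    # 23 = 16 + 4 + 2 + 1, so x*23 becomes three shifts and three additions
    return (x << 4) + (x << 2) + (x << 1) + x

assert mul_const_23(7) == 7 * 23  # shift-add result matches the multiplication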
{"title":"FPGA-based implementation of the VVC low-frequency non-separable transform","authors":"Fatma Belghith, Sonda Ben Jdidia, Bouthaina Abdallah, Nouri Masmoudi","doi":"10.1007/s11554-024-01471-3","DOIUrl":"https://doi.org/10.1007/s11554-024-01471-3","url":null,"abstract":"<p>The Versatile Video Coding (VVC) standard, released in July 2020, brings better coding performance than the High-Efficiency Video Coding (HEVC) thanks to the introduction of new coding tools. The transform module in the VVC standard incorporates the Multiple Transform Selection (MTS) concept, which relies on separable Discrete Cosine Transform (DCT)/Discrete Sine Transform (DST) kernels, and the recently introduced Low-Frequency Non-Separable Transform (LFNST). This latter serves as a secondary transform process, enhancing coding efficiency by further decorrelating residual samples. However, it introduces heightened computational complexity and substantial resource allocation demands, potentially complicating its hardware implementation. This paper introduces an effective and cost-efficient hardware architecture for LFNST. The proposed design employs additions and bit-shifting operations preserving hardware logic usage. The synthesis results for an Arria 10 10AX115N1F45E1SG FPGA device demonstrate that the logic cost is only of 26% of the available hardware resources. Additionally, the proposed design is working at 204 MHz and can process Ultra High Definition (UHD) 4K videos at up to 60 frames per second (fps).</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"12 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141062532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-13 | DOI: 10.1007/s11554-024-01466-0
A real-time detection for miner behavior via DYS-YOLOv8n model
Fangfang Xin, Xinyu He, Chaoxiu Yao, Shan Li, Biao Ma, Hongguang Pan
To address the low real-time performance and poor accuracy of algorithms for detecting miner behavior underground, we propose a high-precision real-time detection method named DYS-YOLOv8n, based on the characteristics of human body behavior. The method integrates DSConv into the backbone network to enhance multi-scale feature extraction, and replaces the C2f modules with SCConv-C2f, reducing redundant computation and improving training speed. An optimized loss function based on MPDIoU further improves the model's accuracy and speed. The experimental results show: (1) with almost no increase in parameters or computation, the mAP50 of the DYS-YOLOv8n model is 97.4%, a 3.2% improvement over YOLOv8n; (2) compared to Faster R-CNN, YOLOv5s, and YOLOv7, DYS-YOLOv8n improves average accuracy to varying degrees while significantly increasing detection speed; (3) at 243 fps, DYS-YOLOv8n meets the real-time requirements for behavior detection in mines. In summary, DYS-YOLOv8n offers a real-time, efficient, and lightweight method for detecting miner behavior in mines, with high practical value.
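A NumPy sketch of the MPDIoU measure referenced above, following the published MPDIoU formulation as we understand it (IoU penalised by squared corner-point distances normalised by the image size); this is illustrative, not the authors' code. Boxes are (x1, y1, x2, y2).

```python
import numpy as np

def mpdiou(a, b, img_w, img_h):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area_a + area_b - inter)
    norm = img_w ** 2 + img_h ** 2
    d1 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2  # top-left corner distance
    d2 = (a[2] - b[2]) ** 2 + (a[3] - b[3]) ** 2  # bottom-right corner distance
    return iou - d1 / norm - d2 / norm            # the loss would be 1 - mpdiou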
{"title":"A real-time detection for miner behavior via DYS-YOLOv8n model","authors":"Fangfang Xin, Xinyu He, Chaoxiu Yao, Shan Li, Biao Ma, Hongguang Pan","doi":"10.1007/s11554-024-01466-0","DOIUrl":"https://doi.org/10.1007/s11554-024-01466-0","url":null,"abstract":"<p>To address the issues of low real-time performance and poor algorithm accuracy in detecting miner behavior underground, we propose a high-precision real-time detection method named DSY-YOLOv8n based on the characteristics of human body behavior. This method integrates DSConv into the backbone network to enhance multi-scale feature extraction. Additionally, SCConv-C2f replaces C2f modules, reducing redundant calculations and improving model training speed. The optimization strategy of the loss function is employed, and MPDIoU is used to improve the model’s accuracy and speed. The experimental results show: (1) With almost no increase in parameters and calculation amount, the mAP50 of the DSY-YOLOv8n model is 97.4%, which is a 3.2% great improvement over the YOLOv8n model. (2) Compared to Faster-R-CNN, YOLOv5s, and YOLOv7, DYS-YOLOv8n has improved the average accuracy to varying degrees while significantly increasing the detection speed. (3) DYS-YOLOv8n meets the real-time requirements for behavioral detection in mines with a detection speed of 243FPS. In summary, the DYS-YOLOv8n offers a real-time, efficient, and lightweight method for detecting miner behavior in mines, which has high practical value.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"11 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140930172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}