Title: A power-aware vision-based virtual sensor for real-time edge computing
Authors: Chiara Contoli, Lorenzo Calisti, Giacomo Di Fabrizio, Nicholas Kania, Alessandro Bogliolo, Emanuele Lattanzi
Pub Date: 2024-05-30 | DOI: 10.1007/s11554-024-01482-0
Abstract: Graphics processing units and tensor processing units, coupled with tiny machine-learning models deployed on edge devices, are revolutionizing computer vision and real-time tracking systems. However, edge devices impose tight resource and power constraints. This paper proposes a real-time vision-based virtual-sensor paradigm that provides power-aware multi-object tracking at the edge while preserving tracking accuracy and enhancing privacy. We describe the proposed system architecture in detail, focusing on the Dynamic Inference Power Manager (DIPM), which adapts the frame rate to save energy. We implement and deploy the virtual sensor and the DIPM on the NVIDIA Jetson Nano edge platform to demonstrate the effectiveness and efficiency of the proposed solution. Extensive experiments show that the virtual sensor reduces energy consumption by about 36% on videos with relatively low dynamicity and about 21% on more dynamic content, while keeping tracking accuracy within 1.2% of the baseline.
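The DIPM's adaptive frame rate is described only at a high level; as a rough illustration of the idea (not the authors' implementation — the frame-difference metric, thresholds, and rate steps below are invented for the example), a controller can lower the inference rate when consecutive frames barely change and restore it when motion appears:

```python
def frame_difference(prev, curr):
    """Mean absolute difference between two grayscale frames (flat lists of ints)."""
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)

class AdaptiveFrameRate:
    """Hypothetical duty-cycling policy: step the inference rate down on
    static scenes, jump back to full rate when dynamicity is detected."""

    def __init__(self, fps_min=5, fps_max=30, threshold=8.0):
        self.fps_min, self.fps_max, self.threshold = fps_min, fps_max, threshold
        self.fps = fps_max

    def update(self, prev, curr):
        if frame_difference(prev, curr) < self.threshold:
            self.fps = max(self.fps_min, self.fps - 5)   # low dynamicity: slow down
        else:
            self.fps = self.fps_max                      # motion: full rate again
        return self.fps
```

Skipped frames are simply not run through the detector, which is where the energy saving would come from on a platform like the Jetson Nano.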
Title: SSE-YOLOv5: a real-time fault line selection method based on lightweight modules and attention models
Authors: Shuai Hao, Wei Li, Xu Ma, Zhuo Tian
Pub Date: 2024-05-29 | DOI: 10.1007/s11554-024-01480-2
Abstract: To address the low precision and poor noise robustness of standard line selection methods for small-current grounding faults, a fault line selection approach is proposed based on a YOLOv5 network that integrates attention modules and lightweight models. First, the zero-sequence current of the grounding-system fault is used as the basis for fault discrimination: a wavelet transform converts the zero-sequence current into a two-dimensional time-frequency map to build a dataset. Because a shortage of training data limits line selection accuracy, a simulation model of small-current grounding faults was constructed from actual faults; by varying the fault location, fault angle, and grounding resistance, a simulation dataset was generated to expand the training set. Second, to reduce the impact of noise on fault features during line selection, an SE channel-attention module is fused into the backbone of the YOLOv5 detection network, significantly improving its accuracy in detecting fault regions. Finally, to combine high line selection accuracy with good real-time performance, the lightweight ShuffleNetV2 model is introduced into the network; its depthwise separable convolutions reduce the number of model parameters and improve the real-time performance of line selection. The proposed algorithm was compared with four other algorithms to verify its advantages. The experimental results show that the method reaches a line selection accuracy of 93.6% with only a small amount of real data, while maintaining over 90% accuracy in the presence of noise. At an image resolution of 640 × 640, its detection speed is 122 fps, indicating good real-time performance.
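The SE channel-attention module fused into the YOLOv5 backbone follows the standard squeeze-and-excitation pattern: global average pooling per channel, a small bottleneck, and a sigmoid gate that rescales channels. A minimal NumPy sketch (random placeholder weights stand in for the learned projections w1/w2; the reduction ratio is illustrative):

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-excitation gate.  x: feature map (C, H, W);
    w1: (C//r, C) bottleneck weights; w2: (C, C//r) expansion weights."""
    c = x.shape[0]
    squeeze = x.reshape(c, -1).mean(axis=1)          # global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ squeeze)           # ReLU bottleneck (excitation)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # sigmoid channel weights in (0, 1)
    return x * gate[:, None, None]                   # rescale each channel

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))                   # toy feature map, C=8, r=4
w1 = rng.standard_normal((2, 8)) * 0.1
w2 = rng.standard_normal((8, 2)) * 0.1
y = se_block(x, w1, w2)
```

In a trained network the gate learns to suppress noise-dominated channels, which matches the anti-noise motivation given in the abstract.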
Title: AM YOLO: adaptive multi-scale YOLO for ship instance segmentation
Authors: Ming Yuan, Hao Meng, Junbao Wu
Pub Date: 2024-05-28 | DOI: 10.1007/s11554-024-01479-9
Abstract: Instance segmentation has seen widespread development and significant progress across various fields. However, ship instance segmentation in marine environments faces challenges, including complex sea-surface backgrounds, indistinct target features, and large scale variations, which prevent existing methods from achieving the desired results. To overcome these challenges, this paper presents an adaptive multi-scale YOLO (AM YOLO) algorithm to improve instance segmentation performance for multi-scale ship targets in marine environments. First, the algorithm introduces a multi-grained adaptive feature enhancement module (MAEM) that uses grouped weighting and multiple adaptive mechanisms to enhance the extraction of details and improve the accuracy of multi-scale and global information. Second, the study proposes a refined bidirectional feature pyramid network (RBiFPN) structure, which employs a cross-channel attention adaptive mechanism to fully integrate feature information and contextual details across different scales. Experiments on the challenging MS COCO, COCO-boat, and OVSD datasets show that, compared to the YOLOv5s baseline, the AM YOLO model increases instance segmentation precision by 4.0%, 1.4%, and 2.3%, respectively. These improvements enhance the model's generalization capability and achieve a favorable balance between accuracy and speed while maintaining real-time performance, broadening the model's applicability in dynamic marine environments.
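The abstract does not spell out RBiFPN's fusion rule; for orientation only, a generic BiFPN-style "fast normalized fusion" of same-scale features looks like the sketch below (the eps term and non-negative weight handling are the standard BiFPN formulation, not taken from this work):

```python
import numpy as np

def weighted_fusion(features, weights, eps=1e-4):
    """Fuse same-shape feature maps with learnable non-negative weights,
    normalized so the output stays on the same scale as the inputs."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # clamp weights >= 0
    w = w / (w.sum() + eps)                                # fast normalization
    return sum(wi * f for wi, f in zip(w, features))
```

Bidirectional pyramids apply such a fusion at every level, top-down and bottom-up, after resampling the neighbouring levels to a common resolution.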
Title: ResLMFFNet: a real-time semantic segmentation network for precision agriculture
Authors: Irem Ulku
Pub Date: 2024-05-28 | DOI: 10.1007/s11554-024-01474-0
Abstract: The lightweight multiscale-feature-fusion network (LMFFNet), a proficient real-time CNN architecture, achieves a good balance between inference time and accuracy. Capturing the intricate details of precision-agriculture target objects in remote sensing images requires deep SEM-B blocks in the LMFFNet design; however, stacking many SEM-B units destabilizes the backward gradient flow. This work proposes the residual-LMFFNet (ResLMFFNet) model, which ensures smooth gradient flow within the SEM-B blocks. By incorporating residual connections, ResLMFFNet improves accuracy without affecting inference speed or the number of trainable parameters. Experiments demonstrate that the architecture outperforms other real-time architectures across diverse precision-agriculture applications involving UAV and satellite images. Compared to LMFFNet, ResLMFFNet improves the Jaccard index by 2.1% for tree detection, 1.4% for crop detection, and 11.2% for wheat yellow-rust detection, while keeping inference time and computational complexity almost identical to the LMFFNet model. The source code is available on GitHub: https://github.com/iremulku/Semantic-Segmentation-in-Precision-Agriculture.
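The Jaccard index used to report these gains is plain intersection-over-union on segmentation masks; a small reference implementation:

```python
import numpy as np

def jaccard_index(pred, target):
    """Jaccard index (IoU) between two binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0  # two empty masks agree perfectly
```

For multi-class segmentation the score is typically computed per class on one-vs-rest masks and then averaged.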
Title: HSMF: hardware-efficient single-stage feedback mean filter for high-density salt-and-pepper noise removal
Authors: Midde Venkata Siva, E. P. Jayakumar
Pub Date: 2024-05-25 | DOI: 10.1007/s11554-024-01475-z
Abstract: Noise is an unwanted element that degrades digital image quality. Salt-and-pepper noise can appear at any point during image acquisition or transmission, so proper restoration procedures are essential to suppress it. This paper proposes a hardware-efficient VLSI architecture for a feedback decision-based trimmed mean filter that removes high-density salt-and-pepper noise from images. Noisy pixels are identified and corrected from the neighbouring pixels in the 3 × 3 window centred on each noisy pixel: either the mean of the horizontally and vertically adjacent noisy pixels or the mean of the noise-free pixels in the window is computed. This mean value is fed back and the noisy centre pixel is updated immediately, so the updated value is used thereafter when correcting the remaining corrupted pixels. This procedure removes noisy pixels effectively even at high noise densities. Additionally, the designed VLSI architecture is efficient: the algorithm requires no sorting process and uses fewer computing resources than other state-of-the-art algorithms.
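A software sketch of the feedback idea (not the VLSI design, and with the all-noisy-window fallback simplified away): pixels stuck at 0 or 255 are replaced by the mean of the noise-free 3 × 3 neighbours, and because the image is updated in place, already-corrected pixels immediately feed later corrections:

```python
import numpy as np

def feedback_mean_filter(img):
    """Remove salt-and-pepper noise (pixels at 0 or 255) by an in-place
    trimmed mean over each pixel's 3x3 neighbourhood."""
    out = img.astype(float).copy()
    h, w = out.shape
    for y in range(h):
        for x in range(w):
            if out[y, x] not in (0.0, 255.0):          # not salt/pepper: keep
                continue
            window = out[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2]
            clean = window[(window != 0.0) & (window != 255.0)]
            if clean.size:
                out[y, x] = clean.mean()               # feedback: reused below
    return out.astype(np.uint8)
```

Because corrected values re-enter the window of later pixels, the filter keeps working even when most of a neighbourhood is corrupted, which is the property the paper exploits at high noise densities.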
Title: Adaptive complexity control for AV1 video encoder using machine learning
Authors: Isis Bender, Gustavo Rehbein, Guilherme Correa, Luciano Agostini, Marcelo Porto
Pub Date: 2024-05-19 | DOI: 10.1007/s11554-024-01463-3
Abstract: Digital videos are widely used on various platforms, including smartphones and other battery-powered mobile devices, which face energy-consumption and performance constraints. Video encoders compress video data, enabling the use of this type of media by reducing the data rate while maintaining image quality, and the continuous improvement of video coding standards is crucial to promoting the use of digital video. In this context, the Alliance for Open Media (AOM) developed the AV1 (AOMedia Video 1) format. However, the advanced tools and enhancements AV1 provides come at a high computational cost. To address this issue, this paper presents the learning-based AV1 complexity controller (LACCO), which dynamically optimizes the encoding time of the AV1 encoder for HD 1080 and UHD 4K videos. The controller predicts the encoding time of future frames and classifies input videos according to their characteristics using trained machine-learning models. LACCO was integrated into the AV1 reference software; its encoding-time reduction ranges from 10% to 70%, with average error between 0.11 and 1.88 percentage points for HD 1080 resolution and between 0.14 and 3.33 percentage points for UHD 4K resolution.
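LACCO's actual controller relies on trained machine-learning time predictors; purely to illustrate the control loop, the hypothetical sketch below substitutes a moving-average predictor and a notional 0-12 speed-preset scale (both invented for the example, not LACCO's internals):

```python
class ComplexityController:
    """Toy complexity control loop: after each frame, predict the encoding
    time and nudge the encoder speed preset toward the time budget."""

    def __init__(self, target_ms, presets=range(13)):   # hypothetical preset scale
        self.target_ms = target_ms
        self.presets = list(presets)
        self.level = self.presets[len(self.presets) // 2]
        self.history = []

    def predicted_time(self):
        recent = self.history[-8:]                      # moving average stands in
        return sum(recent) / len(recent)                # for the ML predictor

    def frame_encoded(self, elapsed_ms):
        self.history.append(elapsed_ms)
        if self.predicted_time() > self.target_ms and self.level < self.presets[-1]:
            self.level += 1                             # over budget: speed up
        elif self.predicted_time() < 0.8 * self.target_ms and self.level > self.presets[0]:
            self.level -= 1                             # well under budget: spend more effort
        return self.level
```

The dead band (0.8x the target) keeps the preset from oscillating when the encoder is already close to its budget.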
Title: Fast detection of face masks in public places using QARepVGG-YOLOv7
Authors: Chuying Guan, Jiaxuan Jiang, Zhong Wang
Pub Date: 2024-05-19 | DOI: 10.1007/s11554-024-01476-y
Abstract: The COVID-19 pandemic has resulted in substantial global losses. In the post-epidemic era, public health still calls for the correct use of medical masks in confined spaces such as hospitals and other indoor settings. Correct mask use effectively blocks the spread of infectious diseases through droplets, protects personal and public health, and improves the environmental sustainability and social resilience of cities, so detecting whether masks are worn correctly is crucial. This study proposes an innovative three-class mask detection model based on the QARepVGG-YOLOv7 algorithm. The model replaces the convolution modules in the backbone network with QARepVGG modules, exploiting their quantization-friendly structure and re-parameterization to achieve high-precision, high-efficiency detection. To validate the proposed method, we created a mask dataset of 5095 images covering three categories: masks worn correctly, masks worn incorrectly, and no mask; data augmentation was employed to further balance the categories. We tested the YOLOv5s, YOLOv6, YOLOv7, and YOLOv8s models on this dataset. The results show that QARepVGG-YOLOv7 achieves the best accuracy among these YOLO models, reaching an mAP of 0.946 (a 0.5% improvement over YOLOv7) at 263.2 fps, 90.8 fps faster than YOLOv7, making it a high-precision, high-efficiency mask detection model.
Title: FPGA-based implementation of the VVC low-frequency non-separable transform
Authors: Fatma Belghith, Sonda Ben Jdidia, Bouthaina Abdallah, Nouri Masmoudi
Pub Date: 2024-05-18 | DOI: 10.1007/s11554-024-01471-3
Abstract: The Versatile Video Coding (VVC) standard, released in July 2020, delivers better coding performance than High-Efficiency Video Coding (HEVC) thanks to the introduction of new coding tools. The transform module in VVC incorporates the Multiple Transform Selection (MTS) concept, which relies on separable Discrete Cosine Transform (DCT)/Discrete Sine Transform (DST) kernels, and the recently introduced Low-Frequency Non-Separable Transform (LFNST). The latter serves as a secondary transform that enhances coding efficiency by further decorrelating residual samples, but it introduces considerable computational complexity and resource demands, complicating hardware implementation. This paper introduces an effective, cost-efficient hardware architecture for LFNST. The proposed design uses only additions and bit-shifting operations, limiting hardware logic usage. Synthesis results for an Arria 10 10AX115N1F45E1SG FPGA device show that the logic cost is only 26% of the available hardware resources. Additionally, the design operates at 204 MHz and can process Ultra High Definition (UHD) 4K video at up to 60 frames per second (fps).
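Replacing multiplications by additions and bit-shifts is a standard multiplier-less hardware technique; independent of the specific LFNST kernel coefficients (which are not reproduced here), any constant multiply expands into shift-adds over the constant's set bits:

```python
def shift_add_multiply(x, coeff):
    """Compute x * coeff using only shifts and additions (coeff >= 0).
    Each set bit b of coeff contributes x << b to the result."""
    acc, bit = 0, 0
    while coeff:
        if coeff & 1:
            acc += x << bit    # add x * 2^bit for this set bit
        coeff >>= 1
        bit += 1
    return acc
```

In hardware the loop unrolls into a fixed adder tree per coefficient, which is why such designs avoid DSP multiplier blocks and keep the logic cost low.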
Title: A real-time detection for miner behavior via DYS-YOLOv8n model
Authors: Fangfang Xin, Xinyu He, Chaoxiu Yao, Shan Li, Biao Ma, Hongguang Pan
Pub Date: 2024-05-13 | DOI: 10.1007/s11554-024-01466-0
Abstract: To address the low real-time performance and poor accuracy of algorithms for detecting miner behavior underground, we propose a high-precision real-time detection method, DYS-YOLOv8n, based on the characteristics of human body behavior. The method integrates DSConv into the backbone network to enhance multi-scale feature extraction, and replaces the C2f modules with SCConv-C2f, reducing redundant computation and improving model training speed. The loss function is also optimized: MPDIoU is used to improve the model's accuracy and speed. The experimental results show: (1) with almost no increase in parameters or computation, the mAP50 of the DYS-YOLOv8n model is 97.4%, a 3.2% improvement over YOLOv8n; (2) compared to Faster R-CNN, YOLOv5s, and YOLOv7, DYS-YOLOv8n improves average accuracy to varying degrees while significantly increasing detection speed; (3) with a detection speed of 243 fps, DYS-YOLOv8n meets the real-time requirements for behavior detection in mines. In summary, DYS-YOLOv8n offers a real-time, efficient, and lightweight method for detecting miner behavior in mines, with high practical value.
Pub Date : 2024-05-12DOI: 10.1007/s11554-024-01472-2
Chunhui Bai, Lilian Zhang, Lutao Gao, Lin Peng, Peishan Li, Linnan Yang
In response to the fuzzy, complex boundaries of unstructured road scenes and the high difficulty of segmenting them, this paper takes BiSeNet as the baseline model and proposes a real-time segmentation model based on partial convolution. FasterNet, which is built on partial convolution, is adopted and improved as the backbone network, using operators with higher floating-point operations per second to increase inference speed. The model structure is optimized: the inefficient spatial path is removed and its role is taken over by shallow features from the context path, reducing model complexity. A Residual Atrous Spatial Pyramid Pooling Module is proposed to replace the single context embedding module of the original model, extracting multi-scale context information more effectively and improving segmentation accuracy. The feature fusion module is also upgraded: the proposed Dual Attention Features Fusion Module helps the model better understand image context through cross-level feature fusion. The resulting model achieves an inference speed of 78.81 f/s, meeting the real-time requirements of unstructured road scene segmentation. On accuracy metrics, the model reaches a Mean Intersection over Union of 72.63% and a Macro F1 of 83.20%, showing significant advantages over other advanced real-time segmentation models. Therefore, the proposed partial-convolution-based real-time segmentation model meets the accuracy and speed required for segmentation in complex, variable unstructured road scenes, and offers reference value for the development of autonomous driving technology in such scenes. Code is available at https://github.com/BaiChunhui2001/Real-time-segmentation.
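The partial convolution underlying FasterNet can be illustrated with a minimal NumPy sketch: only the first 1/n_div fraction of the channels is convolved, while the remaining channels pass through untouched, which is what cuts redundant computation and memory access. The naive triple loop and the function name are illustrative assumptions, not the paper's code.

```python
import numpy as np

def partial_conv(x, weight, n_div=4):
    """Partial convolution (FasterNet-style) on a (C, H, W) feature map.

    Convolves only the first C // n_div channels with 3x3 kernels
    (weight has shape (cp, cp, 3, 3)); the other channels are copied
    through unchanged.
    """
    c, h, w = x.shape
    cp = c // n_div                          # channels actually convolved
    out = x.copy()                           # untouched channels pass through
    pad = np.pad(x[:cp], ((0, 0), (1, 1), (1, 1)))  # zero-pad H and W by 1
    for o in range(cp):                      # naive 3x3 convolution
        acc = np.zeros((h, w))
        for i in range(cp):
            for ky in range(3):
                for kx in range(3):
                    acc += weight[o, i, ky, kx] * pad[i, ky:ky + h, kx:kx + w]
        out[o] = acc
    return out
```

With n_div = 4, only a quarter of the channels incur convolution FLOPs, so the per-layer cost drops roughly by n_div squared relative to a full convolution while the feature map keeps its shape, which is why operators like this sustain the high throughput the backbone needs.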
{"title":"Real-time segmentation algorithm of unstructured road scenes based on improved BiSeNet","authors":"Chunhui Bai, Lilian Zhang, Lutao Gao, Lin Peng, Peishan Li, Linnan Yang","doi":"10.1007/s11554-024-01472-2","DOIUrl":"https://doi.org/10.1007/s11554-024-01472-2","url":null,"abstract":"<p>In response to the fuzzy, complex boundaries of unstructured road scenes and the high difficulty of segmenting them, this paper takes BiSeNet as the baseline model and proposes a real-time segmentation model based on partial convolution. FasterNet, which is built on partial convolution, is adopted and improved as the backbone network, using operators with higher floating-point operations per second to increase inference speed. The model structure is optimized: the inefficient spatial path is removed and its role is taken over by shallow features from the context path, reducing model complexity. A Residual Atrous Spatial Pyramid Pooling Module is proposed to replace the single context embedding module of the original model, extracting multi-scale context information more effectively and improving segmentation accuracy. The feature fusion module is also upgraded: the proposed Dual Attention Features Fusion Module helps the model better understand image context through cross-level feature fusion. The resulting model achieves an inference speed of 78.81 f/s, meeting the real-time requirements of unstructured road scene segmentation. On accuracy metrics, the model reaches a Mean Intersection over Union of 72.63% and a Macro F1 of 83.20%, showing significant advantages over other advanced real-time segmentation models.
Therefore, the proposed partial-convolution-based real-time segmentation model meets the accuracy and speed required for segmentation in complex, variable unstructured road scenes, and offers reference value for the development of autonomous driving technology in such scenes. Code is available at https://github.com/BaiChunhui2001/Real-time-segmentation.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":null,"pages":null},"PeriodicalIF":3.0,"publicationDate":"2024-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140941977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}