Pub Date: 2024-06-02 | DOI: 10.1007/s11554-024-01488-8
IoT-based real-time object detection system for crop protection and agriculture field security
Priya Singh, Rajalakshmi Krishnamurthi
In farming, conflicts between humans and animals create significant challenges, risking crop yields, human well-being, and resource depletion. Farmers use traditional methods such as electric fences to protect their fields, but these can harm essential animals that maintain a balanced ecosystem. To address these challenges, our research presents a novel solution harnessing the Internet of Things (IoT) and deep learning. In this paper, we develop a monitoring system that combines an ESP32-CAM and a Raspberry Pi with an optimised YOLOv8 model. Our objective is to detect and classify objects such as animals or humans roaming around the field and to provide real-time notifications to farmers through Firebase Cloud Messaging (FCM). Ultrasonic sensors first detect any intruder movement, triggering the camera to capture an image. The captured image is then transmitted to a server equipped with the object detection model. The processed image is forwarded to FCM, which manages the image and sends a notification to the farmer through an Android application. Our optimised YOLOv8 model attains a precision of 97%, a recall of 96%, and an accuracy of 96%. Once this result was achieved, we integrated the model with our IoT infrastructure. This study emphasizes the effectiveness of low-power IoT devices, LoRa devices, and object detection techniques in delivering strong security solutions for the agriculture industry. These technologies hold the potential to significantly decrease crop damage, enhance safety within agricultural fields, and contribute to wildlife conservation.
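As an illustration of the notification path the abstract describes, here is a minimal server-side sketch assuming the ultralytics YOLO API and the firebase_admin SDK; the weights file, topic name, and thresholding are illustrative assumptions, not details from the paper.

```python
# Hypothetical server-side pipeline: YOLOv8 inference + FCM push notification.
import firebase_admin
from firebase_admin import credentials, messaging
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # stand-in for the authors' optimised YOLOv8 weights

def process_capture(image_path: str) -> None:
    results = model(image_path)[0]           # run detection on the uploaded frame
    labels = {results.names[int(c)] for c in results.boxes.cls}
    if labels:                               # notify only when something is detected
        messaging.send(messaging.Message(
            notification=messaging.Notification(
                title="Field intrusion alert",
                body=f"Detected: {', '.join(sorted(labels))}",
            ),
            topic="farm-alerts",             # the Android app subscribes to this topic
        ))

if __name__ == "__main__":
    firebase_admin.initialize_app(credentials.Certificate("service-account.json"))
    process_capture("capture.jpg")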
{"title":"IoT-based real-time object detection system for crop protection and agriculture field security","authors":"Priya Singh, Rajalakshmi Krishnamurthi","doi":"10.1007/s11554-024-01488-8","DOIUrl":"https://doi.org/10.1007/s11554-024-01488-8","url":null,"abstract":"<p>In farming, clashes between humans and animals create significant challenges, risking crop yields, human well-being, and resource depletion. Farmers use traditional methods like electric fences to protect their fields but these can harm essential animals that maintain a balanced ecosystem. To address these fundamental challenges, our research presents a fresh solution harnessing the power of the Internet of Things (IoT) and deep learning. In this paper, we developed a monitoring system that takes advantage of ESP32-CAM and Raspberry Pi in collaboration with optimised YOLOv8 model. Our objective is to detect and classify objects such as animals or humans that roam around the field, providing real-time notification to the farmers by incorporating firebase cloud messaging (FCM). Initially, we have employed ultrasonic sensors that will detect any intruder movement, triggering the camera to capture an image. Further, the captured image is transmitted to a server equipped with an object detection model. Afterwards, the processed image is forwarded to FCM, responsible for managing the image and sending notifications to the farmer through an Android application. Our optimised YOLOv8 model attains an exceptional precision of 97%, recall of 96%, and accuracy of 96%. Once we achieved this optimal outcome, we integrated the model with our IoT infrastructure. This study emphasizes the effectiveness of low-power IoT devices, LoRa devices, and object detection techniques in delivering strong security solutions to the agriculture industry. These technologies hold the potential to significantly decrease crop damage while enhancing safety within the agricultural field and contribute towards wildlife conservation.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"102 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141252232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-30 | DOI: 10.1007/s11554-024-01482-0
A power-aware vision-based virtual sensor for real-time edge computing
Chiara Contoli, Lorenzo Calisti, Giacomo Di Fabrizio, Nicholas Kania, Alessandro Bogliolo, Emanuele Lattanzi
Graphics processing units and tensor processing units coupled with tiny machine learning models deployed on edge devices are revolutionizing computer vision and real-time tracking systems. However, edge devices pose tight resource and power constraints. This paper proposes a real-time vision-based virtual sensor paradigm to provide power-aware multi-object tracking at the edge while preserving tracking accuracy and enhancing privacy. We thoroughly describe the proposed system architecture, focusing on the Dynamic Inference Power Manager (DIPM), which adapts the frame rate to provide energy savings. We implement and deploy the virtual sensor and the DIPM on the NVIDIA Jetson Nano edge platform to demonstrate the effectiveness and efficiency of the proposed solution. Extensive experiments show that the proposed virtual sensor reduces energy consumption by about 36% on videos with relatively low dynamicity and by about 21% on more dynamic video content, while keeping tracking accuracy within 1.2% of the baseline.
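A minimal sketch of the adaptive-frame-rate idea behind the DIPM: estimate scene dynamicity from inter-frame differences and back off the capture rate when little changes. The thresholds and rate bounds are assumptions for illustration, not the paper's values.

```python
import numpy as np

class AdaptiveFrameRate:
    def __init__(self, min_fps=5.0, max_fps=30.0, threshold=8.0):
        self.min_fps, self.max_fps, self.threshold = min_fps, max_fps, threshold
        self.prev = None
        self.fps = max_fps

    def update(self, frame: np.ndarray) -> float:
        """Return the capture rate to use for the next frame."""
        gray = (frame.mean(axis=2) if frame.ndim == 3 else frame).astype(np.float32)
        if self.prev is not None:
            dynamicity = np.abs(gray - self.prev).mean()  # mean absolute pixel change
            if dynamicity < self.threshold:
                self.fps = max(self.min_fps, self.fps * 0.9)  # static scene: back off
            else:
                self.fps = self.max_fps                       # motion: full rate
        self.prev = gray
        return self.fps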
{"title":"A power-aware vision-based virtual sensor for real-time edge computing","authors":"Chiara Contoli, Lorenzo Calisti, Giacomo Di Fabrizio, Nicholas Kania, Alessandro Bogliolo, Emanuele Lattanzi","doi":"10.1007/s11554-024-01482-0","DOIUrl":"https://doi.org/10.1007/s11554-024-01482-0","url":null,"abstract":"<p>Graphics processing units and tensor processing units coupled with tiny machine learning models deployed on edge devices are revolutionizing computer vision and real-time tracking systems. However, edge devices pose tight resource and power constraints. This paper proposes a real-time vision-based virtual sensors paradigm to provide power-aware multi-object tracking at the edge while preserving tracking accuracy and enhancing privacy. We thoroughly describe our proposed system architecture, focusing on the Dynamic Inference Power Manager (DIPM). Our proposed DIPM is based on an adaptive frame rate to provide energy savings. We implement and deploy the virtual sensor and the DIPM on the NVIDIA Jetson Nano edge platform to prove the effectiveness and efficiency of the proposed solution. The results of extensive experiments demonstrate that the proposed virtual sensor can achieve a reduction in energy consumption of about 36% in videos with relatively low dynamicity and about 21% in more dynamic video content while simultaneously maintaining tracking accuracy within a range of less than 1.2%.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"13 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141188555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-29 | DOI: 10.1007/s11554-024-01480-2
SSE-YOLOv5: a real-time fault line selection method based on lightweight modules and attention models
Shuai Hao, Wei Li, Xu Ma, Zhuo Tian
To address the low precision and poor noise robustness of standard line selection methods for small-current grounding faults, we propose a fault line selection approach based on a YOLOv5 network that integrates attention modules and lightweight models. First, the zero-sequence current of the grounding system fault is used as the basis for fault discrimination: a wavelet transform translates the zero-sequence current into a two-dimensional time-frequency map to create a dataset. Because the limited training set restricts line selection accuracy, we constructed a simulation model of small-current grounding faults based on actual faults and, by varying the fault location, fault angle, and grounding resistance, generated a simulation dataset to expand the training set. Second, to reduce the impact of noise on fault features during line selection, the SE channel attention module is fused into the backbone of the YOLOv5 detection network, significantly improving the network's accuracy in detecting fault regions. Finally, to combine high line selection accuracy with good real-time performance, the lightweight ShuffleNetV2 model is introduced into the network; its depthwise separable convolutions reduce the number of model parameters and improve real-time performance. The proposed algorithm was compared with four other algorithms to verify its advantages. Experimental results show that the proposed method reaches a line selection accuracy of 93.6% with only a small amount of real data samples, while maintaining over 90% accuracy in the presence of noise. At an image resolution of 640 × 640, its detection speed is 122 fps, indicating good real-time performance.
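To illustrate the dataset-construction step, here is a sketch of converting a zero-sequence current trace into a two-dimensional time-frequency map with a continuous wavelet transform, assuming the PyWavelets package; the Morlet wavelet, scale range, and synthetic signal are assumptions.

```python
import numpy as np
import pywt

def current_to_scalogram(i0: np.ndarray, scales=np.arange(1, 65)) -> np.ndarray:
    coef, _ = pywt.cwt(i0, scales, "morl")           # CWT coefficients: (scales, time)
    mag = np.abs(coef)
    return (mag / mag.max() * 255).astype(np.uint8)  # normalise to an 8-bit image

# Example: a synthetic fault-like transient superposed on a 50 Hz component
t = np.linspace(0, 0.2, 2000)
i0 = np.sin(2 * np.pi * 50 * t) + 0.5 * np.exp(-50 * t) * np.sin(2 * np.pi * 400 * t)
img = current_to_scalogram(i0)  # images like this form the detector's training set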
{"title":"SSE-YOLOv5: a real-time fault line selection method based on lightweight modules and attention models","authors":"Shuai Hao, Wei Li, Xu Ma, Zhuo Tian","doi":"10.1007/s11554-024-01480-2","DOIUrl":"https://doi.org/10.1007/s11554-024-01480-2","url":null,"abstract":"<p>To address the problems of low precision and poor anti-noise performance of the standard route selection method for the small current grounding faults, a fault line selection approach based on YOLOv5 network that integrates attention modules and lightweight models is proposed. First, grounding system fault’s zero sequence current is utilized as the basis for fault discrimination. A wavelet transform is employed to translate the zero sequence current to a two-dimensional time–frequency map to create a dataset. However, due to the impact of the lack of training sets on the accuracy of line selection, we constructed a simulation model for small current grounding faults based on actual faults. By modifying the fault location, fault angle, and grounding resistance, we generated a simulation dataset to expand the training set. Second, to reduce the impact of noise on fault features during line selection, the SE channel attention model is used to fuse it into the backbone of the YOLOv5 detection network, significantly improving the network's accuracy in detecting fault areas. Finally, to achieve high line selection accuracy and good real-time performance in the detection network, the lightweight network model ShuffleNetV2 is introduced into the constructed network. ShuffleNetV2 reduces the number of network model parameters through its deep separable convolution, improving the real-time performance of line selection. The proposed algorithm in this study was compared with four other algorithms to verify its advantages. The experimental results reveal that the proposed method reached a line selection accuracy of 93.6% under the condition of a small amount of real data samples, while maintaining a line selection accuracy of over 90% in the presence of noise. When the image resolution is 640 × 640, its detection speed is 122fps, indicating good real-time performance.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"24 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141188561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-28 | DOI: 10.1007/s11554-024-01479-9
AM YOLO: adaptive multi-scale YOLO for ship instance segmentation
Ming Yuan, Hao Meng, Junbao Wu
Instance segmentation has seen widespread development and significant progress across various fields. However, ship instance segmentation in marine environments faces challenges, including complex sea surface backgrounds, indistinct target features, and large scale variations, which prevent existing methods from achieving desirable results. To overcome these challenges, this paper presents an adaptive multi-scale YOLO (AM YOLO) algorithm to improve instance segmentation performance for multi-scale ship targets in marine environments. The algorithm first proposes a multi-grained adaptive feature enhancement module (MAEM), which uses grouped weighting and multiple adaptive mechanisms to enhance detail extraction and improve the accuracy of multi-scale and global information. It then proposes a refined bidirectional feature pyramid network (RBiFPN), which employs a cross-channel adaptive attention mechanism to fully integrate feature information and contextual details across scales. Experiments on the challenging MS COCO, COCO-boat, and OVSD datasets show that, compared to the YOLOv5s baseline, AM YOLO increases instance segmentation precision by 4.0%, 1.4%, and 2.3%, respectively. These improvements enhance the model's generalization capabilities and achieve a favourable balance between accuracy and speed while maintaining real-time performance, broadening the model's applicability in dynamic marine environments.
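A hedged PyTorch sketch of cross-channel attention-weighted fusion of two feature maps at different scales, in the spirit of the RBiFPN described above; the exact module layout is an assumption for illustration, not the paper's design.

```python
import torch
import torch.nn as nn

class CrossChannelFusion(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # squeeze: global context
            nn.Conv2d(2 * channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 2 * channels, 1),
            nn.Sigmoid(),                                   # per-channel weights
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        b = nn.functional.interpolate(b, size=a.shape[2:], mode="nearest")
        x = torch.cat([a, b], dim=1)
        w = self.attn(x)                                    # excitation over both inputs
        wa, wb = w.chunk(2, dim=1)
        return a * wa + b * wb                              # adaptively weighted sum

# fuse a 1/8-scale map with an upsampled 1/16-scale map
fused = CrossChannelFusion(256)(torch.randn(1, 256, 80, 80), torch.randn(1, 256, 40, 40))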
{"title":"AM YOLO: adaptive multi-scale YOLO for ship instance segmentation","authors":"Ming Yuan, Hao Meng, Junbao Wu","doi":"10.1007/s11554-024-01479-9","DOIUrl":"https://doi.org/10.1007/s11554-024-01479-9","url":null,"abstract":"<p>Instance segmentation has seen widespread development and significant progress across various fields. However, ship instance segmentation in marine environments faces challenges, including complex sea surface backgrounds, indistinct target features, and large-scale variations, making it incapable of achieving the desirable results. To overcome these challenges, this paper presents an adaptive multi-scale YOLO (AM YOLO) algorithm to improve instance segmentation performance for multi-scale ship targets in marine environments. Initially, the algorithm proposes a multi-grained adaptive feature enhancement module (MAEM) that utilizes grouped weighting and multiple adaptive mechanisms to enhance the extraction of details and improve the accuracy of multi-scale and global information. Subsequently, this study proposes a refine bidirectional feature pyramid network (RBiFPN) structure, which employs a cross-channel attention adaptive mechanism to integrate feature information and contextual details across different scales fully. Experiments on the challenging MS COCO dataset, COCO-boat dataset, and OVSD dataset show that compared to the baseline YOLOv5s, the AM YOLO model increases instance segmentation precision by 4.0%, 1.4%, and 2.3%, respectively. This improvement enhances the model’s generalization capabilities and achieves an optimal balance between accuracy and speed while maintaining real-time performance, thus broadening the model’s applicability in dynamic marine environments</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"48 3 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141169923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-28 | DOI: 10.1007/s11554-024-01474-0
ResLMFFNet: a real-time semantic segmentation network for precision agriculture
Irem Ulku
Lightweight multiscale-feature-fusion network (LMFFNet), a proficient real-time CNN architecture, strikes a balance between inference time and accuracy. Capturing the intricate details of precision agriculture targets in remote sensing images requires deep SEM-B blocks in the LMFFNet design. However, stacking many SEM-B units leads to instability during backward gradient flow. This work proposes the novel residual LMFFNet (ResLMFFNet) model, which ensures smooth gradient flow within SEM-B blocks. By incorporating residual connections, ResLMFFNet achieves improved accuracy without affecting the inference speed or the number of trainable parameters. Experiments demonstrate that the architecture outperforms other real-time architectures across diverse precision agriculture applications involving UAV and satellite images. Compared to LMFFNet, ResLMFFNet improves the Jaccard index by 2.1% for tree detection, 1.4% for crop detection, and 11.2% for wheat yellow-rust detection, while maintaining almost identical inference time and computational complexity. The source code is available on GitHub: https://github.com/iremulku/Semantic-Segmentation-in-Precision-Agriculture.
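The residual-connection idea is simple enough to show directly: wrap a block in an identity shortcut so gradients flow past it unchanged. A minimal PyTorch sketch, with a placeholder standing in for the actual SEM-B block.

```python
import torch
import torch.nn as nn

class ResidualWrapper(nn.Module):
    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.block(x)  # identity path keeps backward gradients stable

sem_b_standin = nn.Sequential(    # placeholder with matching in/out channels
    nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
)
y = ResidualWrapper(sem_b_standin)(torch.randn(1, 64, 32, 32))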
{"title":"ResLMFFNet: a real-time semantic segmentation network for precision agriculture","authors":"Irem Ulku","doi":"10.1007/s11554-024-01474-0","DOIUrl":"https://doi.org/10.1007/s11554-024-01474-0","url":null,"abstract":"<p>Lightweight multiscale-feature-fusion network (LMFFNet), a proficient real-time CNN architecture, adeptly achieves a balance between inference time and accuracy. Capturing the intricate details of precision agriculture target objects in remote sensing images requires deep SEM-B blocks in the LMFFNet model design. However, employing numerous SEM-B units leads to instability during backward gradient flow. This work proposes the novel residual-LMFFNet (ResLMFFNet) model for ensuring smooth gradient flow within SEM-B blocks. By incorporating residual connections, ResLMFFNet achieves improved accuracy without affecting the inference speed and the number of trainable parameters. The results of the experiments demonstrate that this architecture has achieved superior performance compared to other real-time architectures across diverse precision agriculture applications involving UAV and satellite images. Compared to LMFFNet, the ResLMFFNet architecture enhances the Jaccard Index values by 2.1% for tree detection, 1.4% for crop detection, and 11.2% for wheat-yellow rust detection. Achieving these remarkable accuracy levels involves maintaining almost identical inference time and computational complexity as the LMFFNet model. The source code is available on GitHub: https://github.com/iremulku/Semantic-Segmentation-in-Precision-Agriculture.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"63 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141169755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-25 | DOI: 10.1007/s11554-024-01475-z
HSMF: hardware-efficient single-stage feedback mean filter for high-density salt-and-pepper noise removal
Midde Venkata Siva, E. P. Jayakumar
Noise is an unwanted element that degrades digital image quality. Salt-and-pepper noise can appear at any point during image acquisition or transmission, so proper restoration procedures are essential to suppress it. This paper proposes a hardware-efficient VLSI architecture for a feedback decision-based trimmed mean filter that removes high-density salt-and-pepper noise from images. Noisy pixels are identified and corrected from the neighbouring pixels in the 3 × 3 window centred on the noisy pixel: either the mean of the horizontal and vertical noisy pixels or the mean of the noise-free pixels in the window is computed. This mean value is fed back and the noisy centre pixel is updated immediately, so the updated pixel value is used henceforth when correcting the remaining corrupted pixels. This feedback removes noisy pixels effectively even at high noise densities. The designed VLSI architecture is also efficient, since the algorithm requires no sorting and uses fewer computing resources than other state-of-the-art algorithms.
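A NumPy sketch of the filtering scheme as described: pixels valued 0 or 255 are treated as noisy and corrected in raster order, and each corrected value is written back immediately so later corrections can reuse it (the feedback). Border handling and the fallback rule are implementation assumptions.

```python
import numpy as np

def feedback_trimmed_mean(img: np.ndarray) -> np.ndarray:
    out = img.astype(np.float32).copy()
    h, w = out.shape
    noisy = (img == 0) | (img == 255)
    for r in range(h):
        for c in range(w):
            if not noisy[r, c]:
                continue
            r0, r1 = max(r - 1, 0), min(r + 2, h)
            c0, c1 = max(c - 1, 0), min(c + 2, w)
            win = out[r0:r1, c0:c1]
            mask = (win != 0) & (win != 255)       # noise-free (or already corrected)
            if mask.any():
                out[r, c] = win[mask].mean()       # trimmed mean over clean pixels
            else:                                  # fall back to the H/V neighbours,
                hv = [out[r, max(c - 1, 0)], out[r, min(c + 1, w - 1)],
                      out[max(r - 1, 0), c], out[min(r + 1, h - 1), c]]
                out[r, c] = np.mean(hv)            # which feedback may have corrected
    return out.astype(np.uint8)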
{"title":"HSMF: hardware-efficient single-stage feedback mean filter for high-density salt-and-pepper noise removal","authors":"Midde Venkata Siva, E. P. Jayakumar","doi":"10.1007/s11554-024-01475-z","DOIUrl":"https://doi.org/10.1007/s11554-024-01475-z","url":null,"abstract":"<p>Noise is an unwanted element that has a negative impact on digital image quality. Salt-and-pepper noise is a type of noise that can appear at any point during the acquisition or transmission of images. It is essential to utilize proper restoration procedures to lessen the noise. This paper proposes a hardware-efficient VLSI architecture for the feedback decision-based trimmed mean filter that eliminates high-density salt-and-pepper noise in the images. The noisy pixels are identified and corrected by considering the neighbouring pixels in a 3 <span>(times)</span> 3 window corresponding to this noisy centre pixel. Either the mean of the horizontal and vertical noisy pixels or the mean of noise-free pixels in the window is computed. This mean value is fed back and the noisy centre pixel is updated immediately, such that this updated pixel value is used henceforth for correcting the remaining corrupted pixels. It is observed that this procedure helps in removing the noisy pixels effectively even if the noise density is high. Additionally, the designed VLSI architecture is efficient, since the algorithm does not require a sorting process and the computing resources required are less when compared to other state-of-the-art algorithms.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"14 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141152421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-19 | DOI: 10.1007/s11554-024-01463-3
Adaptive complexity control for AV1 video encoder using machine learning
Isis Bender, Gustavo Rehbein, Guilherme Correa, Luciano Agostini, Marcelo Porto
Digital videos are widely used on various platforms, including smartphones and other battery-powered mobile devices, which suffer from energy consumption and performance constraints. Video encoders compress video data, enabling the use of this type of media by reducing the data rate while maintaining image quality. To promote the use of digital videos, the continuous improvement of video encoding standards is crucial. In this context, the Alliance for Open Media (AOM) developed the AV1 (AOMedia Video 1) format. However, the advanced tools and enhancements provided by AV1 come with a high computational cost. To address this issue, this paper presents the learning-based AV1 complexity controller (LACCO). LACCO dynamically optimizes the encoding time of the AV1 encoder for HD 1080 and UHD 4K videos by predicting the encoding time of future frames and classifying input videos according to their characteristics with trained machine learning models. LACCO was integrated into the AV1 reference software encoder; it achieves encoding time reductions ranging from 10% to 70%, with average error ranging from 0.11 to 1.88 percentage points at HD 1080 resolution and from 0.14 to 3.33 percentage points at UHD 4K resolution.
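A simplified sketch of the control idea: maintain a running prediction of per-frame encoding time and move along a ladder of encoder configurations to stay on budget. The EMA predictor and the preset ladder are stand-ins for the paper's trained machine learning models and encoder-internal tools.

```python
class ComplexityController:
    def __init__(self, budget_ms: float, presets=(0, 1, 2, 3)):
        self.budget_ms = budget_ms  # target encoding time per frame
        self.presets = presets      # 0 = slowest/best quality ... 3 = fastest
        self.level = 0
        self.pred_ms = budget_ms    # running estimate of frame encoding time

    def after_frame(self, measured_ms: float) -> int:
        # exponential moving average as a minimal encoding-time predictor
        self.pred_ms = 0.8 * self.pred_ms + 0.2 * measured_ms
        if self.pred_ms > self.budget_ms and self.level < len(self.presets) - 1:
            self.level += 1         # over budget: switch to a faster preset
        elif self.pred_ms < 0.7 * self.budget_ms and self.level > 0:
            self.level -= 1         # headroom: recover compression efficiency
        return self.presets[self.level]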
{"title":"Adaptive complexity control for AV1 video encoder using machine learning","authors":"Isis Bender, Gustavo Rehbein, Guilherme Correa, Luciano Agostini, Marcelo Porto","doi":"10.1007/s11554-024-01463-3","DOIUrl":"https://doi.org/10.1007/s11554-024-01463-3","url":null,"abstract":"<p>Digital videos are widely used on various platforms, including smartphones and other battery-powered mobile devices, which can suffer from energy consumption and performance constraints. Video encoders are responsible for compressing video data, enabling the use of this type of media by reducing the data rate while maintaining image quality. To promote the use of digital videos, the continuous improvement of digital video encoding standards is crucial. In this context, the Alliance for Open Media (AOM) developed the AV1 (AOMedia Video 1) format. However, the advanced tools and enhancements provided by AV1 come with a high computational cost. To address this issue, this paper presents the learning-based AV1 complexity controller (LACCO). The goal of LACCO is to dynamically optimize the encoding time of the AV1 encoder for HD 1080 and UHD 4K resolution videos. The controller achieves this goal by predicting the encoding time of future frames and classifying input videos according to their characteristics through the use of trained machine learning models. LACCO was integrated into the reference software of the AV1 encoder and its encoding time reduction ranges from 10 to 70%, with average error results ranging from 0.11 to 1.88 percentage points for HD 1080 resolution and from 0.14 to 3.33 percentage points for UHD 4K resolution.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"55 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141062474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-19 | DOI: 10.1007/s11554-024-01476-y
Fast detection of face masks in public places using QARepVGG-YOLOv7
Chuying Guan, Jiaxuan Jiang, Zhong Wang
The COVID-19 pandemic has resulted in substantial global losses. In the post-epidemic era, public health guidance still advocates the correct use of medical masks in confined spaces such as hospitals and other indoor settings, which effectively blocks droplet transmission of infectious diseases, protects personal and public health, and improves the environmental sustainability and social resilience of cities. Detecting whether masks are worn correctly is therefore crucial. This study proposes an innovative three-class mask detection model based on the QARepVGG-YOLOv7 algorithm. The model replaces the convolution modules in the backbone network with QARepVGG modules, exploiting their quantization-friendly structure and re-parameterization to achieve high-precision, high-efficiency detection. To validate the proposed method, we created a mask dataset of 5095 images covering three categories: masks worn correctly, masks worn incorrectly, and no mask; data augmentation was employed to further balance the categories. We tested the YOLOv5s, YOLOv6, YOLOv7, and YOLOv8s models on this self-built dataset. The results show that QARepVGG-YOLOv7 achieves the best accuracy among these state-of-the-art YOLO models, with an mAP of 0.946 (a 0.5% increase over YOLOv7) at 263.2 fps (90.8 fps faster than YOLOv7), making it a high-precision, high-efficiency mask detection model.
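For context, a hypothetical training call for a three-class mask detector using the ultralytics API; the dataset file masks.yaml and the augmentation settings are assumptions, and the paper's actual model modifies YOLOv7 rather than YOLOv8.

```python
from ultralytics import YOLO

model = YOLO("yolov8s.pt")
model.train(
    data="masks.yaml",  # classes: mask_correct, mask_incorrect, no_mask
    epochs=100,
    imgsz=640,
    fliplr=0.5,         # simple augmentations help balance rarer classes
    mosaic=1.0,
)
metrics = model.val()   # reports precision, recall, and mAP per class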
{"title":"Fast detection of face masks in public places using QARepVGG-YOLOv7","authors":"Chuying Guan, Jiaxuan Jiang, Zhong Wang","doi":"10.1007/s11554-024-01476-y","DOIUrl":"https://doi.org/10.1007/s11554-024-01476-y","url":null,"abstract":"<p>The COVID-19 pandemic has resulted in substantial global losses. In the post-epidemic era, public health needs still advocate the correct use of medical masks in confined spaces such as hospitals and indoors. This can effectively block the spread of infectious diseases through droplets, protect personal and public health, and improve the environmental sustainability and social resilience of cities. Therefore, detecting the correct wearing of masks is crucial. This study proposes an innovative three-class mask detection model based on the QARepVGG-YOLOv7 algorithm. The model replaces the convolution module in the backbone network with the QARepVGG module and uses the quantitative friendly structure and re-parameterization characteristics of the QARepVGG module to achieve high-precision and high-efficiency target detection. To validate the effectiveness of our proposed method, we created a mask dataset of 5095 pictures, including three categories: correct use of masks, incorrect use of masks, and individuals who do not wear masks. We also employed data augmentation techniques to further balance the dataset categories. We tested YOLOv5s, YOLOv6, YOLOv7, and YOLOv8s models on self-made datasets. The results show that the QARepVGG-YOLOv7 model has the best accuracy compared with the most advanced YOLO model. Our model achieves a significantly improved mAP value of 0.946 and a faster fps of 263.2, which is 90.8 fps higher than the YOLOv7 model and a 0.5% increase in map value over the YOLOv7 model. It is a high-precision and high-efficiency mask detection model.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"55 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141062466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-18 | DOI: 10.1007/s11554-024-01471-3
FPGA-based implementation of the VVC low-frequency non-separable transform
Fatma Belghith, Sonda Ben Jdidia, Bouthaina Abdallah, Nouri Masmoudi
The Versatile Video Coding (VVC) standard, released in July 2020, delivers better coding performance than High-Efficiency Video Coding (HEVC) thanks to the introduction of new coding tools. The transform module in VVC incorporates the Multiple Transform Selection (MTS) concept, which relies on separable Discrete Cosine Transform (DCT)/Discrete Sine Transform (DST) kernels, and the recently introduced Low-Frequency Non-Separable Transform (LFNST). The latter serves as a secondary transform that enhances coding efficiency by further decorrelating residual samples, but it introduces heightened computational complexity and substantial resource demands, complicating its hardware implementation. This paper introduces an effective and cost-efficient hardware architecture for the LFNST. The proposed design employs additions and bit-shifting operations, conserving hardware logic. Synthesis results for an Arria 10 10AX115N1F45E1SG FPGA device show that the logic cost is only 26% of the available hardware resources. The design operates at 204 MHz and can process Ultra High Definition (UHD) 4K video at up to 60 frames per second (fps).
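The shift-add idea is easy to illustrate: a constant multiplication is decomposed into additions and bit shifts. A small Python example for the constant 23; the actual LFNST kernel coefficients are defined by the VVC specification and are not reproduced here.

```python
def mul_const_23(x: int) -> int:
    # 23 = 16 + 4 + 2 + 1, so x*23 becomes three shifts and three additions
    return (x << 4) + (x << 2) + (x << 1) + x

assert mul_const_23(7) == 7 * 23  # shift-add result matches the multiplication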
{"title":"FPGA-based implementation of the VVC low-frequency non-separable transform","authors":"Fatma Belghith, Sonda Ben Jdidia, Bouthaina Abdallah, Nouri Masmoudi","doi":"10.1007/s11554-024-01471-3","DOIUrl":"https://doi.org/10.1007/s11554-024-01471-3","url":null,"abstract":"<p>The Versatile Video Coding (VVC) standard, released in July 2020, brings better coding performance than the High-Efficiency Video Coding (HEVC) thanks to the introduction of new coding tools. The transform module in the VVC standard incorporates the Multiple Transform Selection (MTS) concept, which relies on separable Discrete Cosine Transform (DCT)/Discrete Sine Transform (DST) kernels, and the recently introduced Low-Frequency Non-Separable Transform (LFNST). This latter serves as a secondary transform process, enhancing coding efficiency by further decorrelating residual samples. However, it introduces heightened computational complexity and substantial resource allocation demands, potentially complicating its hardware implementation. This paper introduces an effective and cost-efficient hardware architecture for LFNST. The proposed design employs additions and bit-shifting operations preserving hardware logic usage. The synthesis results for an Arria 10 10AX115N1F45E1SG FPGA device demonstrate that the logic cost is only of 26% of the available hardware resources. Additionally, the proposed design is working at 204 MHz and can process Ultra High Definition (UHD) 4K videos at up to 60 frames per second (fps).</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"12 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141062532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-13 | DOI: 10.1007/s11554-024-01466-0
A real-time detection for miner behavior via DYS-YOLOv8n model
Fangfang Xin, Xinyu He, Chaoxiu Yao, Shan Li, Biao Ma, Hongguang Pan
To address the low real-time performance and poor accuracy of algorithms for detecting miner behavior underground, we propose a high-precision real-time detection method named DYS-YOLOv8n, based on the characteristics of human body behavior. The method integrates DSConv into the backbone network to enhance multi-scale feature extraction, and replaces the C2f modules with SCConv-C2f, reducing redundant computation and improving training speed. An optimized loss function based on MPDIoU further improves the model's accuracy and speed. The experimental results show: (1) with almost no increase in parameters or computation, the mAP50 of the DYS-YOLOv8n model is 97.4%, a 3.2% improvement over YOLOv8n; (2) compared to Faster R-CNN, YOLOv5s, and YOLOv7, DYS-YOLOv8n improves average accuracy to varying degrees while significantly increasing detection speed; (3) at 243 fps, DYS-YOLOv8n meets the real-time requirements for behavior detection in mines. In summary, DYS-YOLOv8n offers a real-time, efficient, and lightweight method for detecting miner behavior in mines, with high practical value.
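A NumPy sketch of the MPDIoU measure referenced above, following the published MPDIoU formulation as we understand it (IoU penalised by squared corner-point distances normalised by the image size); this is illustrative, not the authors' code. Boxes are (x1, y1, x2, y2).

```python
import numpy as np

def mpdiou(a, b, img_w, img_h):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area_a + area_b - inter)
    norm = img_w ** 2 + img_h ** 2
    d1 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2  # top-left corner distance
    d2 = (a[2] - b[2]) ** 2 + (a[3] - b[3]) ** 2  # bottom-right corner distance
    return iou - d1 / norm - d2 / norm            # the loss would be 1 - mpdiou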
{"title":"A real-time detection for miner behavior via DYS-YOLOv8n model","authors":"Fangfang Xin, Xinyu He, Chaoxiu Yao, Shan Li, Biao Ma, Hongguang Pan","doi":"10.1007/s11554-024-01466-0","DOIUrl":"https://doi.org/10.1007/s11554-024-01466-0","url":null,"abstract":"<p>To address the issues of low real-time performance and poor algorithm accuracy in detecting miner behavior underground, we propose a high-precision real-time detection method named DSY-YOLOv8n based on the characteristics of human body behavior. This method integrates DSConv into the backbone network to enhance multi-scale feature extraction. Additionally, SCConv-C2f replaces C2f modules, reducing redundant calculations and improving model training speed. The optimization strategy of the loss function is employed, and MPDIoU is used to improve the model’s accuracy and speed. The experimental results show: (1) With almost no increase in parameters and calculation amount, the mAP50 of the DSY-YOLOv8n model is 97.4%, which is a 3.2% great improvement over the YOLOv8n model. (2) Compared to Faster-R-CNN, YOLOv5s, and YOLOv7, DYS-YOLOv8n has improved the average accuracy to varying degrees while significantly increasing the detection speed. (3) DYS-YOLOv8n meets the real-time requirements for behavioral detection in mines with a detection speed of 243FPS. In summary, the DYS-YOLOv8n offers a real-time, efficient, and lightweight method for detecting miner behavior in mines, which has high practical value.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"11 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140930172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}