
Journal of Real-Time Image Processing: Latest Publications

Multiple layers complexity allocation with dynamic control scheme for high-efficiency video coding
IF 3 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-04-04 | DOI: 10.1007/s11554-024-01452-6
Jiunn-Tsair Fang, Ju-Kai Chen

High-efficiency video coding (HEVC) has significantly improved coding efficiency; however, its quadtree (QT) structures for coding units (CUs) substantially raise the overall coding complexity. This study introduces a novel complexity control scheme aimed at enhancing HEVC encoding efficiency. The proposed scheme operates across multiple layers, encompassing the group of pictures (GOP) layer, frame layer, and coding-tree unit (CTU) layer. Each coding layer is assigned a limited coding complexity based on the remaining coding time. Particularly noteworthy is the dynamic scheme implemented to activate the complexity control method. To further expedite encoding, an efficient algorithm is developed for the CTU layer. Experimental results indicate BD-rate increases of only 0.46% and 0.98% when the target complexity is constrained to 80% and 60% of the original coding complexity, respectively. The rate-distortion performance surpasses that of existing state-of-the-art complexity control methods, demonstrating the effectiveness of the proposed approach in enhancing HEVC encoding efficiency.
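To make the allocation idea concrete, here is a minimal Python sketch of budgeting the remaining encoding time across the GOP, frame, and CTU layers and mapping each CTU's share to a maximum quadtree depth. The even-split rule, the cost weighting, and the depth thresholds are illustrative assumptions, not the control scheme published by the authors.

```python
# Hypothetical sketch of multi-layer complexity (time) allocation for an HEVC encoder.
# The allocation rule and the depth-limiting heuristic are illustrative assumptions.

def allocate_gop_budget(remaining_time_s: float, gops_left: int) -> float:
    """Split the remaining encoding time evenly over the GOPs still to be coded."""
    return remaining_time_s / max(gops_left, 1)

def allocate_frame_budget(gop_budget_s: float, frames_left_in_gop: int) -> float:
    """Split a GOP budget evenly over its remaining frames."""
    return gop_budget_s / max(frames_left_in_gop, 1)

def allocate_ctu_budgets(frame_budget_s: float, ctu_costs: list) -> list:
    """Weight each CTU's share of the frame budget by its estimated cost,
    e.g. the coding time of the co-located CTU in the previous frame."""
    total = sum(ctu_costs) or 1.0
    return [frame_budget_s * c / total for c in ctu_costs]

def max_cu_depth_for(ctu_budget_s: float, full_search_estimate_s: float) -> int:
    """Map a per-CTU time budget to a maximum quadtree depth (0..3): the tighter
    the budget relative to a full RDO search, the shallower the allowed tree."""
    ratio = ctu_budget_s / max(full_search_estimate_s, 1e-9)
    if ratio >= 1.0:
        return 3   # no restriction: search 64x64 down to 8x8 CUs
    if ratio >= 0.6:
        return 2
    if ratio >= 0.3:
        return 1
    return 0       # evaluate only the 64x64 CU

# Example: 120 s left, 10 GOPs of 8 frames, previous-frame CTU costs in seconds.
gop_b = allocate_gop_budget(120.0, 10)
frame_b = allocate_frame_budget(gop_b, 8)
ctu_b = allocate_ctu_budgets(frame_b, [0.02, 0.05, 0.01, 0.04])
print([max_cu_depth_for(b, 0.05) for b in ctu_b])
```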

Citations: 0
SDPH: a new technique for spatial detection of path holes from huge volume high-resolution raster images in near real-time
IF 3 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-04-04 | DOI: 10.1007/s11554-024-01451-7
Murat Tasyurek

Detecting and repairing road defects is crucial for road safety, vehicle maintenance, and enhancing tourism on well-maintained roads. However, monitoring all roads by vehicle incurs high costs. With the widespread use of remote sensing technologies, high-resolution satellite images offer a cost-effective alternative. This study proposes a new technique, SDPH, for automated detection of damaged roads from vast, high-resolution satellite images. In the SDPH technique, satellite images are organized in a pyramid grid file system, allowing deep learning methods to process them efficiently. The images, generated at 256 × 256 pixels, are stored in a directory with explicit location information. The SDPH technique employs a two-stage object detection model, utilizing classical and modified RCNNv3, YOLOv5, and YOLOv8. In the first stage, classical RCNNv3, YOLOv5, and YOLOv8 and modified RCNNv3, YOLOv5, and YOLOv8 identify roads, achieving f1 scores of 0.743, 0.716, 0.710, 0.955, 0.958, and 0.954, respectively. When the YOLOv5 with the highest f1 score fed the second stage, the modified RCNNv3, YOLOv5, and YOLOv8 detected road defects, achieving f1 scores of 0.957, 0.971, and 0.964. When the same CNN model was used for both road and road-defect detection in the proposed SDPH model, classical RCNNv3, improved RCNNv3, classical YOLOv5, improved YOLOv5, classical YOLOv8, and improved YOLOv8 achieved micro f1 scores of 0.752, 0.956, 0.726, 0.969, 0.720, and 0.965, respectively. In addition, these models processed 11, 10, 33, 31, 37, and 36 frames per second, respectively, while performing both stages. Evaluations on GeoTIFF satellite images from Kayseri Metropolitan Municipality, ranging between 20 and 40 gigabytes, demonstrated the efficiency of the SDPH technique. Notably, the modified YOLOv5 performed best, detecting paths and defects in 0.032 s with a micro f1 score of 0.969. Fine-tuning on TileCache improved f1 scores and reduced computational costs across all models.
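As a rough illustration of the two-stage idea, the sketch below slices a large raster into 256 × 256 windows, runs a road detector on every tile, and runs a defect detector only on road-positive tiles. The road_model and defect_model objects and their predict() interface are hypothetical stand-ins for trained detectors, and SDPH itself reads pre-generated pyramid-grid tiles with explicit location information rather than slicing the raster on the fly.

```python
# Hypothetical two-stage tiling pipeline in the spirit of SDPH: stage 1 finds tiles
# that contain roads, stage 2 looks for defects only in those tiles. The detector
# objects and their predict() interface are stand-ins, not the paper's models.
import numpy as np

TILE = 256

def iter_tiles(raster: np.ndarray, tile: int = TILE):
    """Yield (row, col, window) for non-overlapping tile x tile windows of an H x W x 3 raster."""
    h, w = raster.shape[:2]
    for r in range(0, h - tile + 1, tile):
        for c in range(0, w - tile + 1, tile):
            yield r, c, raster[r:r + tile, c:c + tile]

def detect_defects(raster, road_model, defect_model, road_thr=0.5, defect_thr=0.5):
    """Two-stage detection: run the defect detector only where a road was found."""
    hits = []
    for r, c, win in iter_tiles(raster):
        if road_model.predict(win) < road_thr:   # stage 1: skip tiles without roads
            continue
        score = defect_model.predict(win)        # stage 2: defect confidence
        if score >= defect_thr:
            hits.append({"row": r, "col": c, "score": float(score)})
    return hits
```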

Citations: 0
Online continual streaming learning for embedded space applications
IF 3 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-04-02 | DOI: 10.1007/s11554-024-01438-4
Alaa Eddine Mazouz, Van-Tam Nguyen

This paper proposes an online continual learning (OCL) methodology tested on hardware and validated for space applications using an object detection task for close-proximity operations. The proposed OCL algorithm simulates a streaming scenario and uses experience replay, saving past inputs in an onboard reservoir that is sampled during updates, so the model can update its knowledge without suffering catastrophic forgetting. A stream buffer is introduced to enable online training, i.e., the ability to update the model as data is streamed, one sample at a time, rather than being available in batches. Hyperparameters such as buffer sizes, update rate, batch size, batch concatenation parameters, and number of iterations per batch are all investigated to find an optimized approach for the incremental-domain and streaming learning task. The algorithm is tested on a customized dataset for space applications that simulates changes in visual environments which significantly impact the deployed model's performance. Our OCL methodology uses Weighted Sampling, a novel approach that allows the system to analytically choose more useful input samples during training. The results show that a model can be updated online achieving up to 60% Average Learning while Average Forgetting can be as low as 13%, all with a Model Size Efficiency of 1, meaning the model size does not increase. An additional contribution is an implementation of On-Device Continual Training for embedded applications: a hardware experiment is carried out on the Zynq 7100 FPGA, where a pre-trained CNN model is updated online using our FPGA backpropagation pipeline and OCL methodology to take new data into account and satisfactorily complete the planned task in less than 5 min while achieving 90 FPS.
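A minimal sketch of the replay mechanism described above, assuming a PyTorch-style model, optimizer, and loss function: each new stream sample is trained together with samples drawn from an onboard reservoir, and a per-sample weight (here simply the last observed loss) stands in for the paper's Weighted Sampling criterion.

```python
# Minimal sketch of online continual learning with reservoir-based experience replay.
# Buffer size, the usefulness weight, and the training step are illustrative assumptions;
# `model`, `optimizer`, and `loss_fn` are assumed to be PyTorch-style objects.
import random

class ReplayReservoir:
    def __init__(self, capacity: int):
        self.capacity, self.seen = capacity, 0
        self.data, self.weights = [], []          # (x, y) pairs and usefulness scores

    def add(self, x, y, weight: float = 1.0):
        """Reservoir sampling keeps an (approximately) uniform sample of the stream."""
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append((x, y)); self.weights.append(weight)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = (x, y); self.weights[j] = weight

    def sample(self, k: int):
        """Weighted sampling: more useful samples are replayed more often."""
        if not self.data:
            return []
        return random.choices(self.data, weights=self.weights, k=min(k, len(self.data)))

def online_step(model, optimizer, loss_fn, stream_sample, reservoir, replay_k=8):
    """One OCL update: train on the new sample plus a replayed mini-batch."""
    x, y = stream_sample
    batch = [(x, y)] + reservoir.sample(replay_k)
    loss = sum(loss_fn(model(xi), yi) for xi, yi in batch) / len(batch)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    reservoir.add(x, y, weight=float(loss.detach()))   # use the loss as a usefulness score
    return float(loss)
```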

Citations: 0
Performance evaluation of all intra Kvazaar and x265 HEVC encoders on embedded system Nvidia Jetson platform
IF 3 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-04-02 | DOI: 10.1007/s11554-024-01429-5
R. James, Mohammed Abo-Zahhad, Koji Inoue, Mohammed S. Sayed

The growing demand for high-quality video requires complex coding techniques that increase resource consumption and encoding time, which presents a challenge for real-time processing on embedded systems. The Kvazaar and x265 encoders are two efficient implementations of the High Efficiency Video Coding (HEVC) standard. In this paper, the performance of the All Intra Kvazaar and x265 encoders on the Nvidia Jetson platform was evaluated using two coding configurations: a high-speed preset and a high-quality preset. Two scenarios were considered. In the first, the two encoders were run on the CPU; based on the average encoding time, Kvazaar proved to be 65.44% and 69.4% faster than x265, with 1.88% and 0.6% BD-rate improvement over x265 at the high-speed and high-quality presets, respectively. In the second scenario, the two encoders were run on the GPU of the Nvidia Jetson, and the results show that the average encoding time under each preset is half that of the CPU-based scenario. In addition, Kvazaar is 54.5% and 56.70% faster, with 1.93% and 0.45% BD-rate improvement over x265 at the high-speed and high-quality presets, respectively. Regarding scalability, on the CPU both encoders scale linearly up to four threads, after which speed remains constant; on the GPU, both encoders scale linearly with the number of threads. The results confirm that Kvazaar is more efficient and that, given its speed and performance advantage over the x265 HEVC encoder, it can be used on embedded systems for real-time video applications.
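A timing harness for this kind of comparison might look like the sketch below, which shells out to the two encoders and averages wall-clock encoding time over several runs. The command-line flags are assumptions that must be adapted to the installed kvazaar and x265 builds, and BD-rate would additionally require bitrate/PSNR measurements at several QPs, which are not shown.

```python
# Hedged sketch of a wall-clock benchmark for two HEVC command-line encoders.
# The flags below are assumptions; adapt them to the actual kvazaar/x265 builds.
import statistics
import subprocess
import time

def time_encoder(cmd: list, runs: int = 3) -> float:
    """Average wall-clock time in seconds of an encoder command over `runs` runs."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)

# Assumed command lines for a 1080p raw YUV clip (paths and flags are placeholders).
kvazaar_cmd = ["kvazaar", "-i", "clip.yuv", "--input-res", "1920x1080",
               "--preset", "ultrafast", "-o", "out_kvz.hevc"]
x265_cmd = ["x265", "--input", "clip.yuv", "--input-res", "1920x1080", "--fps", "30",
            "--preset", "ultrafast", "-o", "out_x265.hevc"]

if __name__ == "__main__":
    t_kvz, t_x265 = time_encoder(kvazaar_cmd), time_encoder(x265_cmd)
    print(f"kvazaar: {t_kvz:.2f} s, x265: {t_x265:.2f} s, "
          f"kvazaar speedup: {100 * (t_x265 - t_kvz) / t_x265:.1f}%")
```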

Citations: 0
Slim-neck by GSConv: a lightweight-design for real-time detector architectures
IF 3 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-03-29 | DOI: 10.1007/s11554-024-01436-6
Hulin Li, Jun Li, Hanbing Wei, Zheng Liu, Zhenfei Zhan, Qiliang Ren

Real-time object detection is significant for industrial and research fields. On edge devices, a large model struggles to meet real-time detection requirements, while a lightweight model built from many depth-wise separable convolutions cannot achieve sufficient accuracy. We introduce a new lightweight convolutional technique, GSConv, to lighten the model while maintaining accuracy. GSConv accomplishes an excellent trade-off between accuracy and speed. Furthermore, we provide a design suggestion based on GSConv, the slim-neck (SNs), to achieve higher computational cost-effectiveness in real-time detectors. The effectiveness of the SNs was robustly demonstrated in over twenty sets of comparative experiments. In particular, the real-time detectors improved by the SNs achieve state-of-the-art results (70.9% AP50 for SODA10M at a speed of ~100 FPS on a Tesla T4) compared with the baselines. Code is available at https://github.com/alanli1997/slim-neck-by-gsconv.
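A PyTorch sketch of the GSConv idea as described publicly: a standard convolution produces half of the output channels, a cheap depth-wise convolution derives the other half from them, and a channel shuffle mixes the two. The kernel sizes and the shuffle implementation below are illustrative choices, not the reference code from the linked repository.

```python
# Sketch of a GSConv-style block: dense conv for half the channels, depth-wise conv
# for the other half, then a channel shuffle. Kernel sizes are illustrative.
import torch
import torch.nn as nn

class GSConv(nn.Module):
    def __init__(self, c_in: int, c_out: int, k: int = 3, s: int = 1):
        super().__init__()
        c_half = c_out // 2
        self.dense = nn.Sequential(                      # standard convolution branch
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.cheap = nn.Sequential(                      # depth-wise ("cheap") branch
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.dense(x)
        b = self.cheap(a)
        y = torch.cat((a, b), dim=1)
        n, c, h, w = y.shape                             # channel shuffle: interleave halves
        return y.view(n, 2, c // 2, h, w).transpose(1, 2).reshape(n, c, h, w)

if __name__ == "__main__":
    print(GSConv(64, 128)(torch.randn(1, 64, 40, 40)).shape)  # torch.Size([1, 128, 40, 40])
```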

Citations: 0
FPGA-SoC implementation of YOLOv4 for flying-object detection
IF 3 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-03-29 | DOI: 10.1007/s11554-024-01440-w
Dai-Duong Nguyen, Dang-Tuan Nguyen, Minh-Thuy Le, Quoc-Cuong Nguyen

Flying-object detection has become an increasingly attractive avenue for research, particularly with the rising prevalence of unmanned aerial vehicles (UAVs). Deep learning methods offer an effective means of detection with high accuracy. Meanwhile, the demand to implement deep learning models on embedded devices is growing, fueled by the requirement for capabilities that are both real-time and power efficient. FPGAs have emerged as the optimal choice for their parallelism, flexibility, and energy efficiency. In this paper, we propose an FPGA-based design of the YOLOv4 network to address the problem of flying-object detection. Our design provides a suitable solution for overcoming the challenge of limited floating-point resources while maintaining accuracy and obtaining real-time performance and energy efficiency. We generated an appropriate dataset of flying objects for implementing, training, and fine-tuning the network parameters, and then modified suitable components of the YOLO network to fit deployment on the FPGA. Our experiments on the Xilinx ZCU104 development kit show that, despite format conversion and model quantization, the accuracy of our implementation is competitive with the original model running on CPU and GPU. In terms of speed, the FPGA implementation on the ZCU104 kit is inferior to an ultra-high-end GPU, the RTX 2080Ti, but outperforms the GTX 1650. In terms of power consumption, the FPGA implementation is about 3 times lower than the GTX 1650 GPU and about 7 times lower than the RTX 2080Ti. In terms of energy efficiency, the FPGA is clearly superior to the GPUs, being 2–3 times more efficient than the RTX 2080Ti and 3–4 times more efficient than the GTX 1650.
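The abstract mentions format conversion and model quantization as part of the FPGA deployment; the sketch below shows generic symmetric int8 post-training quantization of a weight tensor, which is only a stand-in for whatever vendor toolchain and quantization scheme the authors actually used.

```python
# Generic symmetric per-tensor int8 quantization, as an illustration of the kind of
# fixed-point conversion applied before FPGA deployment; not the paper's exact scheme.
import numpy as np

def quantize_symmetric_int8(w: np.ndarray):
    """Map float weights to int8 with a single per-tensor scale."""
    scale = float(np.abs(w).max()) / 127.0 or 1.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = (np.random.randn(3, 3, 64, 32) * 0.1).astype(np.float32)
    q, s = quantize_symmetric_int8(w)
    print(f"scale={s:.5f}, mean abs error={np.abs(w - dequantize(q, s)).mean():.6f}")
```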

Citations: 0
LSDNet: a lightweight ship detection network with improved YOLOv7
IF 3 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-03-27 | DOI: 10.1007/s11554-024-01441-9
Cui Lang, Xiaoyan Yu, Xianwei Rong

Accurate ship detection is critical for maritime transportation security. Current deep learning-based object detection algorithms have made marked progress in detection accuracy. However, these models are too heavy to be deployed on mobile or embedded devices with limited resources. Thus, this paper proposes a lightweight convolutional neural network, abbreviated LSDNet, for mobile ship detection. In the proposed model, we introduce Partial Convolution into YOLOv7-tiny to reduce its parameter count and computational complexity. Meanwhile, GhostConv is introduced to further lighten the structure and improve detection performance. In addition, we use the Mosaic-9 data-augmentation method to enhance the robustness of the model. We compared the proposed LSDNet with other approaches on a publicly available ship dataset, SeaShips7000. The experimental results show that LSDNet achieves higher accuracy than other models with lower computational cost and fewer parameters. The test results also suggest that the proposed model can meet the requirements of real-time applications.
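Partial Convolution, as used in lightweight backbones, convolves only a fraction of the input channels and passes the rest through unchanged, cutting parameters and FLOPs; the PyTorch sketch below uses an illustrative 1/4 ratio and may differ in detail from the layer LSDNet adopts.

```python
# Sketch of a Partial Convolution (PConv-style) layer: only the first `c_conv`
# channels are convolved, the remaining channels are passed through untouched.
import torch
import torch.nn as nn

class PartialConv(nn.Module):
    def __init__(self, channels: int, ratio: float = 0.25, k: int = 3):
        super().__init__()
        self.c_conv = max(1, int(channels * ratio))   # channels that get convolved
        self.c_pass = channels - self.c_conv          # channels passed through
        self.conv = nn.Conv2d(self.c_conv, self.c_conv, k, 1, k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.c_conv, self.c_pass], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)

if __name__ == "__main__":
    print(PartialConv(64)(torch.randn(1, 64, 80, 80)).shape)  # torch.Size([1, 64, 80, 80])
```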

Citations: 0
FE-YOLO: YOLO ship detection algorithm based on feature fusion and feature enhancement
IF 3 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-03-27 | DOI: 10.1007/s11554-024-01445-5
Shouwen Cai, Hao Meng, Junbao Wu

The technology for detecting maritime targets is crucial for realizing ship intelligence. However, traditional detection algorithms are not ideal due to the diversity of marine targets and complex background environments. Therefore, we choose YOLOv7 as the baseline and propose an end-to-end feature-fusion and feature-enhancement YOLO (FE-YOLO). First, we introduce channel attention and lightweight GhostConv into the extended efficient layer aggregation network of YOLOv7, resulting in the improved extended efficient layer aggregation network (IELAN) module. This improvement enables the model to capture context information better and thus enhance target features. Second, to strengthen the network's feature-fusion capability, we design the light spatial pyramid pooling combined with spatial channel pooling (LSPPCSPC) module and the coordinate attention feature pyramid network (CA-FPN). Furthermore, we develop an N-Loss based on the normalized Wasserstein distance (NWD), effectively addressing the class-imbalance issue in the ship dataset. Experimental results on the open-source Singapore Maritime Dataset (SMD) and the SeaShips dataset demonstrate that, compared to the baseline YOLOv7, FE-YOLO achieves increases of 4.6% and 3.3% in detection accuracy, respectively.
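For reference, the normalized Wasserstein distance commonly used as a box similarity for small objects is sketched below: each axis-aligned box is modelled as a 2D Gaussian, and the exponentiated Wasserstein-2 distance between the Gaussians gives a similarity in (0, 1]. Whether FE-YOLO's N-Loss uses exactly this form, and with what normalizing constant, is not stated in the abstract, so treat the code as an illustration of NWD itself.

```python
# Normalized Wasserstein distance between two boxes given as (cx, cy, w, h).
# The constant C is dataset dependent; 12.8 is only an illustrative value.
import math

def nwd(box_a, box_b, C: float = 12.8) -> float:
    """Similarity in (0, 1]; 1.0 means identical boxes."""
    (cxa, cya, wa, ha), (cxb, cyb, wb, hb) = box_a, box_b
    w2_sq = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
             + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / C)

if __name__ == "__main__":
    print(nwd((50, 50, 10, 8), (52, 51, 9, 8)))   # nearby small boxes -> close to 1
    print(nwd((50, 50, 10, 8), (90, 90, 10, 8)))  # distant boxes -> close to 0
```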

Citations: 0
IPCRGC-YOLOv7: face mask detection algorithm based on improved partial convolution and recursive gated convolution
IF 3 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-03-26 | DOI: 10.1007/s11554-024-01448-2
Huaping Zhou, Anpei Dang, Kelei Sun

In complex scenarios, current detection algorithms often face challenges such as misdetection and omission when identifying irregularities in pedestrian mask wearing. This paper introduces an enhanced detection method, IPCRGC-YOLOv7 (Improved Partial Convolution Recursive Gated Convolution-YOLOv7), as a solution. First, we integrate the Partial Convolution structure into the backbone network to effectively reduce the number of model parameters. To address the problem of vanishing training gradients, we utilize the residual connection structure derived from the RepVGG network. Additionally, we introduce an efficient aggregation module, PRE-ELAN (Partially Representative Efficiency ELAN), to replace the original Efficient Long-Range Attention Network (ELAN) structure. Next, we improve the Cross Stage Partial Network (CSPNet) module by incorporating recursive gated convolution: introducing a new module called CSPNRGC (Cross Stage Partial Network Recursive Gated Convolution), we replace the ELAN structure in the neck. This enhancement allows higher-order spatial interactions across different network hierarchies. Lastly, in the loss function, we replace the original cross-entropy loss with Efficient-IoU to improve the accuracy of the loss calculation. To balance the contributions of high-quality and low-quality samples to the loss, we propose a new loss function called Wise-EIoU (Wise-Efficient IoU). The experimental results show that, compared to the original YOLOv7 algorithm, IPCRGC-YOLOv7 improves accuracy by 4.71%, recall by 5.94%, mean average precision (mAP@0.5) by 2.9%, and mAP@0.5:0.95 by 2.7%, which meets the accuracy requirements of mask-wearing detection in practical application scenarios.
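The Efficient-IoU term that Wise-EIoU builds on can be written out directly: the usual IoU penalty plus centre-distance and width/height penalties normalized by the smallest enclosing box. The sketch below computes it for a single box pair; the Wise weighting of high- and low-quality samples that the paper adds on top is not reproduced here.

```python
# Sketch of the EIoU regression loss for one pair of boxes in (x1, y1, x2, y2) format.
def eiou_loss(pred, gt, eps: float = 1e-9) -> float:
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph, gw, gh = px2 - px1, py2 - py1, gx2 - gx1, gy2 - gy1

    # IoU term
    ix1, iy1, ix2, iy2 = max(px1, gx1), max(py1, gy1), min(px2, gx2), min(py2, gy2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    iou = inter / (pw * ph + gw * gh - inter + eps)

    # smallest enclosing box
    cw, ch = max(px2, gx2) - min(px1, gx1), max(py2, gy2) - min(py1, gy1)

    # centre-distance and width/height penalties, each normalized by the enclosing box
    d2 = ((px1 + px2 - gx1 - gx2) / 2) ** 2 + ((py1 + py2 - gy1 - gy2) / 2) ** 2
    return ((1 - iou) + d2 / (cw ** 2 + ch ** 2 + eps)
            + (pw - gw) ** 2 / (cw ** 2 + eps) + (ph - gh) ** 2 / (ch ** 2 + eps))

if __name__ == "__main__":
    print(eiou_loss((10, 10, 50, 40), (12, 12, 52, 44)))
```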

Citations: 0
An end-to-end framework for real-time violent behavior detection based on 2D CNNs
IF 3 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-03-25 | DOI: 10.1007/s11554-024-01443-7
Peng Zhang, Lijia Dong, Xinlei Zhao, Weimin Lei, Wei Zhang

Violent behavior detection (VioBD), as a special action recognition task, aims to detect violent behaviors in videos, such as fighting and assault. Some progress has been made in violence detection research, but existing methods have poor real-time performance, and algorithm performance is limited by interference from complex backgrounds and occlusion in dense crowds. To solve these problems, we propose an end-to-end real-time violence detection framework based on 2D CNNs. First, we propose a lightweight skeletal image (SI) as the input modality, which captures human body posture and richer contextual information while removing background interference. As tested, at the same accuracy the resolution of the SI modality is only one-third that of the RGB modality, which greatly improves the real-time performance of model training and inference, and at the same resolution the SI modality achieves higher accuracy. Second, we design a parallel prediction module (PPM), which simultaneously obtains single-image detection results and inter-frame motion information of the video, improving the real-time performance of the algorithm compared with the traditional "detect the image first, understand the video later" mode. In addition, we propose an auxiliary parameter generation module (APGM) that offers both efficiency and accuracy. APGM is a 2D CNN-based video understanding module for weighting the spatial information of video features; its processing speed can reach 30–40 frames per second, and compared with models such as the CNN-LSTM of Iqrar et al. (CNN-LSTM based smart real-time video surveillance system. In: 2022 14th International Conference on Mathematics, Actuarial Science, Computer Science and Statistics (MACS), pages 1–5. IEEE, 2022) and the pose-based recognizer of Ludl et al. (Simple yet efficient real-time pose-based action recognition. In: 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pages 581–588. IEEE, 2019), the propagation speed can be increased by an average of 3–20 frames per second per group of clips, further improving video motion detection efficiency and accuracy and greatly improving real-time performance. We conducted experiments on several challenging benchmarks; RVBDN maintains excellent speed and accuracy in long-term interactions and meets the real-time requirements of violence detection and spatio-temporal action detection methods. Finally, we release our proposed new dataset of violence detection images (violence image dataset). The dataset is available at https://github.com/ChinaZhangPeng/Violence-Image-Dataset
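A hedged sketch of what the skeletal image (SI) modality might look like in code: 2D pose keypoints from an off-the-shelf pose estimator are rendered as joints and limbs on a blank canvas, so the detector sees posture and motion cues without the photographic background. The COCO-style joint pairs, colours, and resolution below are illustrative choices, not the paper's exact rendering.

```python
# Render pose keypoints as a "skeletal image" on a blank canvas (illustrative only).
import cv2
import numpy as np

# (start, end) joint indices for a COCO-style 17-keypoint skeleton
LIMBS = [(5, 7), (7, 9), (6, 8), (8, 10), (5, 6), (5, 11), (6, 12),
         (11, 12), (11, 13), (13, 15), (12, 14), (14, 16)]

def render_skeletal_image(keypoints: np.ndarray, size: int = 160) -> np.ndarray:
    """keypoints: (17, 3) array of (x, y, confidence) in pixel coordinates of `size`."""
    canvas = np.zeros((size, size, 3), dtype=np.uint8)
    for a, b in LIMBS:
        if keypoints[a, 2] > 0.3 and keypoints[b, 2] > 0.3:
            pa = (int(keypoints[a, 0]), int(keypoints[a, 1]))
            pb = (int(keypoints[b, 0]), int(keypoints[b, 1]))
            cv2.line(canvas, pa, pb, (0, 255, 0), 2)   # limbs
    for x, y, conf in keypoints:
        if conf > 0.3:
            cv2.circle(canvas, (int(x), int(y)), 3, (0, 0, 255), -1)   # joints
    return canvas
```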

Citations: 0