Learning-based compression of visual objects for smart surveillance

2022 Eleventh International Conference on Image Processing Theory, Tools and Applications (IPTA). Pub Date: 2022-04-19. DOI: 10.1109/IPTA54936.2022.9784147

Ruben Antonio, S. Faria, Luis M. N. Tavora, A. Navarro, P. Assunção


Citations: 1

Abstract

Advanced video applications in smart environments (e.g., smart cities) bring new challenges, driven by increasingly intelligent systems and demanding requirements in emerging fields such as urban surveillance, industrial computer vision and medicine. As a consequence, a huge amount of visual data is captured to be analyzed by machines running task-specific algorithms. In this context, this paper proposes an efficient learning-based approach to compress relevant visual objects captured in surveillance contexts and delivered for machine vision processing. An object-based compression scheme is devised, comprising multiple autoencoders, each one optimised to produce an efficient latent representation of a corresponding object class. The performance of the proposed approach is evaluated with two types of visual objects (persons and faces) and two task algorithms (class identification and object recognition), in addition to traditional image quality metrics such as PSNR and VMAF. In comparison with the Versatile Video Coding (VVC) standard, the proposed approach achieves significantly better coding efficiency, e.g., up to 46.7% BD-rate reduction. The accuracy of the machine vision tasks is also significantly higher when they are performed on visual objects compressed with the proposed scheme than on the same objects compressed with VVC. These results demonstrate that the proposed learning-based approach is a more efficient solution for compressing visual objects than standard encoding.
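The abstract reports coding efficiency as a BD-rate reduction. BD-rate is commonly computed with the Bjøntegaard method: fit the logarithm of the bitrate as a cubic polynomial of quality for each codec, integrate both fits over the shared quality range, and convert the average difference into a percentage. The abstract does not say how the authors computed it, so the following is only a minimal sketch of the usual procedure; the function name `bd_rate` and the sample rate and PSNR values are hypothetical and are not taken from the paper.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate difference (%) of a test codec against an anchor
    at equal quality. Negative values mean the test codec needs less rate.
    Hypothetical helper; expects at least four rate-quality points per codec."""
    log_ra = np.log10(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log10(np.asarray(rate_test, dtype=float))
    qa = np.asarray(psnr_anchor, dtype=float)
    qt = np.asarray(psnr_test, dtype=float)

    # Fit log-rate as a cubic polynomial of quality for each codec.
    poly_a = np.polyfit(qa, log_ra, 3)
    poly_t = np.polyfit(qt, log_rt, 3)

    # Integrate only over the quality interval covered by both curves.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)

    # Mean log-rate gap, converted back to a percentage rate change.
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0


# Illustrative (made-up) rate-distortion points: the test codec reaches the
# same PSNR values as the anchor at exactly half the bitrate.
anchor_rates = [1000.0, 2000.0, 4000.0, 8000.0]   # kbit/s, assumed
test_rates = [500.0, 1000.0, 2000.0, 4000.0]      # kbit/s, assumed
psnr_points = [30.0, 33.0, 36.0, 39.0]            # dB, assumed

print(f"BD-rate: {bd_rate(anchor_rates, psnr_points, test_rates, psnr_points):.2f}%")
```

Under this sign convention, a reported "46.7% BD-rate reduction" corresponds to a returned value of about -46.7. The same routine can be applied with VMAF, or a task accuracy score, in place of PSNR as the quality axis.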


