Learning-based compression of visual objects for smart surveillance

Ruben Antonio, S. Faria, Luis M. N. Tavora, A. Navarro, P. Assunção
{"title":"Learning-based compression of visual objects for smart surveillance","authors":"Ruben Antonio, S. Faria, Luis M. N. Tavora, A. Navarro, P. Assunção","doi":"10.1109/IPTA54936.2022.9784147","DOIUrl":null,"url":null,"abstract":"Advanced video applications in smart environments (e.g., smart cities) bring different challenges associated with increasingly intelligent systems and demanding requirements in emerging fields such as urban surveillance, computer vision in industry, medicine and others. As a consequence, a huge amount of visual data is captured to be analyzed by task-algorithm driven machines. In this context, this paper proposes an efficient learning-based approach to compress relevant visual objects, captured in surveillance contexts and delivered for machine vision processing. An object-based compression scheme is devised, comprising multiple autoencoders, each one optimised to produce an efficient latent representation of a corresponding object class. The performance of the proposed approach is evaluated with two types of visual objects: persons and faces and two task-algorithms: class identification and object recognition, besides traditional image quality metrics like PSNR and VMAF. In comparison with the Versatile Video Coding (VVC) standard, the proposed approach achieves significantly better coding efficiency than the VVC, e.g., up to 46.7% BD-rate reduction. The accuracy of the machine vision tasks is also significantly higher when performed over visual objects compressed with the proposed scheme in comparison with the same tasks performed over the same visual objects compressed with the VVC. These results demonstrate that the learning-based approach proposed in this paper is a more efficient solution for compression of visual objects than standard encoding.","PeriodicalId":381729,"journal":{"name":"2022 Eleventh International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 Eleventh International Conference on Image Processing Theory, Tools and Applications (IPTA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPTA54936.2022.9784147","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Advanced video applications in smart environments (e.g., smart cities) bring different challenges associated with increasingly intelligent systems and demanding requirements in emerging fields such as urban surveillance, computer vision in industry, medicine and others. As a consequence, a huge amount of visual data is captured to be analyzed by task-algorithm driven machines. In this context, this paper proposes an efficient learning-based approach to compress relevant visual objects, captured in surveillance contexts and delivered for machine vision processing. An object-based compression scheme is devised, comprising multiple autoencoders, each one optimised to produce an efficient latent representation of a corresponding object class. The performance of the proposed approach is evaluated with two types of visual objects: persons and faces and two task-algorithms: class identification and object recognition, besides traditional image quality metrics like PSNR and VMAF. In comparison with the Versatile Video Coding (VVC) standard, the proposed approach achieves significantly better coding efficiency than the VVC, e.g., up to 46.7% BD-rate reduction. The accuracy of the machine vision tasks is also significantly higher when performed over visual objects compressed with the proposed scheme in comparison with the same tasks performed over the same visual objects compressed with the VVC. These results demonstrate that the learning-based approach proposed in this paper is a more efficient solution for compression of visual objects than standard encoding.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于学习的智能监控视觉对象压缩
智能环境(例如,智能城市)中的高级视频应用带来了与日益智能的系统和新兴领域(如城市监控,工业计算机视觉,医学等)的苛刻要求相关的不同挑战。因此,大量的视觉数据被捕获,并由任务算法驱动的机器进行分析。在此背景下,本文提出了一种高效的基于学习的方法来压缩相关的视觉对象,这些对象在监视环境中捕获并交付给机器视觉处理。设计了一种基于对象的压缩方案,包括多个自动编码器,每个编码器都经过优化以产生相应对象类的有效潜在表示。除了传统的图像质量指标(如PSNR和VMAF)外,还使用两种类型的视觉对象(人和面孔)以及两种任务算法(类识别和对象识别)来评估该方法的性能。与通用视频编码(VVC)标准相比,该方法的编码效率明显高于VVC标准,可将bd率降低46.7%。与使用VVC压缩的相同视觉对象执行相同的任务相比,使用该方案压缩的视觉对象执行相同的任务时,机器视觉任务的准确性也显着更高。这些结果表明,本文提出的基于学习的方法是一种比标准编码更有效的视觉对象压缩解决方案。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Special Session 3: Visual Computing in Digital Humanities Complex Texture Features Learned by Applying Randomized Neural Network on Graphs AAEGAN Optimization by Purposeful Noise Injection for the Generation of Bright-Field Brain Organoid Images Towards Fast and Accurate Intimate Contact Recognition through Video Analysis Draco-Based Selective Crypto-Compression Method of 3D objects
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1