Learned Video Compression with Adaptive Temporal Prior and Decoded Motion-aided Quality Enhancement

IF 5.2; Zone 3, Computer Science; Q1, Computer Science, Information Systems; ACM Transactions on Multimedia Computing Communications and Applications; Pub Date: 2024-04-27; DOI: 10.1145/3661824

Jiayu Yang, Chunhui Yang, Fei Xiong, Yongqi Zhai, Ronggang Wang


Citations: 0

Abstract




Learned video compression has recently drawn great attention and shown promising compression performance. In this paper, we focus on two components of the learned video compression framework, namely the conditional entropy model and the quality enhancement module, to improve compression performance. Specifically, we propose an adaptive spatial-temporal entropy model for image, motion, and residual compression, which introduces a temporal prior to reduce the temporal redundancy of latents and an additional modulated mask to evaluate the similarity and perform refinement. In addition, a quality enhancement module is proposed for the predicted frame and the reconstructed frame to improve frame quality and reduce the bitrate cost of residual coding. The module reuses the decoded optical flow as a motion prior and uses deformable convolution to mine high-quality information from the reference frame in a bit-free manner. The two proposed coding tools are integrated into a pixel-domain, residual-coding-based compression framework to evaluate their effectiveness. Experimental results demonstrate that our framework achieves competitive compression performance in the low-delay scenario, compared with recent learning-based methods and traditional H.265/HEVC, in terms of PSNR and MS-SSIM. The code is available at OpenLVC.
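As a rough illustration of why a temporal prior combined with a similarity mask can lower the bitrate, the following minimal Python sketch estimates the bits needed to code a few quantized latents under a discretized Gaussian. It is not the authors' entropy model: the blending rule, the function names (`estimated_bits`, `blended_mean`), the fixed scale of 1.0, and all numbers are assumptions chosen only for illustration.

```python
import math

def gaussian_cdf(x, mean, scale):
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))

def estimated_bits(symbol, mean, scale):
    """Bits for an integer symbol under a Gaussian discretized to unit-width bins."""
    p = gaussian_cdf(symbol + 0.5, mean, scale) - gaussian_cdf(symbol - 0.5, mean, scale)
    return -math.log2(max(p, 1e-9))  # clamp to avoid log(0)

def blended_mean(spatial_mean, temporal_mean, mask):
    """Hypothetical blend: mask=1 trusts the temporal prior, mask=0 the spatial one."""
    return mask * temporal_mean + (1.0 - mask) * spatial_mean

# Toy data (all values are made up for illustration).
current = [4, -2, 7, 0]          # quantized latents of the current frame
previous = [4, -2, 6, 5]         # latents of the previous frame (temporal prior)
spatial = [0.0, 0.0, 0.0, 0.0]   # stand-in for a spatial/hyperprior prediction
mask = [1.0, 1.0, 1.0, 0.0]      # low where the temporal prior is dissimilar

bits_spatial = sum(estimated_bits(c, s, 1.0) for c, s in zip(current, spatial))
bits_adaptive = sum(
    estimated_bits(c, blended_mean(s, t, m), 1.0)
    for c, s, t, m in zip(current, spatial, previous, mask)
)

print(f"spatial-only prior: {bits_spatial:.1f} bits")
print(f"masked temporal prior: {bits_adaptive:.1f} bits")
```

In this toy setting the masked temporal prior predicts the first three latents well and falls back to the spatial prediction for the last one, where the previous frame is a poor guide, so the estimated rate drops. In a learned codec the mask and the distribution parameters would be produced by networks rather than set by hand.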


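Source journal

ACM Transactions on Multimedia Computing Communications and Applications (Engineering & Technology - Computer Science: Theory & Methods)

CiteScore: 8.50

Self-citation rate: 5.90%

Articles published: 285

Review time: 7.5 months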

Journal description: The ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It solicits paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome. TOMM is a peer-reviewed, archival journal, available in both print and digital form. The journal is published quarterly, with roughly seven 23-page articles in each issue. In addition, all Special Issues are published online-only to ensure timely publication. The transactions consist primarily of research papers. This is an archival journal, and it is intended that the papers will have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.