Recent Advances in End-to-End Learned Image and Video Compression

2020 IEEE International Conference on Visual Communications and Image Processing (VCIP). Pub Date: 2020-12-01. DOI: 10.1109/VCIP49819.2020.9301753

Wen-Hsiao Peng, H. Hang


Citations: 1

Abstract

DCT-based transform coding has been used in international standards (ISO JPEG, ITU H.261/264/265, ISO MPEG-2/4/H, and many others) for nearly 30 years. Although researchers are still trying to improve its efficiency by fine-tuning its components and parameters, the basic structure has not changed in the past two decades.

Recently developed deep learning technology may provide a new direction for constructing high-compression image/video coding systems. Recent results, particularly from the Challenge on Learned Image Compression (CLIC) at CVPR, indicate that this new type of scheme (often trained end-to-end) may have good potential for further improving compression efficiency.

In the first part of this tutorial, we shall (1) briefly summarize the progress on this topic over the past three or so years, including an overview of the CLIC results and the JPEG AI Call-for-Evidence Challenge on Learning-based Image Coding (issued in early 2020). Because Deep Neural Network (DNN)-based image compression is a new area, several techniques and structures have been tested. Recently published autoencoder-based schemes can achieve PSNR similar to that of BPG (Better Portable Graphics, the H.265-based still image format) and have superior subjective quality (e.g., MS-SSIM), especially at very low bit rates. In the second part, we shall (2) address the detailed design concepts of image compression algorithms that use the autoencoder structure. In the third part, we shall switch gears to (3) explore the emerging area of DNN-based video compression. Recent publications in this area indicate that end-to-end trained video compression can achieve rate-distortion performance comparable or superior to that of HEVC/H.265. CLIC at CVPR 2020 also created, for the first time, a new track dedicated to P-frame coding.
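The abstract compares codecs by PSNR and by rate-distortion performance. As a rough illustration of those two quantities, the sketch below computes PSNR from mean squared error and a Lagrangian rate-distortion cost of the form R + λD, which is a common way to write the trade-off that end-to-end trained codecs optimize. It is not taken from the tutorial: the function names, the 8-bit peak value of 255, and the choice of bits per pixel as the rate unit are all assumptions made for this example.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equal-length sequences of pixel values."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak=255 assumes 8-bit samples)."""
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / err)


def rd_cost(bits, num_pixels, distortion, lam):
    """Lagrangian rate-distortion cost: rate in bits per pixel plus lam * distortion.

    `lam` is a hypothetical trade-off weight; larger values favor lower
    distortion at the expense of a higher bit rate.
    """
    return bits / num_pixels + lam * distortion


if __name__ == "__main__":
    src = [100, 100, 100, 100]
    rec = [110, 90, 110, 90]
    d = mse(src, rec)
    print(f"MSE = {d:.2f}, PSNR = {psnr(src, rec):.2f} dB")
    print(f"RD cost = {rd_cost(bits=32, num_pixels=len(src), distortion=d, lam=0.01):.3f}")
```

Sweeping `lam` and plotting rate against PSNR for each setting would give a rate-distortion curve, which is the kind of comparison the abstract refers to when it sets learned codecs against BPG and HEVC/H.265.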


