A Mixed Appearance-based and Coding Distortion-based CNN Fusion Approach for In-loop Filtering in Video Coding

Jian Yue, Yanbo Gao, Shuai Li, Menghu Jia
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP), December 2020. DOI: 10.1109/VCIP49819.2020.9301895
With the success of convolutional neural networks (CNNs) in image denoising and other computer vision tasks, CNNs have been investigated for in-loop filtering in video coding. Many existing methods use CNNs directly as powerful filtering tools without much analysis of their effect. Since in-loop filters process reconstructed video frames produced by a fixed pipeline of video coding operations, the coding distortion in the reconstructed frames may share common properties that a CNN can learn, beyond treating the frame as a generic noisy image. Therefore, in this paper, we first categorize CNN-based filtering into two types of processes, appearance-based CNN filtering and coding distortion-based CNN filtering, and develop a two-stream CNN fusion framework accordingly. In the appearance-based stream, a CNN treats the reconstructed frame as a distorted image and extracts global appearance information to restore the original image. To extract this global information, a CNN with pooling is used first to enlarge the receptive field, and up-sampling is added in the late stage to recover pixel-level frame information. In contrast, in the coding distortion-based stream, a CNN treats the reconstructed frame as blocks with certain types of distortion, focusing on local information to learn the coding distortion introduced by the fixed video coding pipeline. Finally, the appearance-based and coding distortion-based filtering streams are fused to combine the two aspects of CNN filtering, and thereby the global and local information. To further reduce complexity, the similar first and last convolutional layers are shared across the two streams, yielding a mixed CNN. Experiments demonstrate that the proposed method outperforms existing CNN-based filtering methods, achieving an 11.26% BD-rate saving under the All Intra configuration.
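The two-stream idea in the abstract can be sketched in miniature. The following is a hypothetical illustration, not the authors' network: plain Python stand-ins replace the CNN layers, with pool-then-upsample modeling the large-receptive-field appearance stream, a 3x3 local filter modeling the distortion stream, and simple averaging modeling the fusion. All function names and the averaging fusion are assumptions for illustration only.

```python
# Hypothetical sketch of the two-stream in-loop filtering idea (not the paper's code).
# A "frame" is a 2D list of floats; CNN layers are replaced by hand-written ops.

def avg_pool2(frame):
    """2x downsampling by average pooling: the appearance stream's coarse, global view."""
    h, w = len(frame), len(frame[0])
    return [[(frame[i][j] + frame[i][j + 1] + frame[i + 1][j] + frame[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)] for i in range(0, h, 2)]

def upsample2(small):
    """Nearest-neighbour 2x upsampling back to pixel-level resolution (the late stage)."""
    out = []
    for row in small:
        wide = [v for v in row for _ in (0, 1)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                   # duplicate each row
    return out

def local_filter(frame):
    """3x3 mean filter standing in for the local, coding-distortion-focused stream."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [frame[ii][jj]
                    for ii in range(max(0, i - 1), min(h, i + 2))
                    for jj in range(max(0, j - 1), min(w, j + 2))]
            out[i][j] = sum(vals) / len(vals)
    return out

def two_stream_filter(frame):
    """Fuse the global (appearance) and local (distortion) streams; here by averaging."""
    appearance = upsample2(avg_pool2(frame))  # pool -> upsample: large receptive field
    distortion = local_filter(frame)          # small receptive field: local artifacts
    h, w = len(frame), len(frame[0])
    return [[(appearance[i][j] + distortion[i][j]) / 2.0 for j in range(w)]
            for i in range(h)]

frame = [[float((i * 8 + j) % 16) for j in range(8)] for i in range(8)]
filtered = two_stream_filter(frame)
print(len(filtered), len(filtered[0]))  # output keeps the input resolution: 8 8
```

The sketch only shows the information flow: in the actual framework each stream is a trained CNN, the fusion is learned rather than a fixed average, and the shared first/last convolutional layers (which give the "mixed" CNN) are omitted here for brevity.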