High Efficiency Deep-learning Based Video Compression

Authors: Lv Tang, Xinfeng Zhang
DOI: 10.1145/3661311 (https://doi.org/10.1145/3661311)
Journal: ACM Transactions on Multimedia Computing Communications and Applications (Q1, Computer Science, Information Systems; Impact Factor 5.2)
Published: 2024-04-23 (Journal Article; source: Semanticscholar)
Although deep learning techniques have achieved significant improvements in image compression, their advantages have not been fully explored in video compression, so deep-learning-based video compression (DLVC) still performs noticeably worse than the hybrid video coding framework. In this paper, we propose a novel network that improves DLVC through its most important modules: Motion Process (MP), Residual Compression (RC), and Frame Reconstruction (FR). In MP, we design a split second-order attention and multi-scale feature extraction module to fully remove warping artifacts in both the multi-scale feature space and the pixel space, which reduces distortion in the subsequent processing. In RC, we propose a channel selection mechanism that gradually drops redundant information while preserving informative channels, yielding better rate-distortion performance. Finally, in FR, we introduce a residual multi-scale recurrent network that improves the quality of the current reconstructed frame by progressively exploiting temporal context between it and several previously reconstructed frames. Extensive experiments on three widely used video compression test sets (HEVC, UVG, and MCL-JCV) demonstrate the superiority of the proposed approach over state-of-the-art methods.
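To make the channel-selection idea in the Residual Compression module concrete, here is a toy sketch of dropping redundant feature channels while keeping informative ones. This is an illustration only, not the paper's actual mechanism: the variance-based "informativeness" score, the `keep_ratio` parameter, and all function names are assumptions made for this example (the paper learns its selection end-to-end for rate-distortion performance).

```python
def select_channels(feature_maps, keep_ratio=0.5):
    """Keep the highest-variance channels of a list of 2-D feature maps.

    feature_maps: list of channels, each a list of rows (lists of floats).
    Returns (kept_indices, kept_channels).
    """
    def variance(channel):
        # Flatten the 2-D channel and compute its population variance;
        # near-constant channels carry little information.
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    scores = [(variance(ch), idx) for idx, ch in enumerate(feature_maps)]
    scores.sort(reverse=True)                         # most informative first
    n_keep = max(1, int(len(feature_maps) * keep_ratio))
    kept = sorted(idx for _, idx in scores[:n_keep])  # restore channel order
    return kept, [feature_maps[i] for i in kept]


# Toy example: four 2x2 channels; two are (nearly) constant, i.e. redundant.
fmaps = [
    [[1.0, 1.0], [1.0, 1.0]],    # constant -> zero variance
    [[0.0, 2.0], [4.0, 6.0]],    # high variance
    [[5.0, 5.0], [5.0, 5.1]],    # nearly constant
    [[-3.0, 3.0], [-3.0, 3.0]],  # high variance
]
kept_idx, kept = select_channels(fmaps, keep_ratio=0.5)
# kept_idx -> [1, 3]: the two near-constant channels are dropped
```

In the paper this selection is performed gradually inside a learned compression network rather than by a fixed statistic, but the sketch shows the underlying trade-off: spending bits only on channels that carry information.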
Journal introduction:
The ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It solicits submissions on all aspects of multimedia; papers on single media (for instance, audio, video, animation) and their processing are also welcome.
TOMM is a peer-reviewed archival journal, available in both print and digital form. It is published quarterly, with roughly seven 23-page articles per issue; in addition, all Special Issues are published online-only to ensure timely publication. The journal consists primarily of research papers and is intended for work with lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.