An Enhanced Video Compression Framework based on Rescaling Networks
Authors: Zhiyu Chen, L. Chen
Published in: IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, pp. 1-6
Publication date: 2023-06-14
DOI: 10.1109/BMSB58369.2023.10211137 (https://doi.org/10.1109/BMSB58369.2023.10211137)
Citations: 0
Abstract
Enhanced video compression frameworks that remain compatible with traditional standard codecs have recently achieved competitive performance, especially in low-bandwidth scenarios. Such frameworks commonly apply downsampling-based preprocessing networks and super-resolution-based postprocessing networks, and further employ surrogate networks to replace the non-differentiable standard codec during training. However, downsampling discards fine detail such as high-frequency spatial textures, which limits reconstruction quality. Moreover, existing surrogate networks merely imitate the intra-frame coding structure of standard codecs without leveraging inter-frame relations. In this paper, we propose a rescaling-based enhanced video compression framework. The main video stream preserves critical spatial structures and complete temporal information, while a second, lightweight, segment-specific enhancement stream is extracted and encoded from the key frame of each video segment and transmitted to the decoder side. A Transformer-based Reconstruction Network (TRN) then transfers the high-frequency spatial information contained in the enhancement stream to the whole segment, guided by the decoded low-resolution (LR) frames, improving reconstruction quality at a small bit cost. In addition, we employ a Virtual Codec Network (VCN) during training for gradient back-propagation; it imitates both the inter-frame and intra-frame coding characteristics of standard codecs. Experimental results indicate that the proposed approach outperforms recent downsampling-based, standard-compatible enhancement frameworks.
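
The two-stream idea described in the abstract can be illustrated with a toy sketch. This is not the authors' method: plain pair averaging and sample repetition stand in for the learned rescaling networks, a simple addition stands in for the TRN, and no codec is modelled at all. The names `downsample`, `upsample`, `high_frequency_residual`, and `segment` are hypothetical and chosen only for this illustration. The sketch shows that the detail lost by downsampling a key frame can be sent once per segment and reused to improve every frame in that segment.

```python
# Toy illustration only: averaging and nearest-neighbour upsampling are
# assumptions standing in for the learned networks, which the abstract
# does not specify. Frames are 1-D lists of floats for brevity.

def downsample(frame):
    """Halve the resolution by averaging neighbouring pairs of samples."""
    return [(frame[i] + frame[i + 1]) / 2 for i in range(0, len(frame) - 1, 2)]

def upsample(lr_frame):
    """Double the resolution by repeating each sample."""
    return [v for v in lr_frame for _ in range(2)]

def high_frequency_residual(frame):
    """Detail lost by the downsample/upsample round trip (toy 'enhancement' signal)."""
    return [x - y for x, y in zip(frame, upsample(downsample(frame)))]

def mse(a, b):
    """Mean squared error between two equally sized frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# A three-frame "segment": the same fine texture on three brightness levels.
texture = [1.0, -1.0, 2.0, -2.0, 0.5, -0.5, 1.5, -1.5]
segment = [[base + t for t in texture] for base in (10.0, 12.0, 14.0)]

key_frame = segment[0]
enhancement = high_frequency_residual(key_frame)  # sent once per segment
lr_stream = [downsample(f) for f in segment]      # main stream, every frame

# Reconstruction without and with the enhancement signal.
plain = [upsample(lr) for lr in lr_stream]
enhanced = [[u + e for u, e in zip(upsample(lr), enhancement)] for lr in lr_stream]

plain_error = sum(mse(f, r) for f, r in zip(segment, plain)) / len(segment)
enhanced_error = sum(mse(f, r) for f, r in zip(segment, enhanced)) / len(segment)
print("MSE without enhancement:", plain_error)
print("MSE with key-frame enhancement:", enhanced_error)
```

In this contrived segment the texture is identical in every frame, so adding the key-frame residual restores each frame exactly. Real video contains motion and changing content, so a fixed residual would not line up across frames; this is presumably why the paper uses the decoded LR frames to guide the transfer.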