Learned Image Compression With Adaptive Channel and Window-Based Spatial Entropy Models
Authors: Jian Wang; Qiang Ling
Journal: IEEE Transactions on Consumer Electronics, vol. 70, no. 4, pp. 6430-6441
Published: 2024-10-23 (Journal Article)
DOI: 10.1109/TCE.2024.3485179
URL: https://ieeexplore.ieee.org/document/10730794/
Impact factor: 4.3; JCR: Q1 (Engineering, Electrical & Electronic)
Citations: 0
Abstract
Image compression is essential for reducing the cost of storing and transmitting images. Recently, learned image compression methods have achieved better compression performance than traditional image compression standards. Many learned image compression methods use convolutional entropy models to remove local spatial and channel redundancy in the latent representation, and some recent methods incorporate transformers to further eliminate non-local redundancy. However, these methods apply the same transformer structure to both spatial and channel correlations, and therefore fail to exploit the difference between the spatial characteristics and the channel characteristics of the latent representation. To resolve this issue, we propose novel adaptive channel and window-based spatial entropy models. The adaptive channel entropy model, which consists of a channel transformer module and a channel excitation module, dynamically fuses and excites channel information to implicitly predict the channel context. More specifically, we first establish the relationship between the decoded channels and the channels to be encoded. Based on that relationship, the channel transformer module adaptively updates the predicted channel context. Finally, the channel excitation module emphasizes informative channel context and suppresses irrelevant channel context. Furthermore, we introduce a window-based spatial entropy model that captures global semantic information within each window and generates the spatial context of the non-anchor features from the decoded anchor features. The spatial context and channel context are combined to predict the Gaussian parameters of the latent representation. Experimental results demonstrate that our method outperforms some state-of-the-art image compression methods on the Kodak, CLIC and Tecnick datasets.
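To illustrate why predicting the Gaussian parameters well matters, the sketch below estimates the bit cost of integer-quantized latent values under a Gaussian entropy model. It is a minimal illustration and not the authors' implementation: it assumes the common convention of integrating the Gaussian over a unit-width bin around each quantized value, and the function names and the sample means and scales are hypothetical.

```python
import math


def gaussian_cdf(x, mean, scale):
    """CDF of a normal distribution N(mean, scale^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def symbol_bits(y_hat, mean, scale, min_prob=1e-9):
    """Estimated bits needed to code one integer-quantized latent value."""
    # Probability mass of the unit-width bin centred on y_hat
    # (assumed convention, common in learned image compression).
    upper = gaussian_cdf(y_hat + 0.5, mean, scale)
    lower = gaussian_cdf(y_hat - 0.5, mean, scale)
    # Clamp to avoid log(0) for values far in the tail.
    return -math.log2(max(upper - lower, min_prob))


def total_bits(latents, means, scales):
    """Sum the estimated bit cost over all latent values."""
    return sum(symbol_bits(y, m, s) for y, m, s in zip(latents, means, scales))


# Hypothetical quantized latent values.
latents = [3, -1, 0, 2]

# Without useful context: zero means and wide scales.
bits_no_context = total_bits(latents, [0.0] * 4, [2.0] * 4)

# With informative context: means near the true values, narrow scales.
bits_with_context = total_bits(latents, [2.8, -0.9, 0.1, 2.1], [0.5] * 4)

print(f"no context:   {bits_no_context:.2f} bits")
print(f"with context: {bits_with_context:.2f} bits")
```

In this toy setting, the closer the predicted mean is to the actual value and the smaller the predicted scale, the fewer bits are needed. Richer channel and spatial context is meant to sharpen exactly these predictions.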
About the journal:
The main focus of the IEEE Transactions on Consumer Electronics is the engineering and research aspects of the theory, design, construction, manufacture or end use of mass-market electronics, systems, software and services for consumers.