Neural Image Compression via Attentional Multi-scale Back Projection and Frequency Decomposition

Ge Gao, P. You, Rong Pan, Shunyuan Han, Yuanyuan Zhang, Yuchao Dai, Ho-Jun Lee
Published in: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 14657-14666. Publication date: 2021-10-01. DOI: 10.1109/ICCV48922.2021.01441 (https://doi.org/10.1109/ICCV48922.2021.01441)
Citations: 45

Abstract

In recent years, neural image compression has emerged as a rapidly developing topic in computer vision, and state-of-the-art approaches now outperform their conventional counterparts in compression performance. Despite this progress, current methods remain limited in preserving the fine spatial details needed for optimal reconstruction, especially at low compression rates. We make three contributions to address this issue. First, we develop a novel back projection method with attentional and multi-scale feature fusion for augmented representation power. Our back projection method recalibrates the current estimation by establishing feedback connections between high-level and low-level attributes in an attentional and discriminative manner. Second, we propose to decompose the input image and process the distinct frequency components separately; the latents derived from them are recombined using a novel dual attention module, so that details inside regions of interest can be explicitly manipulated. Third, we propose a novel training scheme for reducing the latent rounding residual. Experimental results show that, when measured in PSNR, our model reduces BD-rate by 9.88% and 10.32% over the state-of-the-art method, and by 4.12% and 4.32% over the latest coding standard, Versatile Video Coding (VVC), on the Kodak and CLIC2020 Professional Validation datasets, respectively. Our approach also produces more visually pleasing images when optimized for MS-SSIM. This significant improvement over existing methods demonstrates the effectiveness of our approach in preserving and remedying spatial information for enhanced compression quality.
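Three quantities named in the abstract can be made concrete with a small toy sketch: PSNR, the rounding residual of a latent, and a split of a signal into low- and high-frequency parts. This is not the authors' model, which uses learned networks and attention modules not described here. The function names `psnr`, `rounding_residual`, and `split_frequencies` are hypothetical, and the moving-average filter and 1-D signals are assumptions chosen only for illustration.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("sequences must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)


def rounding_residual(latents):
    """Quantize latents by rounding and return (quantized, residual).

    residual[i] = latents[i] - quantized[i], so each residual lies in [-0.5, 0.5].
    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    quantized = [float(round(y)) for y in latents]
    residual = [y - q for y, q in zip(latents, quantized)]
    return quantized, residual


def split_frequencies(signal, radius=1):
    """Split a 1-D signal into a low-frequency part (moving average, an
    assumed stand-in filter) and a high-frequency part (the remainder).

    By construction, low[i] + high[i] equals signal[i] up to float rounding.
    """
    n = len(signal)
    low = []
    for i in range(n):
        start, stop = max(0, i - radius), min(n, i + radius + 1)
        window = signal[start:stop]
        low.append(sum(window) / len(window))
    high = [s - l for s, l in zip(signal, low)]
    return low, high


if __name__ == "__main__":
    low, high = split_frequencies([10.0, 12.0, 40.0, 41.0, 9.0])
    quantized, residual = rounding_residual([0.2, 1.7, -2.4])
    print("low:", low)
    print("high:", high)
    print("quantized:", quantized, "residual:", residual)
    print("PSNR (dB):", psnr([10, 20, 30], [11, 19, 30]))
```

In this sketch the residual is exactly the information that rounding discards before entropy coding, which is the quantity the third contribution aims to reduce. How the paper's training scheme does so is not stated in the abstract and is not modeled here.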