Toward Extreme Image Compression With Latent Feature Guidance and Diffusion Prior

IF 11.1 1区工程技术 Q1 ENGINEERING, ELECTRICAL & ELECTRONIC IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2024-09-06 DOI:10.1109/TCSVT.2024.3455576

Zhiyuan Li;Yanhui Zhou;Hao Wei;Chenyang Ge;Jingwen Jiang

{"title":"Toward Extreme Image Compression With Latent Feature Guidance and Diffusion Prior","authors":"Zhiyuan Li;Yanhui Zhou;Hao Wei;Chenyang Ge;Jingwen Jiang","doi":"10.1109/TCSVT.2024.3455576","DOIUrl":null,"url":null,"abstract":"Image compression at extremely low bitrates (below 0.1 bits per pixel (bpp)) is a significant challenge due to substantial information loss. In this work, we propose a novel two-stage extreme image compression framework that exploits the powerful generative capability of pre-trained diffusion models to achieve realistic image reconstruction at extremely low bitrates. In the first stage, we treat the latent representation of images in the diffusion space as guidance, employing a VAE-based compression approach to compress images and initially decode the compressed information into content variables. The second stage leverages pre-trained stable diffusion to reconstruct images under the guidance of content variables. Specifically, we introduce a small control module to inject content information while keeping the stable diffusion model fixed to maintain its generative capability. Furthermore, we design a space alignment loss to force the content variables to align with the diffusion space and provide the necessary constraints for optimization. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches in terms of visual performance at extremely low bitrates. The source code and trained models are available at <uri>https://github.com/huai-chang/DiffEIC</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 1","pages":"888-899"},"PeriodicalIF":11.1000,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10669055/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}

引用次数: 0

Abstract

Image compression at extremely low bitrates (below 0.1 bits per pixel (bpp)) is a significant challenge due to substantial information loss. In this work, we propose a novel two-stage extreme image compression framework that exploits the powerful generative capability of pre-trained diffusion models to achieve realistic image reconstruction at extremely low bitrates. In the first stage, we treat the latent representation of images in the diffusion space as guidance, employing a VAE-based compression approach to compress images and initially decode the compressed information into content variables. The second stage leverages pre-trained stable diffusion to reconstruct images under the guidance of content variables. Specifically, we introduce a small control module to inject content information while keeping the stable diffusion model fixed to maintain its generative capability. Furthermore, we design a space alignment loss to force the content variables to align with the diffusion space and provide the necessary constraints for optimization. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches in terms of visual performance at extremely low bitrates. The source code and trained models are available at https://github.com/huai-chang/DiffEIC.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

利用潜在特征引导和扩散先验实现极端图像压缩

由于大量的信息丢失，极低比特率（低于每像素0.1比特）的图像压缩是一个重大挑战。在这项工作中，我们提出了一种新的两阶段极端图像压缩框架，利用预训练扩散模型的强大生成能力，以极低的比特率实现逼真的图像重建。在第一阶段，我们以扩散空间中图像的潜在表示为指导，采用基于vae的压缩方法对图像进行压缩，并将压缩后的信息初始解码为内容变量。第二阶段利用预训练的稳定扩散，在内容变量的指导下重构图像。具体来说，我们引入了一个小的控制模块来注入内容信息，同时保持稳定扩散模型的固定，以保持其生成能力。此外，我们设计了一个空间对齐损失，以迫使内容变量与扩散空间对齐，并为优化提供必要的约束。大量的实验表明，我们的方法在极低比特率的视觉性能方面明显优于最先进的方法。源代码和经过训练的模型可在https://github.com/huai-chang/DiffEIC上获得。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

IEEE Transactions on Circuits and Systems for Video Technology 工程技术-工程：电子与电气

CiteScore

13.80

自引率

27.40%

发文量

660

审稿时长

5 months

期刊介绍： The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.

期刊最新文献

IEEE Circuits and Systems Society Information IEEE Circuits and Systems Society Information 2025 Index IEEE Transactions on Circuits and Systems for Video Technology IEEE Circuits and Systems Society Information IEEE Circuits and Systems Society Information