利用面向频率的变换进行端到端优化图像压缩

IF 2.3 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Machine Vision and Applications Pub Date : 2024-02-07 DOI:10.1007/s00138-023-01507-x

Yuefeng Zhang, Kai Lin

{"title":"利用面向频率的变换进行端到端优化图像压缩","authors":"Yuefeng Zhang, Kai Lin","doi":"10.1007/s00138-023-01507-x","DOIUrl":null,"url":null,"abstract":"<p>Image compression constitutes a significant challenge amid the era of information explosion. Recent studies employing deep learning methods have demonstrated the superior performance of learning-based image compression methods over traditional codecs. However, an inherent challenge associated with these methods lies in their lack of interpretability. Following an analysis of the varying degrees of compression degradation across different frequency bands, we propose the end-to-end optimized image compression model facilitated by the frequency-oriented transform. The proposed end-to-end image compression model consists of four components: spatial sampling, frequency-oriented transform, entropy estimation, and frequency-aware fusion. The frequency-oriented transform separates the original image signal into distinct frequency bands, aligning with the human-interpretable concept. Leveraging the non-overlapping hypothesis, the model enables scalable coding through the selective transmission of arbitrary frequency components. Extensive experiments are conducted to demonstrate that our model outperforms all traditional codecs including next-generation standard H.266/VVC on MS-SSIM metric. Moreover, visual analysis tasks (i.e., object detection and semantic segmentation) are conducted to verify the proposed compression method that could preserve semantic fidelity besides signal-level precision.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"33 1","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2024-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"End-to-end optimized image compression with the frequency-oriented transform\",\"authors\":\"Yuefeng Zhang, Kai Lin\",\"doi\":\"10.1007/s00138-023-01507-x\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Image compression constitutes a significant challenge amid the era of information explosion. Recent studies employing deep learning methods have demonstrated the superior performance of learning-based image compression methods over traditional codecs. However, an inherent challenge associated with these methods lies in their lack of interpretability. Following an analysis of the varying degrees of compression degradation across different frequency bands, we propose the end-to-end optimized image compression model facilitated by the frequency-oriented transform. The proposed end-to-end image compression model consists of four components: spatial sampling, frequency-oriented transform, entropy estimation, and frequency-aware fusion. The frequency-oriented transform separates the original image signal into distinct frequency bands, aligning with the human-interpretable concept. Leveraging the non-overlapping hypothesis, the model enables scalable coding through the selective transmission of arbitrary frequency components. Extensive experiments are conducted to demonstrate that our model outperforms all traditional codecs including next-generation standard H.266/VVC on MS-SSIM metric. Moreover, visual analysis tasks (i.e., object detection and semantic segmentation) are conducted to verify the proposed compression method that could preserve semantic fidelity besides signal-level precision.</p>\",\"PeriodicalId\":51116,\"journal\":{\"name\":\"Machine Vision and Applications\",\"volume\":\"33 1\",\"pages\":\"\"},\"PeriodicalIF\":2.3000,\"publicationDate\":\"2024-02-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Machine Vision and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s00138-023-01507-x\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine Vision and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s00138-023-01507-x","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

在信息爆炸的时代，图像压缩是一项重大挑战。最近采用深度学习方法的研究表明，基于学习的图像压缩方法比传统编解码器性能更优越。然而，与这些方法相关的一个固有挑战在于它们缺乏可解释性。在分析了不同频段的不同压缩劣化程度后，我们提出了端到端优化图像压缩模型，该模型通过面向频率的变换得以实现。所提出的端到端图像压缩模型由四个部分组成：空间采样、频率导向变换、熵估计和频率感知融合。频率导向变换将原始图像信号分离成不同的频段，符合人类可理解的概念。利用非重叠假设，该模型可通过选择性传输任意频率成分实现可扩展编码。大量实验证明，我们的模型在 MS-SSIM 指标上优于所有传统编解码器，包括下一代标准 H.266/VVC。此外，还进行了视觉分析任务（即物体检测和语义分割），以验证所提出的压缩方法除了能保持信号级精度外，还能保持语义保真度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

End-to-end optimized image compression with the frequency-oriented transform

Image compression constitutes a significant challenge amid the era of information explosion. Recent studies employing deep learning methods have demonstrated the superior performance of learning-based image compression methods over traditional codecs. However, an inherent challenge associated with these methods lies in their lack of interpretability. Following an analysis of the varying degrees of compression degradation across different frequency bands, we propose the end-to-end optimized image compression model facilitated by the frequency-oriented transform. The proposed end-to-end image compression model consists of four components: spatial sampling, frequency-oriented transform, entropy estimation, and frequency-aware fusion. The frequency-oriented transform separates the original image signal into distinct frequency bands, aligning with the human-interpretable concept. Leveraging the non-overlapping hypothesis, the model enables scalable coding through the selective transmission of arbitrary frequency components. Extensive experiments are conducted to demonstrate that our model outperforms all traditional codecs including next-generation standard H.266/VVC on MS-SSIM metric. Moreover, visual analysis tasks (i.e., object detection and semantic segmentation) are conducted to verify the proposed compression method that could preserve semantic fidelity besides signal-level precision.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Machine Vision and Applications 工程技术-工程：电子与电气

CiteScore

6.30

自引率

3.00%

发文量

审稿时长

8.7 months

期刊介绍： Machine Vision and Applications publishes high-quality technical contributions in machine vision research and development. Specifically, the editors encourage submittals in all applications and engineering aspects of image-related computing. In particular, original contributions dealing with scientific, commercial, industrial, military, and biomedical applications of machine vision, are all within the scope of the journal. Particular emphasis is placed on engineering and technology aspects of image processing and computer vision. The following aspects of machine vision applications are of interest: algorithms, architectures, VLSI implementations, AI techniques and expert systems for machine vision, front-end sensing, multidimensional and multisensor machine vision, real-time techniques, image databases, virtual reality and visualization. Papers must include a significant experimental validation component.