Content-Aware Image Compression with Convolutional Neural Networks

Proceedings of 9th International Symposium on Graphic Engineering and Design. Pub Date: 2018-11-01. DOI: 10.24867/GRID-2018-P56

Alen Selimović, A. Hladnik

Citations: 3

Abstract

CONTENT-AWARE IMAGE COMPRESSION WITH CONVOLUTIONAL NEURAL NETWORKS

Traditional image compression algorithms treat all image regions equally, regardless of their content, often resulting in reconstructed images that do not correlate well with human perception. Content-aware compression, on the other hand, prioritizes image regions that are more relevant to the interpretation of an image and encodes them at a higher bitrate, i.e. without loss or with less loss, than the rest of the image. Our paper explores the multi-structure region of interest (MS-ROI) model, a convolutional neural network, which enables the localization of multiple regions of interest (ROIs) in an image. The localization is expressed as a corresponding saliency map, which identifies the relevance of individual image regions and provides a saliency value for each pixel of the given image. This information is then used to guide the compression. The saliency values are discretized into multiple levels and more important levels are encoded with a higher quality factor Q than the less important ones, allowing for most of the reduction in image resolution to occur in non-salient image regions. Because the generated saliency maps produce soft boundaries between salient and non-salient image regions, smooth transitions between these regions are achieved. The obtained image is then encoded further using the standard JPEG algorithm with a uniform Q factor, resulting in the final image of the standard JPEG format. Our model was trained on the Caltech-101 image dataset and its performance was tested on two other image datasets. Presented are the obtained saliency maps for several images, as well as the results of content-aware compression, which are compared to the standard JPEG compression at different Q factors. For an objective comparison and evaluation of the quality of the obtained images, various standard quality metrics were used, i.e. mean squared error (MSE), peak signal-to-noise ratio (PSNR), structural similarity index (SSIM) and multi-scale structural similarity index (MS-SSIM).
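The discretization step and two of the listed metrics can be illustrated with a minimal sketch. It assumes a saliency map already scaled to [0, 1] and a linear mapping from level to quality factor; the function names, the number of levels, and the Q range (`num_levels`, `q_min`, `q_max`) are hypothetical choices for illustration and are not taken from the paper. It does not run the MS-ROI network or a JPEG encoder; it only shows how per-pixel saliency could be turned into per-pixel Q values and how MSE and PSNR are computed from two images.

```python
import math


def discretize_saliency(saliency, num_levels):
    """Quantize per-pixel saliency values in [0, 1] into levels 0..num_levels-1."""
    levels = []
    for row in saliency:
        out_row = []
        for s in row:
            s = min(max(s, 0.0), 1.0)  # clamp out-of-range values
            out_row.append(min(int(s * num_levels), num_levels - 1))
        levels.append(out_row)
    return levels


def level_to_quality(level, num_levels, q_min, q_max):
    """Assumed linear mapping: a higher saliency level gets a higher Q."""
    if num_levels == 1:
        return q_max
    return round(q_min + (q_max - q_min) * level / (num_levels - 1))


def quality_map(saliency, num_levels=5, q_min=30, q_max=90):
    """Per-pixel quality factors derived from a saliency map."""
    levels = discretize_saliency(saliency, num_levels)
    return [
        [level_to_quality(lv, num_levels, q_min, q_max) for lv in row]
        for row in levels
    ]


def mse(a, b):
    """Mean squared error between two equally sized grayscale images."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


if __name__ == "__main__":
    saliency = [[0.0, 0.1, 0.5], [0.7, 0.95, 1.0]]
    print(quality_map(saliency))
    print(psnr([[0, 0], [0, 0]], [[10, 10], [10, 10]]))
```

In a real pipeline the resulting quality map would drive region-wise encoding before the final uniform-Q JPEG pass described above; SSIM and MS-SSIM are omitted here because they require windowed statistics that are better taken from an established image-processing library.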

Source journal

Proceedings of 9th International Symposium on Graphic Engineering and Design

CiteScore

0.50

Self-citation rate

0.00%

Publication count