Content-Aware Image Compression with Convolutional Neural Networks

Alen Selimović, A. Hladnik
DOI: 10.24867/GRID-2018-P56 (https://doi.org/10.24867/GRID-2018-P56)
Published in: Proceedings of 9th International Symposium on Graphic Engineering and Design
Publication date: 2018-11-01
Citations: 3

Abstract

Traditional image compression algorithms treat all image regions equally, regardless of their content, often resulting in reconstructed images that do not correlate well with human perception. Content-aware compression, on the other hand, prioritizes image regions that are more relevant to the interpretation of an image and encodes them at a higher bitrate, i.e. without loss or with less loss, than the rest of the image. Our paper explores the multi-structure region of interest (MS-ROI) model, a convolutional neural network, which enables the localization of multiple regions of interest (ROIs) in an image. The localization is expressed as a corresponding saliency map, which identifies the relevance of individual image regions and provides a saliency value for each pixel of the given image. This information is then used to guide the compression. The saliency values are discretized into multiple levels and more important levels are encoded with a higher quality factor Q than the less important ones, allowing for most of the reduction in image resolution to occur in non-salient image regions. Because the generated saliency maps produce soft boundaries between salient and non-salient image regions, smooth transitions between these regions are achieved. The obtained image is then encoded further using the standard JPEG algorithm with a uniform Q factor, resulting in the final image of the standard JPEG format. Our model was trained on the Caltech-101 image dataset and its performance was tested on two other image datasets. Presented are the obtained saliency maps for several images, as well as the results of content-aware compression, which are compared to the standard JPEG compression at different Q factors. For an objective comparison and evaluation of the quality of the obtained images, various standard quality metrics were used, i.e. mean squared error (MSE), peak signal-to-noise ratio (PSNR), structural similarity index (SSIM) and multi-scale structural similarity index (MS-SSIM).
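
The saliency-guided step and two of the listed metrics can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the number of saliency levels, the range of Q, or how Q is assigned per level, so the four levels, the range 30 to 90, and the linear mapping below are hypothetical choices. The saliency values are assumed to lie in [0, 1]; the MS-ROI network that would produce them is not reproduced, and SSIM and MS-SSIM are omitted. The function names are made up for this example.

```python
import math


def discretize_saliency(saliency, levels):
    """Map saliency values in [0, 1] to integer levels 0..levels-1 (uniform bins)."""
    return [[min(int(s * levels), levels - 1) for s in row] for row in saliency]


def quality_for_level(level, levels, q_min, q_max):
    """Assumed linear mapping: the least salient level gets q_min, the most salient q_max."""
    if levels == 1:
        return q_max
    return round(q_min + (q_max - q_min) * level / (levels - 1))


def mse(a, b):
    """Mean squared error between two equally sized grayscale images (lists of rows)."""
    n = sum(len(row) for row in a)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(a, b)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)


# Toy 2x2 saliency map (hypothetical values, not taken from the paper).
saliency = [[0.05, 0.4], [0.75, 1.0]]
level_map = discretize_saliency(saliency, levels=4)
q_map = [[quality_for_level(l, 4, q_min=30, q_max=90) for l in row] for row in level_map]
print(level_map)  # [[0, 1], [3, 3]]
print(q_map)      # [[30, 50], [90, 90]]

original = [[10, 20], [30, 40]]
decoded = [[12, 20], [30, 36]]
print(mse(original, decoded))  # 5.0
```

A per-pixel quality map like `q_map` only states which quality each region should receive; how regions encoded at different qualities are blended into one image before the final uniform-Q JPEG pass is described in the paper itself, not in this sketch.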