Toward Transparent Deep Image Aesthetics Assessment With Tag-Based Content Descriptors

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society (IF 13.7). Publication date: 2023-08-30. DOI: 10.1109/TIP.2023.3308852

Jingwen Hou, Weisi Lin, Yuming Fang, Haoning Wu, Chaofeng Chen, Liang Liao, Weide Liu

Journal article, volume 34, pages 3070-3085.


Abstract

Deep learning approaches for Image Aesthetics Assessment (IAA) have shown promising results in recent years, but the internal mechanisms of these models remain unclear. Previous studies have shown that image aesthetics can be predicted from semantic features, such as features from pre-trained object classification networks. However, these semantic features are learned implicitly, so previous works have not explained what they represent. In this work, we aim to build a more transparent deep learning framework for IAA by introducing explainable semantic features. To this end, we propose Tag-based Content Descriptors (TCDs), in which each value describes the relevance of an image to a human-readable tag referring to a specific type of image content. This allows us to build IAA models from explicit descriptions of image content. We first propose an explicit matching process that produces TCDs using predefined tags to describe image content. We show that a simple MLP-based IAA model using TCDs based only on predefined tags achieves an SRCC (Spearman rank-order correlation coefficient) of 0.767, comparable to most state-of-the-art methods. However, predefined tags may not be sufficient to describe all the image content the model may encounter. We therefore propose an implicit matching process to describe image content that predefined tags cannot capture. By integrating components obtained from the implicit matching process into TCDs, the IAA model achieves an SRCC of 0.817, significantly outperforming existing IAA methods. Both the explicit and the implicit matching processes are realized by the proposed TCD generator. To evaluate how well the TCD generator matches images with predefined tags, we also labeled 5101 images with photography-related tags to form a validation set. Experimental results show that the proposed TCD generator can meaningfully assign photography-related tags to images.
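The abstract describes the pipeline only at a high level, so the following is a minimal, hypothetical sketch of the data flow, not the authors' TCD generator. It assumes that each TCD value is a cosine similarity between an image embedding and a per-tag embedding (for example, from a vision-language model; the abstract does not say how relevance is computed), that "implicit" components are extra vectors scored the same way, and that the IAA head is a one-hidden-layer ReLU MLP. It also shows how SRCC is computed: as the Pearson correlation of the ranks of predicted and ground-truth scores. The names `build_tcd`, `mlp_score`, and `srcc` are illustrative only.

```python
import math

# Hypothetical sketch: NOT the paper's TCD generator or IAA model.


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm


def build_tcd(image_emb, tag_embs, implicit_embs=()):
    """Assumed TCD layout: one relevance score per predefined tag
    (explicit part), followed by scores against extra vectors that stand
    in for content no predefined tag covers (implicit part)."""
    explicit = [cosine(image_emb, t) for t in tag_embs]
    implicit = [cosine(image_emb, p) for p in implicit_embs]
    return explicit + implicit


def mlp_score(tcd, w1, b1, w2, b2):
    """Forward pass of a one-hidden-layer ReLU MLP regressor.
    w1 holds one weight row per hidden unit; w2 maps hidden units to a score."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, tcd)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)


def srcc(pred, target):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(pred), ranks(target))
```

For example, with toy embeddings `build_tcd([0.6, 0.8, 0.0], [[1, 0, 0], [0, 1, 0]], [[0, 0, 1]])` yields roughly `[0.6, 0.8, 0.0]`: two explicit tag scores and one implicit score, which can then be fed to `mlp_score`. Because SRCC depends only on the ordering of scores, it rewards a model for ranking images correctly even if its absolute scores are miscalibrated. In practice `scipy.stats.spearmanr`, which also assigns average ranks to ties, would replace the hand-written `srcc`.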
