Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment

IEEE Transactions on Broadcasting (IF 4.8; Region 1, Computer Science; JCR Q2, Engineering, Electrical & Electronic) Pub Date: 2024-03-06 DOI: 10.1109/TBC.2024.3391060

Tianwei Zhou;Songbai Tan;Wei Zhou;Yu Luo;Yuan-Gen Wang;Guanghui Yue

Journal Article. Volume 70, Issue 3, pp. 833-843. https://ieeexplore.ieee.org/document/10520989/

Citation count: 0

Abstract

With the increasing maturity of text-to-image and image-to-image generative models, AI-generated images (AGIs) have shown great application potential in advertising, entertainment, education, social media, etc. Although remarkable advances have been made in generative models, little effort has been devoted to designing quality assessment models for their outputs. In this paper, we propose a novel blind image quality assessment (IQA) network, named AMFF-Net, for AGIs. AMFF-Net evaluates AGI quality along three dimensions, i.e., “visual quality”, “authenticity”, and “consistency”. Specifically, inspired by the characteristics of the human visual system and motivated by the observation that “visual quality” and “authenticity” are characterized by both local and global aspects, AMFF-Net scales the image up and down and takes the scaled images together with the original-sized image as inputs to obtain multi-scale features. An Adaptive Feature Fusion (AFF) block then fuses the multi-scale features adaptively with learnable weights. In addition, considering the correlation between the image and the prompt, AMFF-Net compares the semantic features from the text encoder and the image encoder to evaluate text-to-image alignment. We carry out extensive experiments on three AGI quality assessment databases, and the results show that AMFF-Net outperforms nine state-of-the-art blind IQA methods. Ablation experiments further demonstrate the effectiveness of the proposed multi-scale input strategy and AFF block.
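The two mechanisms named in the abstract, fusing multi-scale features with learnable weights and comparing text and image features, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the AFF weights are normalized or how the two encoders' features are compared, so the softmax normalization, the cosine similarity, the function names, and the toy feature vectors below are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: weights are positive and sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_multi_scale(features: Sequence[Sequence[float]],
                     scale_logits: Sequence[float]) -> List[float]:
    """Weighted sum of per-scale feature vectors (hypothetical AFF stand-in).

    `scale_logits` plays the role of the learnable weights, one per scale.
    In a trained network they would be updated by gradient descent; here
    they are fixed numbers. Softmax normalization is an assumption.
    """
    if len(features) != len(scale_logits):
        raise ValueError("need exactly one logit per scale")
    weights = softmax(scale_logits)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features))
            for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, used here as an assumed text-image alignment score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy feature vectors standing in for encoder outputs at three input scales
# (down-scaled, original-sized, up-scaled). Real features would come from
# an image encoder.
feat_down = [1.0, 0.0, 2.0]
feat_orig = [0.0, 1.0, 2.0]
feat_up = [1.0, 1.0, 2.0]

# Equal logits reduce the fusion to a plain average of the three scales.
fused = fuse_multi_scale([feat_down, feat_orig, feat_up], [0.0, 0.0, 0.0])

# Toy text feature pointing in the same direction as feat_orig.
text_feat = [0.0, 2.0, 4.0]
alignment = cosine_similarity(feat_orig, text_feat)
```

Raising one logit relative to the others shifts the fused vector toward that scale, which is the sense in which a learned weighting can adapt to whichever scale is most informative for a given quality dimension.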

Source journal: IEEE Transactions on Broadcasting (Engineering & Technology: Telecommunications)

CiteScore: 9.40

Self-citation rate: 31.10%

Review time: 6-12 weeks

About the journal: The Society’s Field of Interest is “Devices, equipment, techniques and systems related to broadcast technology, including the production, distribution, transmission, and propagation aspects.” In addition to this formal FOI statement, which is used to provide guidance to the Publications Committee in the selection of content, the AdCom has further resolved that “broadcast systems includes all aspects of transmission, propagation, and reception.”
