Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment

IF 3.2; Zone 1, Computer Science; JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC; IEEE Transactions on Broadcasting; Pub Date: 2024-03-06; DOI: 10.1109/TBC.2024.3391060
Tianwei Zhou;Songbai Tan;Wei Zhou;Yu Luo;Yuan-Gen Wang;Guanghui Yue
IEEE Transactions on Broadcasting, vol. 70, no. 3, pp. 833-843. Journal Article. https://ieeexplore.ieee.org/document/10520989/
Citations: 0

Abstract

Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment
With the increasing maturity of text-to-image and image-to-image generative models, AI-generated images (AGIs) have shown great application potential in advertising, entertainment, education, social media, etc. Although remarkable advances have been achieved in generative models, little effort has been devoted to designing relevant quality assessment models. In this paper, we propose a novel blind image quality assessment (IQA) network, named AMFF-Net, for AGIs. AMFF-Net evaluates AGI quality along three dimensions, i.e., “visual quality”, “authenticity”, and “consistency”. Specifically, inspired by the characteristics of the human visual system and motivated by the observation that “visual quality” and “authenticity” are characterized by both local and global aspects, AMFF-Net scales the image up and down and takes the scaled images and the original-sized image as inputs to obtain multi-scale features. After that, an Adaptive Feature Fusion (AFF) block adaptively fuses the multi-scale features with learnable weights. In addition, considering the correlation between the image and the prompt, AMFF-Net compares the semantic features from the text encoder and the image encoder to evaluate text-to-image alignment. We carry out extensive experiments on three AGI quality assessment databases, and the experimental results show that AMFF-Net outperforms nine state-of-the-art blind IQA methods. The results of ablation experiments further demonstrate the effectiveness of the proposed multi-scale input strategy and AFF block.
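The abstract names two mechanisms without giving their exact form: fusing multi-scale features with learnable weights, and comparing text and image features to judge alignment. The sketch below is only one plausible reading, not the authors' implementation. It assumes the fusion is a softmax-weighted sum over scales and that alignment is measured with cosine similarity. The function names, the toy feature vectors, and the fixed logits (which would be learned during training in a real network) are all hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw per-scale logits into positive weights that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(features: Sequence[Sequence[float]],
                logits: Sequence[float]) -> List[float]:
    """Weighted sum of per-scale feature vectors.

    Hypothetical stand-in for the AFF block: the softmax form is an
    assumption, since the abstract only says "learnable weights".
    """
    if len(features) != len(logits):
        raise ValueError("need exactly one logit per scale")
    weights = softmax(logits)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features))
            for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings (assumed alignment score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy features for the down-scaled, original-sized, and up-scaled inputs.
scale_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scale_logits = [0.0, 0.0, 0.0]  # equal logits give each scale weight 1/3
fused = fuse_scales(scale_features, scale_logits)

text_embedding = [1.0, 1.0]  # toy prompt embedding, same dimension as fused
alignment = cosine_similarity(fused, text_embedding)
```

With equal logits every scale contributes one third, so the fused vector is the plain average of the three inputs. Raising one logit shifts weight toward that scale, which is the sense in which such a fusion would be adaptive.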
Source journal
IEEE Transactions on Broadcasting (Engineering & Technology: Telecommunications)
CiteScore: 9.40
Self-citation rate: 31.10%
Publication volume: 79
Review time: 6-12 weeks
About the journal: The Society’s Field of Interest is “Devices, equipment, techniques and systems related to broadcast technology, including the production, distribution, transmission, and propagation aspects.” In addition to this formal FOI statement, which is used to provide guidance to the Publications Committee in the selection of content, the AdCom has further resolved that “broadcast systems includes all aspects of transmission, propagation, and reception.”