Robustifying vision transformer for image forgery localization with multi-exit architectures

Pattern Recognition (IF 7.6; Zone 1, Computer Science; JCR Q1, Computer Science, Artificial Intelligence). Pub Date: 2025-08-01. Epub Date: 2025-03-12. DOI: 10.1016/j.patcog.2025.111565
Zenan Shi, Haipeng Chen, Dong Zhang
Pattern Recognition, volume 164, Article 111565. Journal Article. https://www.sciencedirect.com/science/article/pii/S0031320325002250
Citations: 0

Abstract

The proliferation of image manipulation tools has increased the number of manipulated images disseminated online, posing risks such as the spread of fake news and telecom fraud. There is therefore a growing demand for precise, generic, and robust methods for detecting and locating manipulated images. In this paper, we propose a simple and clean model, named MEAFormer, for image forgery localization that does not rely heavily on pre-trained models. MEAFormer comprises three main components: an encoder network, a neck network, and a decoder network. The transformer-based encoder network extracts hierarchical feature representations from the input image, providing rich contextual information at each layer. The neck network aggregates these hierarchical features using our proposed cross-layer feature aggregation (CFA). Rather than relying on noise or edge artifacts, we capture spatial feature co-occurrence with a multi-scale graph reasoning (MGR) module in the decoder network, which builds bipartite graphs over the encoder and decoder features at multiple scales. The cross-level enhancement (CLE) then fuses features from adjacent levels to amplify the regions of interest in the aggregated manipulation features. Finally, the multi-exit architecture (MEA) guides the model to learn fine-grained features and segment out the manipulated region. Extensive experiments on diverse and challenging datasets show that MEAFormer outperforms existing state-of-the-art methods in accuracy, generalization, and robustness.
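The abstract does not say how the exits of the multi-exit architecture are supervised. As a rough illustration only, one common way to train a multi-exit segmentation model is deep supervision: each exit predicts a manipulation mask at its own resolution, and the per-exit losses are summed with weights. The sketch below assumes binary cross-entropy, block-average downsampling of the ground-truth mask, and hand-picked exit weights. None of these choices, nor the function names `downsample_mask`, `bce`, and `multi_exit_loss`, come from the paper.

```python
import math


def downsample_mask(mask, factor):
    """Average-pool a 2D binary mask by an integer factor, then threshold at 0.5."""
    h, w = len(mask), len(mask[0])
    out = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [mask[y][x]
                     for y in range(i, min(i + factor, h))
                     for x in range(j, min(j + factor, w))]
            row.append(1.0 if sum(block) / len(block) >= 0.5 else 0.0)
        out.append(row)
    return out


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between same-sized grids of probabilities and labels."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            count += 1
    return total / count


def multi_exit_loss(exit_preds, mask, weights):
    """Weighted sum of per-exit losses (assumed form, not the paper's definition).

    Each exit is compared against the mask resized to that exit's resolution.
    """
    full_size = len(mask)
    loss = 0.0
    for pred, weight in zip(exit_preds, weights):
        factor = full_size // len(pred)
        target = downsample_mask(mask, factor) if factor > 1 else mask
        loss += weight * bce(pred, target)
    return loss


# Toy example: a 4x4 ground-truth mask, one coarse (2x2) exit and one fine (4x4) exit.
mask = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
coarse_exit = [[0.1, 0.9],
               [0.1, 0.1]]
fine_exit = [[0.1, 0.1, 0.9, 0.9],
             [0.1, 0.1, 0.9, 0.9],
             [0.1, 0.1, 0.1, 0.1],
             [0.1, 0.1, 0.1, 0.1]]
loss = multi_exit_loss([coarse_exit, fine_exit], mask, weights=[0.5, 1.0])
print(round(loss, 4))
```

In this toy setup the coarse exit is weighted less than the fine exit, so errors in the final full-resolution mask dominate the loss. The real weighting scheme, loss function, and number of exits would have to be taken from the full paper.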
Source journal: Pattern Recognition (Engineering and Technology; Engineering: Electrical and Electronic)
CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months
About the journal: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.
Latest articles from this journal:
IrisMAE: Structure-aware masked image modeling for iris recognition
Minimizing the pretraining gap: Domain-aligned text-based person retrieval
Stealthy backdoor attack method targeting group fairness in self-supervised learning
Single-domain generalization for fastener detection via sample reconstruction and class-wise domain contrast
EdgeFusionNet: Edge information-guided small object detection for remote sensing images