M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection

Proceedings of the 2022 International Conference on Multimedia Retrieval
Pub Date: 2021-04-20
DOI: 10.1145/3512527.3531415

Junke Wang, Zuxuan Wu, Jingjing Chen, Yu-Gang Jiang

Citations: 110

Abstract

The widespread dissemination of Deepfakes demands effective approaches that can detect perceptually convincing forged images. In this paper, we aim to capture subtle manipulation artifacts at different scales using transformer models. In particular, we introduce a Multi-modal Multi-scale TRansformer (M2TR), which operates on patches of different sizes to detect local inconsistencies in images at different spatial levels. M2TR further learns to detect forgery artifacts in the frequency domain to complement RGB information through a carefully designed cross-modality fusion block. In addition, to stimulate Deepfake detection research, we introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 Deepfake videos generated by state-of-the-art face swapping and facial reenactment methods. We conduct extensive experiments to verify the effectiveness of the proposed method, which outperforms state-of-the-art Deepfake detection methods by clear margins.
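As a rough illustration of the two input views the abstract mentions (patches of different sizes, and a frequency-domain representation alongside RGB), the following minimal sketch may help. It is not the authors' implementation: the patch sizes, the function names (`split_into_patches`, `multi_scale_patches`, `log_amplitude_spectrum`), and the use of a plain 2-D FFT log-amplitude spectrum are all assumptions made for illustration, and the cross-modality fusion block and the transformer itself are not shown.

```python
import numpy as np


def split_into_patches(image, patch_size):
    """Split an (H, W, C) array into non-overlapping square patches.

    Returns an array of shape (num_patches, patch_size, patch_size, C),
    ordered row by row. H and W must be divisible by patch_size.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # Separate the row/column block indices from the within-patch offsets.
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    # Bring the two block indices together, then flatten them.
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size, patch_size, c)


def multi_scale_patches(image, patch_sizes=(8, 16, 32)):
    """Patch the same image at several sizes (sizes are hypothetical)."""
    return {p: split_into_patches(image, p) for p in patch_sizes}


def log_amplitude_spectrum(image):
    """Centered log-amplitude spectrum of the channel-averaged image.

    A generic frequency-domain view, assumed here for illustration only.
    """
    gray = image.mean(axis=-1)
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    return np.log1p(np.abs(spectrum))


# Stand-in for a cropped face image; a real pipeline would load a frame.
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))

scales = multi_scale_patches(image)
freq = log_amplitude_spectrum(image)
```

Smaller patches expose fine local inconsistencies while larger ones cover broader regions; a detector in the spirit of the abstract would process all scales and combine them with the frequency view.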



