用于一般深度伪造检测的域不变性和斑块判别特征学习

IF 5.2 3区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS ACM Transactions on Multimedia Computing Communications and Applications Pub Date : 2024-04-27 DOI:10.1145/3657297

Jian Zhang, Jiangqun Ni, Fan Nie, jiwu Huang

{"title":"用于一般深度伪造检测的域不变性和斑块判别特征学习","authors":"Jian Zhang, Jiangqun Ni, Fan Nie, jiwu Huang","doi":"10.1145/3657297","DOIUrl":null,"url":null,"abstract":"<p>Hyper-realistic avatars in the metaverse have already raised security concerns about deepfake techniques, deepfakes involving generated video “recording” may be mistaken for a real recording of the people it depicts. As a result, deepfake detection has drawn considerable attention in the multimedia forensic community. Though existing methods for deepfake detection achieve fairly good performance under the intra-dataset scenario, many of them gain unsatisfying results in the case of cross-dataset testing with more practical value, where the forged faces in training and testing datasets are from different domains. To tackle this issue, in this paper, we propose a novel Domain-Invariant and Patch-Discriminative feature learning framework - DI&PD. For image-level feature learning, a single-side adversarial domain generalization is introduced to eliminate domain variances and learn domain-invariant features in training samples from different manipulation methods, along with the global and local random crop augmentation strategy to generate more data views of forged images at various scales. A graph structure is then built by splitting the learned image-level feature maps, with each spatial location corresponding to a local patch, which facilitates patch representation learning by message-passing among similar nodes. Two types of center losses are utilized to learn more discriminative features in both image-level and patch-level embedding spaces. Extensive experimental results on several datasets demonstrate the effectiveness and generalization of the proposed method compared with other state-of-the-art methods.</p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"100 1","pages":""},"PeriodicalIF":5.2000,"publicationDate":"2024-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Domain-invariant and Patch-discriminative Feature Learning for General Deepfake Detection\",\"authors\":\"Jian Zhang, Jiangqun Ni, Fan Nie, jiwu Huang\",\"doi\":\"10.1145/3657297\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Hyper-realistic avatars in the metaverse have already raised security concerns about deepfake techniques, deepfakes involving generated video “recording” may be mistaken for a real recording of the people it depicts. As a result, deepfake detection has drawn considerable attention in the multimedia forensic community. Though existing methods for deepfake detection achieve fairly good performance under the intra-dataset scenario, many of them gain unsatisfying results in the case of cross-dataset testing with more practical value, where the forged faces in training and testing datasets are from different domains. To tackle this issue, in this paper, we propose a novel Domain-Invariant and Patch-Discriminative feature learning framework - DI&PD. For image-level feature learning, a single-side adversarial domain generalization is introduced to eliminate domain variances and learn domain-invariant features in training samples from different manipulation methods, along with the global and local random crop augmentation strategy to generate more data views of forged images at various scales. A graph structure is then built by splitting the learned image-level feature maps, with each spatial location corresponding to a local patch, which facilitates patch representation learning by message-passing among similar nodes. Two types of center losses are utilized to learn more discriminative features in both image-level and patch-level embedding spaces. Extensive experimental results on several datasets demonstrate the effectiveness and generalization of the proposed method compared with other state-of-the-art methods.</p>\",\"PeriodicalId\":50937,\"journal\":{\"name\":\"ACM Transactions on Multimedia Computing Communications and Applications\",\"volume\":\"100 1\",\"pages\":\"\"},\"PeriodicalIF\":5.2000,\"publicationDate\":\"2024-04-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Multimedia Computing Communications and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3657297\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Multimedia Computing Communications and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3657297","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

元宇宙中的超逼真头像已经引起了人们对深度伪造技术的安全担忧，涉及生成的视频 "记录 "的深度伪造可能会被误认为是所描述人物的真实记录。因此，深度伪造检测引起了多媒体取证界的极大关注。虽然现有的深度伪造检测方法在数据集内的情况下取得了相当好的性能，但在更具实用价值的跨数据集测试中，即训练数据集和测试数据集中的伪造人脸来自不同领域的情况下，许多方法的结果并不令人满意。为了解决这个问题，我们在本文中提出了一个新颖的领域不变和斑块判别特征学习框架--DI&PD。在图像级特征学习方面，我们引入了单侧对抗域泛化来消除域变异，并在来自不同操作方法的训练样本中学习域不变特征，同时采用全局和局部随机裁剪增强策略来生成更多不同尺度伪造图像的数据视图。然后，通过分割学习到的图像级特征图来构建图结构，每个空间位置对应一个局部补丁，从而通过相似节点之间的信息传递来促进补丁表示学习。利用两种类型的中心损失，可以在图像级和补丁级嵌入空间中学习到更具区分性的特征。在多个数据集上进行的大量实验结果表明，与其他最先进的方法相比，所提出的方法既有效又具有普适性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Domain-invariant and Patch-discriminative Feature Learning for General Deepfake Detection

Hyper-realistic avatars in the metaverse have already raised security concerns about deepfake techniques, deepfakes involving generated video “recording” may be mistaken for a real recording of the people it depicts. As a result, deepfake detection has drawn considerable attention in the multimedia forensic community. Though existing methods for deepfake detection achieve fairly good performance under the intra-dataset scenario, many of them gain unsatisfying results in the case of cross-dataset testing with more practical value, where the forged faces in training and testing datasets are from different domains. To tackle this issue, in this paper, we propose a novel Domain-Invariant and Patch-Discriminative feature learning framework - DI&PD. For image-level feature learning, a single-side adversarial domain generalization is introduced to eliminate domain variances and learn domain-invariant features in training samples from different manipulation methods, along with the global and local random crop augmentation strategy to generate more data views of forged images at various scales. A graph structure is then built by splitting the learned image-level feature maps, with each spatial location corresponding to a local patch, which facilitates patch representation learning by message-passing among similar nodes. Two types of center losses are utilized to learn more discriminative features in both image-level and patch-level embedding spaces. Extensive experimental results on several datasets demonstrate the effectiveness and generalization of the proposed method compared with other state-of-the-art methods.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Multimedia Computing Communications and Applications 工程技术-计算机：理论方法

CiteScore

8.50

自引率

5.90%

发文量

285

审稿时长

7.5 months

期刊介绍： The ACM Transactions on Multimedia Computing, Communications, and Applications is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It is soliciting paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome. TOMM is a peer-reviewed, archival journal, available in both print form and digital form. The Journal is published quarterly; with roughly 7 23-page articles in each issue. In addition, all Special Issues are published online-only to ensure a timely publication. The transactions consists primarily of research papers. This is an archival journal and it is intended that the papers will have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.