Title: Domain-invariant and Patch-discriminative Feature Learning for General Deepfake Detection
Authors: Jian Zhang, Jiangqun Ni, Fan Nie, Jiwu Huang
DOI: 10.1145/3657297
Journal: ACM Transactions on Multimedia Computing Communications and Applications (Q1, Computer Science, Information Systems; Impact Factor 5.2)
Published: 2024-04-27 (Journal Article)
Citation count: 0
Abstract
Hyper-realistic avatars in the metaverse have already raised security concerns about deepfake techniques: a deepfake's generated video "recording" may be mistaken for a real recording of the person it depicts. As a result, deepfake detection has drawn considerable attention in the multimedia forensics community. Although existing deepfake detection methods achieve fairly good performance in the intra-dataset scenario, many of them yield unsatisfactory results in the more practically relevant cross-dataset setting, where the forged faces in the training and testing datasets come from different domains. To tackle this issue, we propose a novel Domain-Invariant and Patch-Discriminative feature learning framework, DI&PD. For image-level feature learning, single-side adversarial domain generalization is introduced to eliminate domain variance and learn domain-invariant features across training samples produced by different manipulation methods, together with a global and local random-crop augmentation strategy that generates additional views of forged images at various scales. A graph structure is then built by splitting the learned image-level feature maps, with each spatial location corresponding to a local patch, which facilitates patch representation learning through message passing among similar nodes. Two types of center loss are employed to learn more discriminative features in both the image-level and patch-level embedding spaces. Extensive experiments on several datasets demonstrate the effectiveness and generalization of the proposed method compared with other state-of-the-art methods.
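To make the patch-level part of the abstract more concrete, the following is a minimal NumPy sketch of the two ingredients it names: one message-passing step over a similarity graph built from patch embeddings, and a center loss that pulls embeddings toward their class centers. The function names (`patch_graph_step`, `center_loss`), the k-nearest-neighbour graph construction, and the 0.5 mixing weight are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def patch_graph_step(patches, k=2):
    """One message-passing step over a k-nearest-neighbour graph of patch
    embeddings. The graph is built from cosine similarity (an assumption;
    the paper's exact graph construction may differ).
    patches: array of shape (P, D) - P patch embeddings of dimension D."""
    normed = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    sim = normed @ normed.T                 # pairwise cosine similarity (P, P)
    np.fill_diagonal(sim, -np.inf)          # exclude self-loops
    nbrs = np.argsort(sim, axis=1)[:, -k:]  # indices of k most similar patches
    messages = patches[nbrs].mean(axis=1)   # aggregate neighbour features
    return 0.5 * (patches + messages)       # mix own and neighbourhood features

def center_loss(embeddings, labels, centers):
    """Center loss in the style of Wen et al.: half the mean squared distance
    of each embedding to the center of its class.
    embeddings: (N, D), labels: (N,) int class ids, centers: (C, D)."""
    diffs = embeddings - centers[labels]
    return 0.5 * float(np.mean(np.sum(diffs ** 2, axis=1)))
```

In this sketch, similar patches exchange features and drift toward a shared representation, while the center loss tightens each class cluster; the paper applies such a loss in both the image-level and patch-level embedding spaces.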
About the journal:
The ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It solicits submissions on all aspects of multimedia; papers on single media (for instance, audio, video, or animation) and their processing are also welcome.
TOMM is a peer-reviewed archival journal, available in both print and digital form. It is published quarterly, with roughly seven 23-page articles per issue. In addition, all special issues are published online only to ensure timely publication. The transactions consist primarily of research papers. As an archival journal, it is intended that the papers have lasting importance and value over time. In general, papers whose primary focus is a particular multimedia product or the current state of the industry will not be included.