Exploiting Facial Relationships and Feature Aggregation for Multi-Face Forgery Detection
Chenhao Lin, Fangbin Yi, Hang Wang, Jingyi Deng, Zhengyu Zhao, Qian Li, Chao Shen
IEEE Transactions on Information Forensics and Security, vol. 19, pp. 8832-8844, 2024. DOI: 10.1109/TIFS.2024.3461469

The emergence of advanced Deepfake technologies has raised growing societal concerns, drawing significant attention to Deepfake detection. In real-world scenarios, however, Deepfakes often involve multiple faces. Despite this, most existing detection methods still examine each face individually, overlooking both the informative correlations among faces and the relationship between the image's global information and the faces' local information. In this paper, we address this limitation by proposing FILTER, a novel framework for multi-face forgery detection that explicitly captures these underlying correlations. FILTER consists of two main modules: Multi-face Relationship Learning (MRL) and Global Feature Aggregation (GFA). Specifically, MRL learns the correlations among local facial features in multi-face images, and GFA constructs the relationship between image-level labels and individual facial features to enhance performance from a global perspective. In particular, a contrastive learning loss is used to better discriminate between real and fake faces. Extensive experiments on two publicly available multi-face forgery datasets demonstrate the state-of-the-art performance of FILTER in multi-face forgery detection. For example, on the OpenForensics Test-Challenge dataset, FILTER outperforms previous state-of-the-art methods with a higher AUC (0.980) and higher detection accuracy (92.04%).
Single-shot face anti-spoofing (FAS) is a key technique for securing face recognition systems, relying solely on static images as input. However, single-shot FAS remains a challenging and under-explored problem for two reasons: 1) on the data side, learning FAS from RGB images is largely context-dependent, and single-shot images without additional annotations carry limited semantic information; 2) on the model side, existing single-shot FAS models struggle to provide proper evidence for their decisions, and FAS methods based on depth estimation require expensive per-pixel annotations. To address these issues, we construct and release a large binocular NIR image dataset named BNI-FAS, which contains more than 300,000 real-face and plane-attack images, and propose an Interpretable FAS Transformer (IFAST) that requires only weak supervision to produce interpretable predictions. IFAST generates pixel-wise disparity maps using the proposed disparity estimation Transformer with Dynamic Matching Attention (DMA) blocks. In addition, we design a confidence map generator that works in tandem with a dual-teacher distillation module to obtain the final discriminant results. Comprehensive experiments show that IFAST achieves state-of-the-art performance on BNI-FAS, verifying the effectiveness of single-shot FAS on binocular NIR images. The project page is available at https://ifast-bni.github.io/
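The abstract does not specify how the dual-teacher distillation module fuses its teachers under weak supervision. Purely as an illustration, the sketch below shows one standard form of dual-teacher distillation: the student matches a temperature-softened average of two teachers' soft predictions while also fitting the image-level real/spoof label. Every name here, and the averaging scheme itself, is an assumption, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def dual_teacher_distillation_loss(student_logits, teacher1_logits,
                                   teacher2_logits, labels,
                                   alpha=0.5, tau=2.0):
    # Temperature-softened average of the two teachers' predictions.
    soft_targets = (F.softmax(teacher1_logits / tau, dim=1)
                    + F.softmax(teacher2_logits / tau, dim=1)) / 2
    # KL term scaled by tau^2, as is standard in distillation.
    kd = F.kl_div(F.log_softmax(student_logits / tau, dim=1),
                  soft_targets, reduction="batchmean") * tau * tau
    # Weak supervision: only the image-level real-vs-spoof label.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Usage: binary real-vs-spoof logits for a batch of 8 images.
s = torch.randn(8, 2)
t1, t2 = torch.randn(8, 2), torch.randn(8, 2)
y = torch.randint(0, 2, (8,))
loss = dual_teacher_distillation_loss(s, t1, t2, y)
```

The disparity cue itself is what makes binocular input attractive here: a printed photo or screen replay is near-planar, so its disparity map lacks the 3D structure of a real face, which a model can exploit without per-pixel depth labels.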