Bi-Stream Coteaching Network for Weakly-Supervised Deepfake Localization in Videos
Authors: Zhaoyang Li; Zhu Teng; Baopeng Zhang; Jianping Fan
Journal: IEEE Transactions on Information Forensics and Security, vol. 20, pp. 1724-1738 (Impact Factor 6.3; JCR Q1, Computer Science, Theory & Methods)
Published: 2025-02-11
DOI: 10.1109/TIFS.2025.3533906
URL: https://ieeexplore.ieee.org/document/10880117/
Citations: 0
Abstract
With the rapid evolution of deepfake technologies, attackers can arbitrarily alter the intended message of a video by modifying just a few frames. As a result, simple binary judgments over entire videos are increasingly unconvincing and hard to interpret. Although numerous efforts have been made to develop fine-grained interpretations, these typically depend on elaborate annotations, which are costly and difficult to obtain in real-world scenarios. To advance research in this direction, we introduce a novel task called Weakly-Supervised Deepfake Localization (WSDL), which aims to identify manipulated frames using only easily obtained video-level labels. We also propose a new framework named Bi-stream coteaching Deepfake Localization (CoDL), which advances the WSDL task through a progressive mutual refinement strategy across complementary spatial and temporal modalities. The CoDL framework incorporates an inconsistency perception module that discerns subtle forgeries by assessing spatial and temporal incoherence, and a prototype-based enhancement module that mitigates frame noise and amplifies discrepancies to create a robust feature space. Additionally, a progressive coteaching mechanism facilitates the exchange of valuable knowledge between modalities, enhancing the detection of subtle frame-level forgery features and thereby improving the model’s generalization capabilities. Extensive experiments demonstrate the superiority of our approach, notably an 8.83% improvement in AUC on highly compressed datasets when learning from weak supervision.
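To make the weakly-supervised setting concrete, the sketch below shows one common way to train frame-level scores from a single video-level label: top-k multiple-instance pooling. This is a generic baseline for illustration, not the CoDL method described in the abstract. The function names (`video_score`, `bce`, `localize`), the value of `k`, the threshold, and the example scores are all assumptions.

```python
import math

def video_score(frame_scores, k=3):
    """Average the k highest per-frame fake probabilities (top-k pooling).

    Assumption: a video is fake if at least a few of its frames are fake,
    so the most suspicious frames should determine the video-level score.
    """
    if not frame_scores:
        raise ValueError("frame_scores must not be empty")
    top = sorted(frame_scores, reverse=True)[:k]
    return sum(top) / len(top)

def bce(pred, label, eps=1e-7):
    """Binary cross-entropy between a video-level score and its 0/1 label."""
    pred = min(max(pred, eps), 1 - eps)  # clamp to avoid log(0)
    return -(label * math.log(pred) + (1 - label) * math.log(1 - pred))

def localize(frame_scores, threshold=0.5):
    """Return indices of frames whose score reaches the threshold."""
    return [i for i, s in enumerate(frame_scores) if s >= threshold]

# Hypothetical per-frame scores from some frame-level model.
frame_scores = [0.1, 0.2, 0.9, 0.8, 0.1, 0.7]

score = video_score(frame_scores, k=3)  # only this value is supervised
loss = bce(score, 1)                    # video-level label: fake
fake_frames = localize(frame_scores)    # frame-level prediction at test time
```

Only the pooled score receives supervision during training. Frame-level localization is then read off the per-frame scores, which is why noisy or ambiguous frames are a central difficulty in this setting.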
About the journal:
The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.