Bi-Stream Coteaching Network for Weakly-Supervised Deepfake Localization in Videos

IEEE Transactions on Information Forensics and Security | Impact Factor: 8.0 | Region 1 (Computer Science) | JCR Q1 (Computer Science, Theory & Methods) | Pub Date: 2025-02-11 | DOI: 10.1109/TIFS.2025.3533906

Zhaoyang Li; Zhu Teng; Baopeng Zhang; Jianping Fan

IEEE Transactions on Information Forensics and Security, vol. 20, pp. 1724-1738. Publication type: Journal Article. Open access: no. Full text: https://ieeexplore.ieee.org/document/10880117/

Citations: 0

Abstract

With the rapid evolution of deepfake technologies, attackers can arbitrarily alter the intended message of a video by modifying just a few frames. As a result, simplistic binary judgments of entire videos seem increasingly less convincing and less interpretable. Although numerous efforts have been made to develop fine-grained interpretations, these typically depend on elaborate annotations, which are both costly and challenging to obtain in real-world scenarios. To advance research in this direction, we introduce a novel task called Weakly-Supervised Deepfake Localization (WSDL), which aims to identify manipulated frames using only easily obtained video-level labels. We also propose a new framework named Bi-stream coteaching Deepfake Localization (CoDL), which advances the WSDL task through a progressive mutual refinement strategy across complementary spatial and temporal modalities. The CoDL framework incorporates an inconsistency perception module that discerns subtle forgeries by assessing spatial and temporal incoherence, and a prototype-based enhancement module that mitigates frame noise and amplifies discrepancies to create a robust feature space. Additionally, a progressive coteaching mechanism facilitates the exchange of valuable knowledge between modalities, enhancing the detection of subtle frame-level forgery features and thereby improving the model's generalization capabilities. Extensive experiments demonstrate the superiority of our approach, notably an 8.83% improvement in AUC on highly compressed datasets when learning from weak supervision.
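The abstract does not spell out how video-level labels supervise frame-level predictions or how the two streams exchange knowledge. The sketch below is only a generic illustration of those two ideas, not the CoDL implementation: it assumes multiple-instance-style top-k pooling to obtain a video score from frame scores, and a simple confidence-thresholded pseudo-label exchange between a spatial and a temporal stream. The function names, thresholds, and frame scores are all hypothetical.

```python
from typing import List, Optional


def topk_mean(scores: List[float], k: int) -> float:
    """Video-level score as the mean of the k highest frame scores.

    This is a common multiple-instance pooling choice (an assumption here),
    so a video-level label can supervise frame-level predictions.
    """
    k = max(1, min(k, len(scores)))
    return sum(sorted(scores, reverse=True)[:k]) / k


def pseudo_labels(scores: List[float], video_label: int,
                  hi: float = 0.8, lo: float = 0.2) -> List[Optional[int]]:
    """Turn one stream's frame scores into pseudo-labels for the other stream.

    Every frame of a real video (label 0) is real. In a fake video only
    confident frames receive a label; uncertain frames are left as None.
    The thresholds hi and lo are illustrative values.
    """
    if video_label == 0:
        return [0] * len(scores)
    return [1 if s >= hi else 0 if s <= lo else None for s in scores]


# Hypothetical per-frame fake probabilities for one fake video.
spatial = [0.1, 0.15, 0.9, 0.85, 0.5, 0.1]
temporal = [0.05, 0.3, 0.95, 0.6, 0.9, 0.1]

# Video-level scores that would be compared against the video label.
video_score_spatial = topk_mean(spatial, k=2)
video_score_temporal = topk_mean(temporal, k=2)

# Exchange step: each stream supplies frame targets for the other.
labels_for_temporal = pseudo_labels(spatial, video_label=1)
labels_for_spatial = pseudo_labels(temporal, video_label=1)
```

In this toy setup the streams disagree on the fourth and fifth frames, which is the kind of case where a mutual refinement step could let one modality supply targets the other is unsure about.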

Source journal

IEEE Transactions on Information Forensics and Security (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 14.40

Self-citation rate: 7.40%

Articles published: 234

Review time: 6.5 months

Journal description: The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.