Video Inpainting Localization With Contrastive Learning

IEEE Signal Processing Letters | IF 3.2 | Zone 2 (Engineering & Technology) | JCR Q2 (Engineering, Electrical & Electronic) | Pub Date: 2025-01-08 | DOI: 10.1109/LSP.2025.3527196
Zijie Lou;Gang Cao;Man Lin
IEEE Signal Processing Letters, vol. 32, pp. 611-615, 2025. https://ieeexplore.ieee.org/document/10833786/
Citations: 0

Abstract

Video inpainting techniques typically serve to restore destroyed or missing regions in digital videos. However, such techniques may also be illegally used to remove important objects in order to create forged videos. This letter proposes a simple yet effective forensic scheme for Video Inpainting LOcalization with ContrAstive Learning (ViLocal). A 3D Uniformer encoder is applied to the video noise residual to learn effective spatiotemporal features. To enhance discriminative power, supervised contrastive learning is adopted to capture local regional inconsistency by separating the pristine and inpainted pixels. The pixel-wise inpainting localization map is produced by a lightweight convolution decoder with two-stage training. To prepare enough training samples, we build a video object segmentation dataset (VOS2k5) of 2500 videos with pixel-level annotations per frame. Extensive experimental results validate the superiority of ViLocal over state-of-the-art methods.
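To make the contrastive step more concrete, the following is a minimal NumPy sketch of a generic supervised contrastive loss applied to per-pixel embeddings, where pixels sharing a label (pristine or inpainted) are pulled together and pixels with different labels are pushed apart. It is an illustration only, not the ViLocal implementation: the abstract does not give the loss formula, so the InfoNCE-style form, the function name supervised_contrastive_loss, the temperature value, and the toy inputs are all assumptions.

```python
import numpy as np

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss (hypothetical helper, not from the paper).

    features: (N, D) array of pixel embeddings.
    labels:   (N,) array, e.g. 0 = pristine pixel, 1 = inpainted pixel.
    """
    # L2-normalise so that dot products are cosine similarities.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    z = features / np.maximum(norms, 1e-12)

    logits = z @ z.T / temperature
    n = labels.shape[0]
    self_mask = np.eye(n, dtype=bool)

    # Exclude each anchor from its own softmax denominator.
    logits = np.where(self_mask, -np.inf, logits)
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives: other pixels that carry the same label as the anchor.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0

    # Average log-probability of the positives, per anchor.
    sum_pos = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    per_anchor = -sum_pos[valid] / pos_count[valid]
    return float(per_anchor.mean())

# Toy embeddings: two tight clusters in a 2-D feature space.
feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
aligned = supervised_contrastive_loss(feats, np.array([0, 0, 1, 1]))
mixed = supervised_contrastive_loss(feats, np.array([0, 1, 0, 1]))
print("loss when clusters match labels:", aligned)
print("loss when clusters ignore labels:", mixed)
```

In this toy setting the loss is small when the two embedding clusters coincide with the pristine and inpainted labels and large when they do not, which is the kind of separation the abstract attributes to the contrastive stage.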
Source journal
IEEE Signal Processing Letters (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 7.40
Self-citation rate: 12.80%
Articles published: 339
Review time: 2.8 months
Journal description: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.