CRCL: Causal Representation Consistency Learning for Anomaly Detection in Surveillance Videos

Yang Liu;Hongjin Wang;Zepu Wang;Xiaoguang Zhu;Jing Liu;Peng Sun;Rui Tang;Jianwei Du;Victor C. M. Leung;Liang Song
{"title":"CRCL: Causal Representation Consistency Learning for Anomaly Detection in Surveillance Videos","authors":"Yang Liu;Hongjin Wang;Zepu Wang;Xiaoguang Zhu;Jing Liu;Peng Sun;Rui Tang;Jianwei Du;Victor C. M. Leung;Liang Song","doi":"10.1109/TIP.2025.3558089","DOIUrl":null,"url":null,"abstract":"Video Anomaly Detection (VAD) remains a fundamental yet formidable task in the video understanding community, with promising applications in areas such as information forensics and public safety protection. Due to the rarity and diversity of anomalies, existing methods only use easily collected regular events to model the inherent normality of normal spatial-temporal patterns in an unsupervised manner. Although such methods have made significant progress benefiting from the development of deep learning, they attempt to model the statistical dependency between observable videos and semantic labels, which is a crude description of normality and lacks a systematic exploration of its underlying causal relationships. Previous studies have shown that existing unsupervised VAD models are incapable of label-independent data offsets (e.g., scene changes) in real-world scenarios and may fail to respond to light anomalies due to the overgeneralization of deep neural networks. Inspired by causality learning, we argue that there exist causal factors that can adequately generalize the prototypical patterns of regular events and present significant deviations when anomalous instances occur. In this regard, we propose Causal Representation Consistency Learning (CRCL) to implicitly mine potential scene-robust causal variable in unsupervised video normality learning. Specifically, building on the structural causal models, we propose scene-debiasing learning and causality-inspired normality learning to strip away entangled scene bias in deep representations and learn causal video normality, respectively. Extensive experiments on benchmarks validate the superiority of our method over conventional deep representation learning. Moreover, ablation studies and extension validation show that the CRCL can cope with label-independent biases in multi-scene settings and maintain stable performance with only limited training data available.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2351-2366"},"PeriodicalIF":13.7000,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10962292/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Video Anomaly Detection (VAD) remains a fundamental yet formidable task in the video understanding community, with promising applications in areas such as information forensics and public safety protection. Due to the rarity and diversity of anomalies, existing methods use only easily collected regular events to model the inherent normality of spatial-temporal patterns in an unsupervised manner. Although such methods have made significant progress benefiting from the development of deep learning, they attempt to model the statistical dependency between observable videos and semantic labels, which is a crude description of normality and lacks a systematic exploration of its underlying causal relationships. Previous studies have shown that existing unsupervised VAD models cannot cope with label-independent data offsets (e.g., scene changes) in real-world scenarios and may fail to respond to light anomalies due to the overgeneralization of deep neural networks. Inspired by causality learning, we argue that there exist causal factors that can adequately generalize the prototypical patterns of regular events and that present significant deviations when anomalous instances occur. In this regard, we propose Causal Representation Consistency Learning (CRCL) to implicitly mine potential scene-robust causal variables in unsupervised video normality learning. Specifically, building on structural causal models, we propose scene-debiasing learning and causality-inspired normality learning to strip away entangled scene bias in deep representations and to learn causal video normality, respectively. Extensive experiments on benchmarks validate the superiority of our method over conventional deep representation learning. Moreover, ablation studies and extension validation show that CRCL can cope with label-independent biases in multi-scene settings and maintain stable performance with only limited training data available.
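The abstract names two coupled objectives: a scene-debiasing term that strips scene-specific (non-causal) information from deep representations, and a consistency term that aligns the causal representations of the same normal event. The paper's implementation is not given here, so the sketch below is a minimal, hypothetical illustration only: it uses gradient reversal as a stand-in debiasing mechanism and cosine similarity as the consistency measure. Every module name, feature dimension, the number of scenes, and the loss weight `lam` are assumptions, not the authors' code.

```python
# Hypothetical sketch of the two ideas named in the abstract:
# (1) scene-debiasing via an adversarial scene classifier behind a
#     gradient-reversal layer, and
# (2) a consistency loss tying representations of the same normal event
#     under different scene conditions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negated gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

class CausalEncoder(nn.Module):
    def __init__(self, in_dim=1024, feat_dim=256, num_scenes=13):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                      nn.Linear(512, feat_dim))
        # The scene head receives reversed gradients, so training it to
        # predict scenes pushes the backbone to *discard* scene cues.
        self.scene_head = nn.Linear(feat_dim, num_scenes)

    def forward(self, clip_feat):
        z = self.backbone(clip_feat)          # candidate causal representation
        scene_logits = self.scene_head(GradReverse.apply(z))
        return z, scene_logits

def crcl_style_loss(z_a, z_b, scene_logits, scene_labels, lam=0.1):
    """Consistency between two views of the same normal event,
    plus an adversarial scene-classification (debiasing) term."""
    consistency = 1 - F.cosine_similarity(z_a, z_b, dim=-1).mean()
    debias = F.cross_entropy(scene_logits, scene_labels)
    return consistency + lam * debias

# Usage with random tensors standing in for pre-extracted clip features:
enc = CausalEncoder()
feats_a = torch.randn(8, 1024)            # view A of 8 normal clips
feats_b = torch.randn(8, 1024)            # view B (e.g., scene-perturbed)
scenes = torch.randint(0, 13, (8,))       # scene labels for view A
z_a, logits = enc(feats_a)
z_b, _ = enc(feats_b)
loss = crcl_style_loss(z_a, z_b, logits, scenes)
loss.backward()
```

The gradient-reversal head is a common adversarial trick: the scene classifier learns to predict the scene, while the reversed gradient pushes the backbone to remove whatever makes that prediction easy, leaving scene-robust features for the consistency term to align.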