Title: A lightweight video anomaly detection model with weak supervision and adaptive instance selection
Authors: Yang Wang, Jiaogen Zhou, Jihong Guan
Journal: Neurocomputing (impact factor 5.5; JCR Q1, Computer Science, Artificial Intelligence; region 2, Computer Science)
Publication date: 2024-10-05
Publication type: Journal Article
DOI: 10.1016/j.neucom.2024.128698
URL: https://www.sciencedirect.com/science/article/pii/S0925231224014693
Open access: no
Citation count: 0
Abstract
Video anomaly detection aims to determine whether a given video contains abnormal events, behaviors or objects, which enables effective and intelligent public safety management. Because labeling video anomalies is both time-consuming and expensive, most existing works employ unsupervised or weakly supervised learning methods. This paper focuses on weakly supervised video anomaly detection, in which each training video is labeled only as containing anomalies or not, with no information about which frames are anomalous or how many anomalies occur. However, the uncertainty of weakly labeled data and the large model size prevent existing methods from being widely deployed in real scenarios, especially in resource-limited settings such as edge computing. In this paper, we develop a lightweight video anomaly detection model. On the one hand, we propose an adaptive instance selection strategy that selects confident instances based on the model's current state, thereby mitigating the uncertainty of weakly labeled data and improving the model's performance. On the other hand, we build the model from a lightweight multi-level temporal correlation attention module and an hourglass-shaped fully connected layer, which reduces the number of model parameters to only 0.56% of that of existing methods (e.g., RTFM). Extensive experiments on three public datasets, UCF-Crime, ShanghaiTech and XD-Violence, show that our model performs on par with or better than existing lightweight methods, with significantly fewer model parameters. Furthermore, by integrating the improved module designed in this paper with the VadCLIP method proposed by Wu et al., we achieve state-of-the-art performance among non-lightweight models on the UCF-Crime and XD-Violence datasets.
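The abstract does not specify how confident instances are chosen or what layer widths the hourglass-shaped layer uses, so the following Python sketch is only a generic illustration of the two ideas, not the authors' method. It assumes a threshold that adapts to the model's current snippet scores (mean plus `k` standard deviations) and assumes "hourglass-shaped" means a wide, narrow, wide stack. The function names, the threshold rule and the widths 512 and 32 are all hypothetical.

```python
from statistics import mean, pstdev


def select_confident_instances(scores, video_is_anomalous, k=1.0):
    """Return indices of snippets treated as confident under a video-level label.

    Assumption (not from the paper): the threshold adapts to the model's
    current outputs as mean + k * std of this video's snippet scores.
    """
    if not video_is_anomalous:
        # A video labeled normal has no anomalous snippets, so every
        # snippet is a reliable normal instance.
        return list(range(len(scores)))
    threshold = mean(scores) + k * pstdev(scores)
    selected = [i for i, s in enumerate(scores) if s >= threshold]
    # Fall back to the top-scoring snippet so an anomalous video always
    # contributes at least one instance.
    if not selected:
        selected = [max(range(len(scores)), key=scores.__getitem__)]
    return selected


def fc_param_count(widths):
    """Weights plus biases of a fully connected stack with these layer widths."""
    return sum(w_in * w_out + w_out for w_in, w_out in zip(widths, widths[1:]))


# Hypothetical widths: a bottlenecked ("hourglass") stack versus a uniform one.
hourglass = fc_param_count([512, 32, 512])  # 33312 parameters
uniform = fc_param_count([512, 512, 512])   # 525312 parameters
```

With these illustrative widths the bottlenecked stack has roughly 6% of the parameters of the uniform one. This shows why narrowing the middle layer shrinks a model, but it is unrelated to the 0.56% figure reported against RTFM.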
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. The essential topics covered are neurocomputing theory, practice and applications.