Title: Dual contrast discriminator with sharing attention for video anomaly detection
Authors: Yiwenhao Zeng, Yihua Chen, Songsen Yu, Mingzhang Yang, Rongrong Chen, Fang Xu
Journal: Machine Vision and Applications, Volume 230, Issue 1
DOI: 10.1007/s00138-024-01566-8
Published: 2024-06-19 (Journal Article)
Impact Factor: 2.4 (JCR Q3, Computer Science, Artificial Intelligence)
Citations: 0
Abstract
The detection of video anomalies is a well-known problem in visual research. Because the volumes of normal and abnormal samples in this field are unbalanced, unsupervised training is generally used. Since the rise of deep learning, video anomaly detection has progressed from reconstruction-based methods to prediction-based methods, and then to hybrid methods. To identify anomalies, these methods exploit the differences between ground-truth frames and reconstructed or predicted frames, so the quality of the generated frames directly affects the evaluation of the results. We present a novel hybrid detection method built around the Dual Contrast Discriminator for Video Sequences (DCDVS) and its corresponding loss function. With fewer false positives and higher accuracy, this method improves the discriminator's guidance of the reconstruction-prediction network's generation performance. We integrate optical flow processing and attention mechanisms into the Auto-encoder (AE) reconstruction network; this integration improves the network's sensitivity to motion information and its ability to concentrate on important regions. Additionally, introducing an attention module implemented through parameter sharing further improves DCDVS's capacity to recognize significant features. To reduce the risk of network overfitting, we also introduce reverse augmentation, a data augmentation technique designed specifically for temporal data. Our approach achieved outstanding performance, with AUC scores of 99.4%, 92.9%, and 77.3% on the UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets, respectively, demonstrating competitiveness with advanced methods and validating its effectiveness.
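The abstract does not give implementation details, but two of the ideas it names admit a compact sketch: reverse augmentation can plausibly be read as playing a normal clip backwards to obtain another temporally coherent training sample, and the frame-difference scoring common to reconstruction/prediction methods is typically computed as a PSNR between each ground-truth frame and its generated counterpart, normalized into an anomaly score. The sketch below is an assumption-laden illustration of those two standard building blocks, not the authors' actual code; function names (`reverse_augment`, `anomaly_scores`) are hypothetical.

```python
import numpy as np

def reverse_augment(clip: np.ndarray) -> np.ndarray:
    """Reverse the temporal order of a video clip of shape (T, H, W, C).

    One plausible reading of 'reverse augmentation': a normal clip played
    backwards is usually still temporally coherent and normal, so it can
    enlarge the training set without new labels.
    """
    return clip[::-1].copy()

def psnr(frame_true: np.ndarray, frame_pred: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between a ground-truth and a generated frame."""
    mse = np.mean((frame_true - frame_pred) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

def anomaly_scores(frames_true, frames_pred) -> np.ndarray:
    """Map per-frame PSNR to [0, 1]; lower PSNR (worse generation) -> higher score."""
    p = np.array([psnr(t, y) for t, y in zip(frames_true, frames_pred)])
    return 1.0 - (p - p.min()) / (p.max() - p.min() + 1e-8)
```

Frame-level AUC (as reported in the abstract) would then be computed by comparing these scores against ground-truth frame labels, e.g. with `sklearn.metrics.roc_auc_score`.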
Journal Introduction
Machine Vision and Applications publishes high-quality technical contributions in machine vision research and development. Specifically, the editors encourage submissions in all applications and engineering aspects of image-related computing. In particular, original contributions dealing with scientific, commercial, industrial, military, and biomedical applications of machine vision are all within the scope of the journal.
Particular emphasis is placed on engineering and technology aspects of image processing and computer vision.
The following aspects of machine vision applications are of interest: algorithms, architectures, VLSI implementations, AI techniques and expert systems for machine vision, front-end sensing, multidimensional and multisensor machine vision, real-time techniques, image databases, virtual reality and visualization. Papers must include a significant experimental validation component.