Jin Fan , Yuxiang Ji , Huifeng Wu , Yan Ge , Danfeng Sun , Jia Wu
{"title":"An unsupervised video anomaly detection method via Optical Flow decomposition and Spatio-Temporal feature learning","authors":"Jin Fan , Yuxiang Ji , Huifeng Wu , Yan Ge , Danfeng Sun , Jia Wu","doi":"10.1016/j.patrec.2024.08.013","DOIUrl":null,"url":null,"abstract":"<div><p>The purpose of this paper is to present an unsupervised video anomaly detection method using Optical Flow decomposition and Spatio-Temporal feature learning (OFST). This method employs a combination of optical flow reconstruction and video frame prediction to achieve satisfactory results. The proposed OFST framework is composed of two modules: the Multi-Granularity Memory-augmented Autoencoder with Optical Flow Decomposition (MG-MemAE-OFD) and a Two-Stream Network based on Spatio-Temporal feature learning (TSN-ST). The MG-MemAE-OFD module is composed of three functional blocks: optical flow decomposition, autoencoder, and multi-granularity memory networks. The optical flow decomposition block is used to extract the main motion information of objects in optical flow, and the granularity memory network is utilized to memorize normal patterns and improve the quality of the reconstructions. To predict video frames, we introduce a two-stream network based on spatiotemporal feature learning (TSN-ST), which adopts parallel standard Transformer blocks and a temporal block to learn spatiotemporal features from video frames and optical flows. The OFST combines these two modules so that the prediction error of abnormal samples is further increased due to the larger reconstruction error. In contrast, the normal samples obtain a lower reconstruction error and prediction error. Therefore, the anomaly detection capability of the method is greatly enhanced. Our proposed model was evaluated on public datasets. Specifically, in terms of the area under the curve (AUC), our model achieved an accuracy of 85.74% on the Ped1 dataset, 99.62% on the Ped2 dataset, 93.89% on the Avenue dataset, and 76.0% on the ShanghaiTech Dataset. Our experimental results show an average improvement of 1.2% compared to the current state-of-the-art.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 239-246"},"PeriodicalIF":3.9000,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition Letters","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167865524002459","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
The purpose of this paper is to present an unsupervised video anomaly detection method using Optical Flow decomposition and Spatio-Temporal feature learning (OFST). This method employs a combination of optical flow reconstruction and video frame prediction to achieve satisfactory results. The proposed OFST framework is composed of two modules: the Multi-Granularity Memory-augmented Autoencoder with Optical Flow Decomposition (MG-MemAE-OFD) and a Two-Stream Network based on Spatio-Temporal feature learning (TSN-ST). The MG-MemAE-OFD module is composed of three functional blocks: optical flow decomposition, autoencoder, and multi-granularity memory networks. The optical flow decomposition block is used to extract the main motion information of objects in optical flow, and the granularity memory network is utilized to memorize normal patterns and improve the quality of the reconstructions. To predict video frames, we introduce a two-stream network based on spatiotemporal feature learning (TSN-ST), which adopts parallel standard Transformer blocks and a temporal block to learn spatiotemporal features from video frames and optical flows. The OFST combines these two modules so that the prediction error of abnormal samples is further increased due to the larger reconstruction error. In contrast, the normal samples obtain a lower reconstruction error and prediction error. Therefore, the anomaly detection capability of the method is greatly enhanced. Our proposed model was evaluated on public datasets. Specifically, in terms of the area under the curve (AUC), our model achieved an accuracy of 85.74% on the Ped1 dataset, 99.62% on the Ped2 dataset, 93.89% on the Avenue dataset, and 76.0% on the ShanghaiTech Dataset. Our experimental results show an average improvement of 1.2% compared to the current state-of-the-art.
期刊介绍:
Pattern Recognition Letters aims at rapid publication of concise articles of a broad interest in pattern recognition.
Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.