Aggregated Spatio-temporal MLP-Mixer for Violence Recognition in Video Clips

2023 Sixth International Symposium on Computer, Consumer and Control (IS3C). Published: 2023-06-01. DOI: 10.1109/IS3C57901.2023.00020

Yuepeng Shen, Jenhui Chen

Citations: 0

Abstract

Existing violent-behavior datasets are limited in both quantity and quality because such data are difficult to collect. Although state-of-the-art Transformer models have shown strong capability in behavior recognition, they are ill-suited to short-term behavior understanding (e.g., violent behavior recognition) because they need large amounts of data to reach their best performance. Recently, MLP-Mixer, a simple deep learning architecture built entirely from multilayer perceptrons (MLPs), was proposed as an alternative to the Transformer and obtains competitive results on few-sample datasets. Motivated by spatio-temporal features on neurons, we construct a dual-form dataset for training MLP-Mixer-based models and call the resulting approach aggregated spatio-temporal MLP-Mixer (ASM), which we apply to video understanding tasks. We show that ASM outperforms state-of-the-art Transformer models, as well as some of the best-performing convolutional neural network (CNN) approaches, on three public datasets: the smart-city CCTV violence detection dataset (SCVD), the real-life violence situations (RLVS) dataset, and Hockey fight. The experimental results further support our approach to improving short-term behavior scene understanding.
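The abstract contrasts Transformers with MLP-Mixer, an architecture made only of MLPs. As a rough illustration of what "all-MLP" means, the sketch below implements one generic MLP-Mixer layer in plain Python: a token-mixing MLP applied along the token axis (shared across channels), then a channel-mixing MLP applied to each token, each preceded by layer normalization and wrapped in a residual connection. This is not the authors' ASM model, whose dual-form dataset and aggregation scheme are not detailed here. The assumption that a clip is flattened into frames × patches tokens, the helper names (`mixer_layer`, `init_mlp`, and so on), the omission of learnable LayerNorm parameters, and all sizes are hypothetical choices made for illustration.

```python
import math
import random


def layer_norm(row, eps=1e-6):
    # Normalize one token's channel vector to zero mean and unit variance
    # (learnable scale/shift omitted in this sketch).
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]


def gelu(x):
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def linear(weights, bias, vec):
    # weights: out_dim rows, each of length in_dim.
    return [sum(w * v for w, v in zip(w_row, vec)) + b for w_row, b in zip(weights, bias)]


def mlp(params, vec):
    # Two dense layers with GELU in between; output length equals input length.
    w1, b1, w2, b2 = params
    hidden = [gelu(h) for h in linear(w1, b1, vec)]
    return linear(w2, b2, hidden)


def transpose(m):
    return [list(col) for col in zip(*m)]


def init_mlp(in_dim, hidden_dim, rng):
    # Hypothetical initialization: small Gaussian weights, zero biases.
    w1 = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.gauss(0.0, 0.02) for _ in range(hidden_dim)] for _ in range(in_dim)]
    return (w1, [0.0] * hidden_dim, w2, [0.0] * in_dim)


def mixer_layer(x, token_mlp, channel_mlp):
    """x is a list of num_tokens rows, each holding num_channels floats."""
    # Token mixing: normalize each token, then run one MLP along the token
    # axis for every channel column; add the result back (residual).
    normed = [layer_norm(row) for row in x]
    mixed_cols = [mlp(token_mlp, col) for col in transpose(normed)]
    u = [[a + b for a, b in zip(x_row, m_row)]
         for x_row, m_row in zip(x, transpose(mixed_cols))]
    # Channel mixing: run one MLP over each token's channels (residual).
    return [[a + b for a, b in zip(u_row, mlp(channel_mlp, layer_norm(u_row)))]
            for u_row in u]


def main():
    rng = random.Random(0)
    # Hypothetical toy clip: 4 frames x 4 patches per frame = 16 tokens, 8 channels.
    frames, patches_per_frame, channels = 4, 4, 8
    num_tokens = frames * patches_per_frame
    clip = [[rng.gauss(0.0, 1.0) for _ in range(channels)] for _ in range(num_tokens)]
    out = mixer_layer(clip, init_mlp(num_tokens, 32, rng), init_mlp(channels, 32, rng))
    print(len(out), len(out[0]))  # shape is preserved: 16 tokens x 8 channels


if __name__ == "__main__":
    main()
```

Under the flattening assumption, token mixing lets a single layer exchange information across both patches and frames, while channel mixing acts on each token independently. A practical model would use a tensor library, learnable normalization parameters, and many stacked layers.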
