Violence Detection from Video under 2D Spatio-Temporal Representations

2021 IEEE International Conference on Image Processing (ICIP). Pub Date: 2021-09-19. DOI: 10.1109/ICIP42928.2021.9506142

Mohamed Chelali, Camille Kurtz, N. Vincent


Citations: 1



Violence Detection from Video under 2D Spatio-Temporal Representations

Action recognition in videos, and violence detection in particular, is an active research topic in computer vision. Interest in this task stems from the proliferation of videos from surveillance cameras and live television content, which produce complex $2D+t$ data. State-of-the-art methods rely on end-to-end learning with 3D neural networks, which must be trained on large amounts of data to obtain discriminative features. To address this limitation, we present a method that classifies videos for violence recognition using a classical 2D convolutional neural network (CNN). The method has two steps: (1) several 2D spatio-temporal representations are built from an input video; (2) these representations are fed to the CNN for training and testing. The classification decision for a video is obtained by aggregating the individual decisions made on its different 2D spatio-temporal representations. An experimental study on public datasets containing violent videos highlights the relevance of the presented method.
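The two-step strategy described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the 2D spatio-temporal representations are built or how the individual decisions are aggregated, so everything below is an assumption made for illustration only: the representations are taken to be x-t and y-t slices of a grayscale video, the aggregation is a majority vote, and the names `xt_slice`, `yt_slice`, `build_representations`, `classify_video`, and `toy_classifier` are hypothetical. In particular, `toy_classifier` is a placeholder for a trained 2D CNN, not the authors' model.

```python
from collections import Counter
from typing import Callable, List, Sequence

Frame = List[List[float]]    # one grayscale frame, H rows x W columns
Video = List[Frame]          # T frames
Image2D = List[List[float]]  # a 2D spatio-temporal representation


def xt_slice(video: Video, y: int) -> Image2D:
    """Fix image row y: returns a T x W image (time runs down, x runs across)."""
    return [list(frame[y]) for frame in video]


def yt_slice(video: Video, x: int) -> Image2D:
    """Fix image column x: returns a T x H image (time runs down, y runs across)."""
    return [[row[x] for row in frame] for frame in video]


def build_representations(video: Video, rows: Sequence[int],
                          cols: Sequence[int]) -> List[Image2D]:
    """Step 1 (assumed form): several 2D spatio-temporal images per video."""
    return [xt_slice(video, y) for y in rows] + [yt_slice(video, x) for x in cols]


def toy_classifier(image: Image2D, threshold: float = 0.1) -> str:
    """Placeholder for a trained 2D CNN (assumption): thresholds the mean
    absolute change between consecutive time steps of the slice."""
    diffs = [abs(a - b)
             for prev, cur in zip(image, image[1:])
             for a, b in zip(prev, cur)]
    score = sum(diffs) / len(diffs) if diffs else 0.0
    return "violent" if score > threshold else "non-violent"


def classify_video(video: Video, classify_image: Callable[[Image2D], str],
                   rows: Sequence[int], cols: Sequence[int]) -> str:
    """Step 2 plus aggregation (assumed rule): classify each representation,
    then take a majority vote. Ties go to the label seen first."""
    votes = Counter(classify_image(rep)
                    for rep in build_representations(video, rows, cols))
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Synthetic 5-frame, 3 x 4 videos: one static, one flickering every frame.
    static = [[[0.5] * 4 for _ in range(3)] for _ in range(5)]
    flicker = [[[float(t % 2)] * 4 for _ in range(3)] for t in range(5)]
    print(classify_video(static, toy_classifier, rows=[0, 1, 2], cols=[0, 3]))
    print(classify_video(flicker, toy_classifier, rows=[0, 1, 2], cols=[0, 3]))
```

Majority voting is only one possible aggregation rule; averaging per-class scores across representations would fit the same structure by changing `classify_video` alone.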

