Violence Detection Based on Three-Dimensional Convolutional Neural Network with Inception-ResNet

Shen Jianjie, Zou Weijun
DOI: 10.1109/TOCS50858.2020.9339755
Published in: 2020 IEEE Conference on Telecommunications, Optics and Computer Science (TOCS)
Publication date: 2020-12-11

Abstract

Violence detection based on deep learning is a research hotspot in intelligent video surveillance. The pre-trained three-dimensional convolutional network (C3D) has a weak ability to extract spatiotemporal features from video: it achieves only 88.2% accuracy on the UCF-101 dataset, which cannot meet the accuracy requirements for detecting violent behavior in videos. This paper therefore proposes a network architecture that fuses the residual Inception modules of Inception-ResNet-v2 into C3D. Through adaptive learning of feature weights, the three-dimensional features of violent-behavior videos can be fully exploited and the network's representational power is enhanced. Second, because the violence-detection dataset (HockeyFights) is small and thus prone to overfitting and poor generalization, the UCF-101 dataset is used for fine-tuning, so that the shallow layers of the network can fully extract spatiotemporal features. Finally, quantizing the network parameters with quantization tools and adjusting the sliding-window parameters during inference effectively improve inference efficiency and real-time performance while maintaining high accuracy. Experiments show that the accuracy of the proposed network on the UCF-101 dataset is 6.1% higher than that of the C3D network and 3.1% higher than that of the R3D network, indicating that the improved structure extracts more spatiotemporal features; the network finally achieves an accuracy of 95.1% on the HockeyFights test set.
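The abstract mentions adjusting sliding-window parameters during inference to trade accuracy against speed. A minimal sketch of such clip sampling is below; the clip length of 16 frames is the value C3D commonly uses, and the stride is an illustrative assumption, not a parameter reported in the abstract.

```python
def sliding_window_clips(num_frames, clip_len=16, stride=8):
    """Return (start, end) frame-index pairs for fixed-length clips.

    A larger stride yields fewer clips per video, so inference is
    faster but temporal coverage is coarser; a smaller stride does
    the opposite. clip_len=16 follows the usual C3D input length.
    """
    if num_frames < clip_len:
        return []
    starts = range(0, num_frames - clip_len + 1, stride)
    return [(s, s + clip_len) for s in starts]

# Example: a 32-frame video with stride 8 yields 3 overlapping clips.
clips = sliding_window_clips(32, clip_len=16, stride=8)
# → [(0, 16), (8, 24), (16, 32)]
```

Each clip would then be passed through the 3D network, with per-clip scores aggregated (e.g., averaged) into a video-level prediction.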
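The abstract also mentions quantizing network parameters to speed up inference. The specific quantization tool is not named, but the idea can be sketched with symmetric per-tensor int8 quantization, a common post-training scheme; the function names and tolerance below are illustrative, not from the paper.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q.

    Maps the largest-magnitude weight to ±127; storing q as int8
    cuts memory 4x vs float32 and enables integer arithmetic.
    """
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from (q, scale)."""
    return q.astype(np.float32) * scale
```

The reconstruction error per weight is bounded by roughly half the scale, which is why accuracy typically degrades only slightly while inference throughput improves.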