{"title":"Violent Scene Detection of Film Videos Based on Multi-Task Learning of Temporal-Spatial Features","authors":"Z. Zheng, Wei Zhong, Long Ye, Li Fang, Qin Zhang","doi":"10.1109/MIPR51284.2021.00067","DOIUrl":null,"url":null,"abstract":"In this paper, we propose a new framework for the violent scene detection of film videos based on multi-task learning of temporal-spatial features. In the proposed framework, for the violent behavior representation of film clips, we employ a temporal excitation and aggregation network to extract the temporal-spatial deep features in the visual modality. And on the other hand, a recurrent neural network with local attention is utilized to extract the utterance-level representation of affective analysis in the audio modality. In the process of feature mapping, we concern the task of violent scene detection together with that of affective analysis and then propose a multi-task learning strategy to effectively predict the violent scene of film clips. To evaluate the effectiveness of the proposed method, the experiments are done on the task of violent scenes detection 2015. The experimental results show that our model outperforms most of the state of the art methods, validating the innovation of considering the task of violent scene detection jointly with the violence emotion analysis.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MIPR51284.2021.00067","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4
Abstract
In this paper, we propose a new framework for violent scene detection in film videos based on multi-task learning of temporal-spatial features. In the proposed framework, for the representation of violent behavior in film clips, we employ a temporal excitation and aggregation network to extract temporal-spatial deep features in the visual modality. On the other hand, a recurrent neural network with local attention is utilized to extract the utterance-level representation for affective analysis in the audio modality. In the process of feature mapping, we consider the task of violent scene detection jointly with that of affective analysis and propose a multi-task learning strategy to effectively predict violent scenes in film clips. To evaluate the effectiveness of the proposed method, experiments are conducted on the Violent Scenes Detection 2015 task. The experimental results show that our model outperforms most state-of-the-art methods, validating the innovation of considering the task of violent scene detection jointly with violence emotion analysis.
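To make the multi-task idea concrete, below is a minimal PyTorch sketch of one plausible realization: a shared fusion layer over the visual and audio features feeding two heads, one for violent scene detection and one for the auxiliary affective-analysis task, trained with a weighted sum of the two losses. This is an illustrative reconstruction, not the authors' code; the feature dimensions, the number of affect classes, and the trade-off weight `alpha` are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTaskViolenceModel(nn.Module):
    """Sketch of a multi-task head: shared fusion of visual and audio
    features, with separate heads for violence detection and affect."""

    def __init__(self, visual_dim=2048, audio_dim=512,
                 hidden_dim=256, num_affect_classes=3):
        super().__init__()
        # Shared mapping over concatenated temporal-spatial visual features
        # and utterance-level audio features (dimensions are assumptions).
        self.fusion = nn.Sequential(
            nn.Linear(visual_dim + audio_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.5),
        )
        # Task 1: binary violent scene detection.
        self.violence_head = nn.Linear(hidden_dim, 1)
        # Task 2: auxiliary affective analysis (class count is assumed).
        self.affect_head = nn.Linear(hidden_dim, num_affect_classes)

    def forward(self, visual_feat, audio_feat):
        shared = self.fusion(torch.cat([visual_feat, audio_feat], dim=-1))
        return self.violence_head(shared).squeeze(-1), self.affect_head(shared)


def multi_task_loss(violence_logit, affect_logits,
                    violence_label, affect_label, alpha=0.5):
    # Weighted sum of the two task losses; alpha is a hypothetical
    # trade-off weight balancing the auxiliary affect task.
    loss_violence = F.binary_cross_entropy_with_logits(
        violence_logit, violence_label)
    loss_affect = F.cross_entropy(affect_logits, affect_label)
    return loss_violence + alpha * loss_affect


# Usage example with random stand-in features for a batch of 8 clips.
model = MultiTaskViolenceModel()
v_logit, a_logits = model(torch.randn(8, 2048), torch.randn(8, 512))
loss = multi_task_loss(v_logit, a_logits,
                       torch.randint(0, 2, (8,)).float(),
                       torch.randint(0, 3, (8,)))
loss.backward()
```

Because both heads backpropagate through the shared fusion layer, the affect task acts as a regularizer that shapes the joint representation, which is the intuition behind pairing violence detection with violence emotion analysis.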