Threshold Active Learning Approach for Physical Violence Detection on Images Obtained from Video (Frame-Level) Using Pre-Trained Deep Learning Neural Network Models

Algorithms. Pub Date: 2024-07-18. DOI: 10.3390/a17070316

Itzel M. Abundez, Roberto Alejo, Francisco Primero Primero, E. Granda-Gutiérrez, O. Portillo-Rodríguez, Juan Alberto Antonio Velázquez


Citations: 0

Abstract

Public authorities and private companies use video cameras as part of surveillance systems, and one of their objectives is the rapid detection of physically violent actions. This task is usually performed by human visual inspection, which is labor-intensive. For this reason, various deep learning models have been applied to automate the task, with positive results. One of the main problems in detecting physical violence in videos is the variety of possible scenarios: models trained on a particular dataset tend to detect physical violence in only one or a few types of videos. In this work, we present an approach for physical violence detection on images obtained from video, based on threshold active learning, which increases the classifier’s robustness in environments where it was not trained. The proposed approach consists of two stages. In the first stage, pre-trained neural network models are trained on initial datasets, and we use a threshold (μ) to identify the images that the classifier considers ambiguous or hard to classify. These images are then included in the training dataset, and the model is retrained to improve its classification performance. In the second stage, we test the model with video images from other environments, and we again employ μ to detect ambiguous images, which a human expert analyzes to determine the real class or to remove the ambiguity. The ambiguous images are then added to the original training set and the classifier is retrained; this process is repeated while ambiguous images exist. The model is a hybrid neural network that uses transfer learning and a threshold μ to detect physical violence on images obtained from video files. Through this active learning process, the classifier can detect physical violence in different environments. The main contribution is the method used to obtain the threshold μ (which is based on the neural network output), which allows human experts to contribute to the classification process in order to obtain more robust neural networks and high-quality datasets. The experimental results show the effectiveness of the proposed approach in detecting physical violence: the model is trained using an initial dataset, and new images are added to improve its robustness in diverse environments.
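The retraining loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract states only that μ is based on the neural network output, so the ambiguity rule used here (a predicted violence probability within μ of the 0.5 decision boundary) is an assumption, and the names `select_ambiguous`, `active_learning`, `fit`, and `ask_expert` are hypothetical placeholders for the classifier training routine and the human expert.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[str, int]  # (image identifier, label: 1 = violence, 0 = no violence)
Model = Callable[[str], float]  # maps an image to a predicted violence probability


def select_ambiguous(probs: Sequence[float], mu: float) -> List[int]:
    # Assumption (not from the paper): an image is "ambiguous" when its
    # predicted violence probability lies strictly within mu of 0.5.
    return [i for i, p in enumerate(probs) if abs(p - 0.5) < mu]


def active_learning(
    train_set: List[Sample],
    pool: List[str],
    fit: Callable[[List[Sample]], Model],  # hypothetical training routine
    ask_expert: Callable[[str], int],      # hypothetical human labelling step
    mu: float,
    max_rounds: int = 10,
) -> Tuple[Model, List[Sample]]:
    train_set = list(train_set)  # copy so the caller's lists are not mutated
    pool = list(pool)
    model = fit(train_set)
    for _ in range(max_rounds):
        probs = [model(image) for image in pool]
        ambiguous = select_ambiguous(probs, mu)
        if not ambiguous:
            break  # stop once no ambiguous images remain
        chosen = set(ambiguous)
        # The expert assigns a class to each ambiguous image, which then
        # joins the training set before the classifier is retrained.
        train_set += [(pool[i], ask_expert(pool[i])) for i in ambiguous]
        pool = [img for i, img in enumerate(pool) if i not in chosen]
        model = fit(train_set)
    return model, train_set
```

In this sketch every ambiguous image is relabelled and kept. The abstract also mentions removing the ambiguity, which could correspond to discarding or correcting an image, and that option is not modelled here. The `max_rounds` cap is likewise an added safeguard rather than something the source specifies.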
