A cascaded deep-learning-based model for face mask detection

Data Technologies and Applications (IF 1.5; Region 4, Computer Science; JCR Q3, Computer Science, Information Systems). Pages: 84-107. Pub Date: 2022-06-28. DOI: 10.1108/dta-02-2022-0076

Akhil Kumar


Citations: 1

Abstract

Purpose: This work presents a deep learning model for face mask detection in surveillance environments such as automatic teller machines (ATMs) and banks, with the goal of identifying persons wearing face masks. In such environments, complete visibility of the face is a guideline, and offenders commit crimes while hiding their faces behind a mask. The proposed face mask detector can be integrated with surveillance cameras in autonomous surveillance environments to help identify and catch offenders.

Design/methodology/approach: The proposed face mask detector integrates a residual network (ResNet34) feature extractor on top of three You Only Look Once (YOLO) detection layers and uses a spatial pyramid pooling (SPP) layer to extract a rich and dense feature map. At training time, data augmentation operations such as Mosaic and MixUp are applied to the feature extraction network so that it is trained on images of varying complexity. The detector is trained and tested on a custom face mask detection dataset of 52,635 images. For validation, its performance is compared with that of YOLO v1, v2, tiny YOLO v1, v2, v3 and v4 and with other benchmark work in the literature, using precision, recall, F1 score, mean average precision (mAP) over the whole dataset and average precision (AP) for each class.

Findings: Compared with the tested baseline YOLO variants, the proposed detector achieved 4.75–9.75 per cent higher mAP, 5–31 per cent higher AP for detecting faces with masks and, specifically, 2–30 per cent higher AP for detecting face masks on the face region. The ResNet34 feature extractor and the SPP layer also reduced both training time and detection time. The proposed model performs detection on an image in 0.45 s, which is 0.15–0.2 s less than the other tested YOLO variants.

Research limitations/implications: The proposed face mask detector can be used as a tool to detect persons with face masks who are a potential threat in automatic surveillance environments such as ATMs, banks and airport security checks. It can also be trained and tested on other object detection problems such as cancer detection in images, fish species detection and vehicle detection.

Practical implications: The proposed face mask detector can be integrated with automatic surveillance systems to detect persons with face masks who are potential threats to ATMs, banks, etc. In the present times of COVID-19, it can also be used to detect whether people in public areas are following the COVID-appropriate behavior of wearing a face mask.

Originality/value: The novelty of this work lies in combining the ResNet34 feature extractor with YOLO detection layers, which makes the proposed model a compact and powerful convolutional neural-network-based face mask detector. The SPP layer is applied to the ResNet34 feature extractor so that it can extract a rich and dense feature map. A further novelty is the use of Mosaic and MixUp data augmentation in the training network, which provided the feature extractor with 3× the number of images, of varying complexities and orientations, and helped achieve higher detection accuracy. The proposed model is novel in extracting rich features, performing augmentation at training time and achieving high detection accuracy while maintaining detection speed.
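The evaluation metrics named in the abstract (precision, recall, F1 score, per-class AP and mAP) can be illustrated with a minimal Python sketch. The abstract does not state the paper's evaluation protocol, so everything here is an assumption for illustration: the function names are hypothetical, the true-positive flags are assumed to come from an IoU matching step done elsewhere, and AP is computed with all-point interpolation of the precision-recall curve, which may differ from what the author used.

```python
from typing import Sequence, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(scored_hits: Sequence[Tuple[float, bool]], num_gt: int) -> float:
    """Area under the precision-recall curve for one class (all-point interpolation).

    scored_hits: one (confidence, is_true_positive) pair per detection; the flag
                 is assumed to come from an IoU match performed elsewhere.
    num_gt:      number of ground-truth boxes for this class.
    """
    if num_gt == 0:
        return 0.0
    ranked = sorted(scored_hits, key=lambda d: d[0], reverse=True)

    # Walk down the ranked detections, recording (recall, precision) after each one.
    tp = fp = 0
    points = []
    for _, hit in ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_gt, tp / (tp + fp)))

    # Replace each precision by the maximum precision at any higher recall,
    # so the curve is non-increasing from left to right.
    envelope = []
    best = 0.0
    for recall, precision in reversed(points):
        best = max(best, precision)
        envelope.append((recall, best))
    envelope.reverse()

    # Sum the rectangles under the stepwise curve.
    ap = 0.0
    prev_recall = 0.0
    for recall, precision in envelope:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(per_class_ap: Sequence[float]) -> float:
    """mAP is the unweighted mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap) if per_class_ap else 0.0


if __name__ == "__main__":
    # Toy numbers, not results from the paper.
    print(precision_recall_f1(tp=8, fp=2, fn=2))
    detections = [(0.9, True), (0.8, False), (0.7, True)]
    print(average_precision(detections, num_gt=2))
```

In this sketch, AP summarizes one class (for example, faces with masks) across all confidence thresholds, while mAP averages those per-class values, which matches how the abstract distinguishes the two metrics.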
