使用 QARepVGG-YOLOv7 快速检测公共场所的人脸面具

IF 2.9 4区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Journal of Real-Time Image Processing Pub Date : 2024-05-19 DOI:10.1007/s11554-024-01476-y

Chuying Guan, Jiaxuan Jiang, Zhong Wang

{"title":"使用 QARepVGG-YOLOv7 快速检测公共场所的人脸面具","authors":"Chuying Guan, Jiaxuan Jiang, Zhong Wang","doi":"10.1007/s11554-024-01476-y","DOIUrl":null,"url":null,"abstract":"<p>The COVID-19 pandemic has resulted in substantial global losses. In the post-epidemic era, public health needs still advocate the correct use of medical masks in confined spaces such as hospitals and indoors. This can effectively block the spread of infectious diseases through droplets, protect personal and public health, and improve the environmental sustainability and social resilience of cities. Therefore, detecting the correct wearing of masks is crucial. This study proposes an innovative three-class mask detection model based on the QARepVGG-YOLOv7 algorithm. The model replaces the convolution module in the backbone network with the QARepVGG module and uses the quantitative friendly structure and re-parameterization characteristics of the QARepVGG module to achieve high-precision and high-efficiency target detection. To validate the effectiveness of our proposed method, we created a mask dataset of 5095 pictures, including three categories: correct use of masks, incorrect use of masks, and individuals who do not wear masks. We also employed data augmentation techniques to further balance the dataset categories. We tested YOLOv5s, YOLOv6, YOLOv7, and YOLOv8s models on self-made datasets. The results show that the QARepVGG-YOLOv7 model has the best accuracy compared with the most advanced YOLO model. Our model achieves a significantly improved mAP value of 0.946 and a faster fps of 263.2, which is 90.8 fps higher than the YOLOv7 model and a 0.5% increase in map value over the YOLOv7 model. It is a high-precision and high-efficiency mask detection model.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"55 1","pages":""},"PeriodicalIF":2.9000,"publicationDate":"2024-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Fast detection of face masks in public places using QARepVGG-YOLOv7\",\"authors\":\"Chuying Guan, Jiaxuan Jiang, Zhong Wang\",\"doi\":\"10.1007/s11554-024-01476-y\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>The COVID-19 pandemic has resulted in substantial global losses. In the post-epidemic era, public health needs still advocate the correct use of medical masks in confined spaces such as hospitals and indoors. This can effectively block the spread of infectious diseases through droplets, protect personal and public health, and improve the environmental sustainability and social resilience of cities. Therefore, detecting the correct wearing of masks is crucial. This study proposes an innovative three-class mask detection model based on the QARepVGG-YOLOv7 algorithm. The model replaces the convolution module in the backbone network with the QARepVGG module and uses the quantitative friendly structure and re-parameterization characteristics of the QARepVGG module to achieve high-precision and high-efficiency target detection. To validate the effectiveness of our proposed method, we created a mask dataset of 5095 pictures, including three categories: correct use of masks, incorrect use of masks, and individuals who do not wear masks. We also employed data augmentation techniques to further balance the dataset categories. We tested YOLOv5s, YOLOv6, YOLOv7, and YOLOv8s models on self-made datasets. The results show that the QARepVGG-YOLOv7 model has the best accuracy compared with the most advanced YOLO model. Our model achieves a significantly improved mAP value of 0.946 and a faster fps of 263.2, which is 90.8 fps higher than the YOLOv7 model and a 0.5% increase in map value over the YOLOv7 model. It is a high-precision and high-efficiency mask detection model.</p>\",\"PeriodicalId\":51224,\"journal\":{\"name\":\"Journal of Real-Time Image Processing\",\"volume\":\"55 1\",\"pages\":\"\"},\"PeriodicalIF\":2.9000,\"publicationDate\":\"2024-05-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Real-Time Image Processing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s11554-024-01476-y\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Real-Time Image Processing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11554-024-01476-y","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

COVID-19 大流行给全球造成了巨大损失。在后疫情时代，公共卫生需求仍然提倡在医院和室内等密闭空间正确使用医用口罩。这可以有效阻止传染病通过飞沫传播，保护个人和公众健康，提高城市的环境可持续性和社会适应力。因此，检测口罩的正确佩戴至关重要。本研究基于 QARepVGG-YOLOv7 算法提出了一种创新的三类口罩检测模型。该模型用 QARepVGG 模块替换了骨干网络中的卷积模块，并利用 QARepVGG 模块的定量友好结构和重参数化特性实现了高精度、高效率的目标检测。为了验证我们提出的方法的有效性，我们创建了一个包含 5095 张图片的口罩数据集，其中包括正确使用口罩、不正确使用口罩和不戴口罩的个人三个类别。我们还采用了数据增强技术来进一步平衡数据集的类别。我们在自制数据集上测试了 YOLOv5s、YOLOv6、YOLOv7 和 YOLOv8s 模型。结果表明，与最先进的 YOLO 模型相比，QARepVGG-YOLOv7 模型的准确度最高。与 YOLOv7 模型相比，我们的模型 mAP 值大幅提高，达到 0.946，fps 快达 263.2，比 YOLOv7 模型快 90.8，地图值提高了 0.5%。这是一个高精度、高效率的掩膜检测模型。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Fast detection of face masks in public places using QARepVGG-YOLOv7

The COVID-19 pandemic has resulted in substantial global losses. In the post-epidemic era, public health needs still advocate the correct use of medical masks in confined spaces such as hospitals and indoors. This can effectively block the spread of infectious diseases through droplets, protect personal and public health, and improve the environmental sustainability and social resilience of cities. Therefore, detecting the correct wearing of masks is crucial. This study proposes an innovative three-class mask detection model based on the QARepVGG-YOLOv7 algorithm. The model replaces the convolution module in the backbone network with the QARepVGG module and uses the quantitative friendly structure and re-parameterization characteristics of the QARepVGG module to achieve high-precision and high-efficiency target detection. To validate the effectiveness of our proposed method, we created a mask dataset of 5095 pictures, including three categories: correct use of masks, incorrect use of masks, and individuals who do not wear masks. We also employed data augmentation techniques to further balance the dataset categories. We tested YOLOv5s, YOLOv6, YOLOv7, and YOLOv8s models on self-made datasets. The results show that the QARepVGG-YOLOv7 model has the best accuracy compared with the most advanced YOLO model. Our model achieves a significantly improved mAP value of 0.946 and a faster fps of 263.2, which is 90.8 fps higher than the YOLOv7 model and a 0.5% increase in map value over the YOLOv7 model. It is a high-precision and high-efficiency mask detection model.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Real-Time Image Processing COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-ENGINEERING, ELECTRICAL & ELECTRONIC

CiteScore

6.80

自引率

6.70%

发文量

审稿时长

6 months

期刊介绍： Due to rapid advancements in integrated circuit technology, the rich theoretical results that have been developed by the image and video processing research community are now being increasingly applied in practical systems to solve real-world image and video processing problems. Such systems involve constraints placed not only on their size, cost, and power consumption, but also on the timeliness of the image data processed. Examples of such systems are mobile phones, digital still/video/cell-phone cameras, portable media players, personal digital assistants, high-definition television, video surveillance systems, industrial visual inspection systems, medical imaging devices, vision-guided autonomous robots, spectral imaging systems, and many other real-time embedded systems. In these real-time systems, strict timing requirements demand that results are available within a certain interval of time as imposed by the application. It is often the case that an image processing algorithm is developed and proven theoretically sound, presumably with a specific application in mind, but its practical applications and the detailed steps, methodology, and trade-off analysis required to achieve its real-time performance are not fully explored, leaving these critical and usually non-trivial issues for those wishing to employ the algorithm in a real-time system. The Journal of Real-Time Image Processing is intended to bridge the gap between the theory and practice of image processing, serving the greater community of researchers, practicing engineers, and industrial professionals who deal with designing, implementing or utilizing image processing systems which must satisfy real-time design constraints.