Research on small-scale face detection methods in dense scenes
Authors: Yuan Cao, Bei Zhang, Changqing Wang, Meng Wang
Journal: Applied Intelligence, volume 55, issue 5 (impact factor 3.4; JCR Q2, Computer Science, Artificial Intelligence)
Publication date: 2025-01-14
DOI: 10.1007/s10489-025-06231-9
URL: https://link.springer.com/article/10.1007/s10489-025-06231-9
Citations: 0
Abstract
Face detection is the foundation for applications such as face analysis, recognition and reconstruction. In dense scenes, target scales vary widely, individual faces occupy very few pixels, and mutual occlusion is severe, all of which lead to weak feature representations. Existing detection methods rely on convolutional and pooling layers for feature extraction; their deep features are insufficient and their inference capability is limited, leading to inaccurate recognition and a high miss rate. We therefore propose YOLO-SXS, a small-scale face detection model based on an extended Transformer structure, which makes full use of contextual information and feature fusion networks to significantly improve detection performance for small-scale and occluded faces. Specifically, fusing Swin Transformer with Convolutional Neural Networks (CNN) for feature extraction enhances the network's ability to perceive global features; the Space to Depth (SPD-Conv) mapping improves feature extraction in low-resolution and small-target detection tasks; adding fine-grained features significantly improves detection of small-scale and occluded faces; and an added fine-grained feature fusion layer retains as much feature information as possible, which effectively reduces the loss of target information. The model was evaluated on the WIDER FACE, SCUT-HEAD and FDDB datasets, and the experimental results show that the proposed method significantly improves the recognition of small faces and achieves a high detection rate and a low error rate.
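The space-to-depth mapping named in the abstract can be illustrated with a small sketch. The abstract does not describe how YOLO-SXS implements it, so the code below only shows the general rearrangement under stated assumptions: a single-channel feature map stored as nested Python lists, and a hypothetical helper `space_to_depth` with a `scale` parameter. Each `scale` x `scale` neighborhood is spread across `scale * scale` output channels, so spatial resolution drops without discarding any values, unlike strided convolution or pooling. In SPD-Conv-style designs, a non-strided convolution would normally follow this step; that part is omitted here.

```python
def space_to_depth(feature_map, scale=2):
    """Split one H x W map into scale*scale sub-maps of size (H/scale) x (W/scale).

    Hypothetical helper for illustration; not taken from the paper.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")

    sub_maps = []
    for dy in range(scale):          # row offset inside each scale x scale block
        for dx in range(scale):      # column offset inside each block
            # Take every scale-th row starting at dy and every scale-th column
            # starting at dx, so each sub-map samples one position per block.
            sub_maps.append([row[dx::scale] for row in feature_map[dy::scale]])
    return sub_maps


grid = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
channels = space_to_depth(grid, scale=2)
for channel in channels:
    print(channel)
```

With `scale=2`, the 4 x 4 input becomes four 2 x 2 sub-maps that together contain all 16 original values, which is why this kind of downsampling is attractive when faces cover only a few pixels.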
About the journal:
With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.