{"title":"Text-Guided Synthesis of Masked Face Images","authors":"Anjali T, Masilamani V","doi":"10.1145/3654667","DOIUrl":null,"url":null,"abstract":"<p>The COVID-19 pandemic has made us all understand that wearing a face mask protects us from the spread of respiratory viruses. The face authentication systems, which are trained on the basis of facial key points such as the eyes, nose, and mouth, found it difficult to identify the person when the majority of the face is covered by the face mask. Removing the mask for authentication will cause the infection to spread. The possible solutions are: (a) to train the face recognition systems to identify the person with the upper face features (b) Reconstruct the complete face of the person with a generative model. (c) train the model with a dataset of the masked faces of the people. In this paper, we explore the scope of generative models for image synthesis. We used stable diffusion to generate masked face images of popular celebrities on various text prompts. A realistic dataset of 15K masked face images of 100 celebrities is generated and is called the Realistic Synthetic Masked Face Dataset (RSMFD). The model and the generated dataset will be made public so that researchers can augment the dataset. According to our knowledge, this is the largest masked face recognition dataset with realistic images. The generated images were tested on popular deep face recognition models and achieved significant results. The dataset is also trained and tested on some of the famous image classification models, and the results are competitive. 
The dataset is available on this link:- https://drive.google.com/drive/folders/1yetcgUOL1TOP4rod1geGsOkIrIJHtcEw?usp=sharing\n</p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"1 1","pages":""},"PeriodicalIF":5.2000,"publicationDate":"2024-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Multimedia Computing Communications and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3654667","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
The COVID-19 pandemic has made us all understand that wearing a face mask protects us from the spread of respiratory viruses. Face authentication systems, which are trained on facial key points such as the eyes, nose, and mouth, find it difficult to identify a person when the majority of the face is covered by a mask, and removing the mask for authentication risks spreading the infection. The possible solutions are: (a) train face recognition systems to identify a person from upper-face features; (b) reconstruct the complete face with a generative model; or (c) train the model on a dataset of masked faces. In this paper, we explore the scope of generative models for image synthesis. We used Stable Diffusion to generate masked face images of popular celebrities from various text prompts. A realistic dataset of 15K masked face images of 100 celebrities was generated, called the Realistic Synthetic Masked Face Dataset (RSMFD). The model and the generated dataset will be made public so that researchers can augment the dataset. To our knowledge, this is the largest masked face recognition dataset with realistic images. The generated images were tested on popular deep face recognition models and achieved significant results. The dataset was also used to train and test several well-known image classification models, and the results are competitive. The dataset is available at: https://drive.google.com/drive/folders/1yetcgUOL1TOP4rod1geGsOkIrIJHtcEw?usp=sharing
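The abstract does not publish the exact prompt templates used. As a purely hypothetical sketch of how such a text-guided pipeline might be driven, the snippet below composes per-celebrity prompt variants (all names, mask styles, and scene descriptors here are illustrative); the resulting strings could then be passed to a text-to-image model such as Stable Diffusion via a library like Hugging Face `diffusers`:

```python
from itertools import product

def build_prompts(celebrities, mask_styles, scenes):
    """Compose text prompts for masked-face image generation.

    All template wording below is an assumption for illustration;
    the paper does not disclose its actual prompts.
    """
    prompts = []
    for name, style, scene in product(celebrities, mask_styles, scenes):
        prompts.append(
            f"a realistic photo of {name} wearing a {style} face mask, {scene}"
        )
    return prompts

# Illustrative inputs; the real dataset covers 100 celebrities.
celebrities = ["celebrity A", "celebrity B"]
mask_styles = ["blue surgical", "black cloth", "white N95"]
scenes = ["outdoors", "studio portrait"]

prompts = build_prompts(celebrities, mask_styles, scenes)
print(len(prompts))  # 2 * 3 * 2 = 12 prompt variants
```

Varying the mask style and scene per identity is one plausible way a fixed celebrity list could be expanded into the roughly 150 images per identity that a 15K/100 dataset implies.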
Journal Description:
The ACM Transactions on Multimedia Computing, Communications, and Applications is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It solicits paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome.
TOMM is a peer-reviewed, archival journal, available in both print and digital form. The journal is published quarterly, with roughly seven 23-page articles in each issue. In addition, all special issues are published online-only to ensure timely publication. The transactions consist primarily of research papers. As an archival journal, it is intended that the papers will have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.