TransFAS: Transformer-based network for Face Anti-Spoofing using Token Guided Inspection
Dipra Chaudhry, Harshi Goel, Bindu Verma
2023 IEEE 8th International Conference for Convergence in Technology (I2CT), published 2023-04-07
DOI: 10.1109/I2CT57861.2023.10126455
Citations: 1
Abstract
Face IDs are becoming one of the most widely accepted modalities for authentication in recognition systems. This makes it crucial for recognition and authentication systems to carry out spoof detection before performing facial recognition. Face Anti-Spoofing (FAS) systems handle the task of identifying such fakes. Traditionally, Convolutional Neural Networks (CNNs) have been used to detect spoofs, but CNNs have certain limitations; one is that they are not very efficient at capturing the relative placement of different objects. In this paper, we propose a novel TransFAS system based on the Video Vision Transformer (VVT). The system takes a batch of frames at a time and extracts tokens from them. These tokens are flattened and combined with positional information that encodes the relative placement of each entity within a token. The embedded tokens are passed to the Transformer Encoder, which processes them through a stack of layers. Its final output is a prediction of whether the input sample is live or spoofed (print attack, replay attack, or 3D mask attack). Our model is trained on the Replay-Attack and 3DMAD datasets. Results show that our model performs better than most existing models.
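The tokenization step described above (frames are split into tokens, flattened, and combined with positional embeddings before entering the Transformer Encoder) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tubelet size, patch size, embedding dimension, and random initializations below are all assumed for the example.

```python
import numpy as np

def tubelet_tokens(video, patch=16, tube=2):
    """Split a video clip into flattened spatio-temporal tubelet tokens.

    video: array of shape (T, H, W, C). The patch/tube sizes here are
    hypothetical; the abstract does not specify the exact tokenization.
    """
    T, H, W, C = video.shape
    tokens = []
    for t in range(0, T - tube + 1, tube):          # step over time
        for y in range(0, H - patch + 1, patch):    # step over height
            for x in range(0, W - patch + 1, patch):  # step over width
                tokens.append(video[t:t + tube, y:y + patch, x:x + patch].ravel())
    return np.stack(tokens)  # (num_tokens, tube * patch * patch * C)

rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 64, 64, 3))  # toy clip: 8 frames of 64x64 RGB

tok = tubelet_tokens(clip)          # (64, 1536) for the shapes above
d_model = 128                       # assumed embedding width
W_embed = rng.standard_normal((tok.shape[1], d_model)) * 0.02  # linear projection
pos = rng.standard_normal((tok.shape[0], d_model)) * 0.02      # positional embeddings

# Embedded tokens, ready to be fed to a transformer encoder stack.
x = tok @ W_embed + pos
print(x.shape)  # (64, 128)
```

In a trained model, `W_embed` and `pos` would be learned parameters, and `x` would pass through the encoder layers to produce the live/spoof prediction.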