Recently, deep-learning-based multimodal authentication methods have been widely explored in biometrics. Nevertheless, the tension between protecting data privacy and obtaining sufficient data for model optimization has become increasingly prominent. To address this, we propose a multimodal biometric federated learning framework (FedMB) that enables multiple parties to jointly train identity authentication models on data of different modalities while protecting users' data privacy. Specifically, each participant first obtains a fully trained, personalized multimodal biometric recognition model to improve authentication performance, using modal point clustering with class-first federated learning methods on the server side for each modality. Then, a complementary multimodal biometric recognition strategy is implemented to build a complementary modal model. Finally, each participant's local fusion model, which combines the modal model and the complementary modal model, is trained again by all participants to obtain a more personalized model. Experimental results demonstrate that FedMB both protects data privacy and exploits the data of all participants to train personalized biometric recognition models, thereby improving identity authentication performance.
A Multimodal Biometric Recognition Method Based on Federated Learning. Guang Chen, Dacan Luo, Fengzhao Lian, Feng Tian, Xu Yang, Wenxiong Kang. IET Biometrics, 2024(1), published 2024-11-08. DOI: 10.1049/2024/5873909. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/5873909
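The abstract does not specify how FedMB aggregates participant updates on the server. As a minimal sketch, assuming participants are simply grouped by modality and that plain federated averaging (FedAvg, i.e. sample-weighted parameter averaging) is run within each group, the server-side step could look like the following. The client record fields (modality, weights, num_samples) are hypothetical, and the clustering, complementary-model, and personalization stages are not modeled.

```python
from collections import defaultdict


def fedavg(updates):
    """Sample-weighted average of parameter vectors.

    updates: list of (weights, num_samples) pairs; all weight vectors
    must have the same length.
    """
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def aggregate_by_modality(clients):
    """Group client updates by modality, then run FedAvg within each group.

    clients: list of dicts with the hypothetical keys "modality",
    "weights" (flat list of floats) and "num_samples".
    Returns {modality: averaged weight vector}.
    """
    groups = defaultdict(list)
    for client in clients:
        groups[client["modality"]].append((client["weights"], client["num_samples"]))
    return {modality: fedavg(updates) for modality, updates in groups.items()}


if __name__ == "__main__":
    clients = [
        {"modality": "face", "weights": [1.0, 2.0], "num_samples": 10},
        {"modality": "face", "weights": [3.0, 4.0], "num_samples": 30},
        {"modality": "palm", "weights": [0.0, 1.0], "num_samples": 5},
    ]
    print(aggregate_by_modality(clients))  # {'face': [2.5, 3.5], 'palm': [0.0, 1.0]}
```

Averaging only within a modality group keeps, for example, face and palm parameters from being mixed, which is one simple way to respect heterogeneous modalities; FedMB's actual grouping rule may differ.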
Contactless palmprint recognition offers a user-friendly experience because it operates without contact with the recognition device and without rigidly constrained acquisition conditions. Recent palmprint recognition methods have shown promising accuracy; however, some issues still need further study, such as the limited discriminability of a single feature and how to effectively fuse deep and shallow features. In this paper, deep and shallow features are integrated into a unified framework using feature-level and score-level fusion. Specifically, the deep feature is extracted by a residual neural network (ResNet), and shallow features are extracted by principal component analysis (PCA), linear discriminant analysis (LDA), and competitive coding (CompCode). In the feature-level fusion stage, the ResNet and PCA features are dimensionally reduced and fused by canonical correlation analysis to produce the fused feature for the next stage. In the score-level fusion stage, the score information of the fused feature, the LDA feature, and the CompCode feature is combined to obtain more reliable and robust recognition performance. The proposed method achieves competitive performance on the Tongji dataset and shows better generalization on the IITD and CASIA datasets. Comprehensive validation across these three palmprint datasets confirms the effectiveness of the proposed deep and shallow feature fusion approach.
Deep and Shallow Feature Fusion in Feature Score Level for Palmprint Recognition. Yihang Wu, Junlin Hu. IET Biometrics, 2024(1), published 2024-10-22. DOI: 10.1049/2024/5683547. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/5683547
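The abstract leaves the exact score-level fusion rule unspecified. A common baseline, used here purely as an assumption, min-max normalizes each matcher's similarity scores and combines them with a weighted sum. The weights and scores below are illustrative rather than values from the paper, and distance-type outputs (CompCode matching, for instance, is usually expressed as an angular distance) would need to be converted to similarities first.

```python
def min_max_normalize(scores):
    """Map scores linearly to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(score_lists, weights):
    """Weighted-sum fusion of min-max-normalized similarity scores.

    score_lists: one list of similarity scores per matcher, all in the same
    gallery order (e.g. CCA-fused, LDA, and CompCode matchers).
    weights: one non-negative weight per matcher; rescaled to sum to 1.
    """
    if len(score_lists) != len(weights):
        raise ValueError("need exactly one weight per matcher")
    total = sum(weights)
    normalized = [min_max_normalize(s) for s in score_lists]
    n = len(score_lists[0])
    return [
        sum((w / total) * norm[i] for w, norm in zip(weights, normalized))
        for i in range(n)
    ]


if __name__ == "__main__":
    fused_feature = [2, 8, 5]  # hypothetical similarities to 3 gallery palms
    lda = [10, 30, 20]
    compcode = [4, 0, 2]       # assumed already converted to similarities
    fused = fuse_scores([fused_feature, lda, compcode], weights=[2, 1, 1])
    best = max(range(len(fused)), key=fused.__getitem__)
    print(fused, best)  # [0.25, 0.75, 0.5] 1
```

Normalizing before summing matters because the matchers produce scores on very different scales; without it, the matcher with the largest numeric range would dominate the fused score.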
Human behavior recognition is the automatic identification and analysis of multiple human behaviors using modern technology. Previous studies show that redundant features not only slow down model training and increase structural complexity but also degrade overall model performance. To overcome this problem, this paper investigates a temporal convolutional neural network (TCN) model with feature selection based on a random forest optimized by an improved sparrow search algorithm (SSARF) to accurately identify human behavioral traits from wearable-device data. The model uses TCN as the classifier and couples a random forest with the sparrow search algorithm to reduce the dimensionality of the original features, removing weakly correlated and unimportant features and retaining effective features whose contribution rate reaches a set level, thereby generating the optimal feature subset. To verify the reliability of the method, the model was evaluated on two public datasets, UCI Human Activity Recognition and WISDM, achieving recognition accuracies of 98.54% and 97.83%, respectively. These are 0.47% and 1.04% higher than before feature selection, while the number of features was reduced by 84.31% and 32.50% relative to the original feature sets. In addition, we compared the TCN classifier with other deep learning models in terms of F1 score, recall, precision, and accuracy, and the TCN model outperformed the other baseline models on all four metrics. It also outperforms existing recognition methods in accuracy and other respects, indicating practical application value.
Research on TCN Model Based on SSARF Feature Selection in the Field of Human Behavior Recognition. Wei Zhang, Guibo Yu, Shijie Deng. IET Biometrics, 2024(1), published 2024-09-30. DOI: 10.1049/2024/4982277. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/4982277
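In the paper, feature importances come from a random forest tuned by the improved sparrow search algorithm; that optimization is not reproduced here. The sketch below assumes the importances are already available and illustrates one plausible reading of "retaining features with a certain contribution rate": rank features by importance and keep the smallest top-ranked set whose cumulative share of total importance reaches a threshold. It also checks the reported reduction arithmetic: if the original UCI HAR feature set is the standard 561-dimensional vector, an 84.31% reduction corresponds to keeping 88 features.

```python
def select_by_contribution(importances, threshold=0.9):
    """Keep top-ranked features until their cumulative share of total
    importance reaches `threshold`; return kept indices in ascending order."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    total = sum(importances)
    if total <= 0:
        raise ValueError("importances must have a positive sum")
    ranked = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += importances[i] / total
        if cumulative >= threshold:
            break
    return sorted(kept)


def reduction_rate(n_kept, n_original):
    """Fraction of the original features removed by selection."""
    return 1 - n_kept / n_original


if __name__ == "__main__":
    importances = [5, 40, 10, 30, 15]  # hypothetical random-forest importances
    print(select_by_contribution(importances, threshold=0.8))  # [1, 3, 4]
    print(round(100 * reduction_rate(88, 561), 2))  # 84.31
```

The threshold value is a free parameter in this sketch; the abstract does not say which contribution rate the authors used.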
Min Li, Xue Jiang, Honghao Zhu, Fei Liu, Huabin Wang, Liang Tao, Shijun Liu
Structural features can effectively capture the overall texture variations in images. However, in locally prominent regions with visible veins, other characteristics such as directionality, convexity–concavity, and curvature also play a crucial role in recognition, and their impact cannot be overlooked. This paper introduces a novel approach, the histogram of variable curvature directional binary statistics (HVCDBS), which combines the structural and directional features of images to extract discriminative multifeature information for vein recognition. First, multidirection, multicurvature Gabor filters are convolved with vein images, yielding the direction and convexity–concavity at each pixel along with the curvature of the corresponding curve. Together with the original image information, these four types of information are fused and encoded to construct a multifeature variable curvature binary pattern (VCBP). Second, the resulting multifeature map is processed block by block to build variable curvature binary statistical features. Finally, these are combined with competitive Gabor directional binary statistical features through a matching-score-level fusion scheme whose optimal weights are determined by maximizing the interclass distance and minimizing the intraclass distance. This process fuses the two feature maps into a one-dimensional feature vector, yielding an effective representation of vein images. Extensive experiments on four widely used vein databases show that, compared with extracting structural features alone, the proposed algorithm achieves higher recognition rates and lower equal error rates.
A Finger Vein Recognition Algorithm Based on the Histogram of Variable Curvature Directional Binary Statistics. IET Biometrics, 2024(1), published 2024-09-25. DOI: 10.1049/2024/7408331. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/7408331
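The blockwise statistics step can be illustrated with a minimal sketch that assumes the VCBP codes have already been computed as a 2D map of small integers; the code map, block size, and number of codes below are hypothetical, and the Gabor encoding itself is not reproduced. Each non-overlapping block contributes a normalized histogram, the histograms are concatenated into one feature vector, and histogram intersection is used here as an assumed similarity measure.

```python
def block_histograms(code_map, block_h, block_w, num_codes):
    """Concatenate normalized code histograms over non-overlapping blocks.

    code_map: 2D list of integer codes in range(num_codes), e.g. a
    hypothetical precomputed VCBP map. Rows/columns that do not fill a
    whole block are ignored.
    """
    rows, cols = len(code_map), len(code_map[0])
    size = block_h * block_w
    feature = []
    for r0 in range(0, rows - block_h + 1, block_h):
        for c0 in range(0, cols - block_w + 1, block_w):
            hist = [0] * num_codes
            for r in range(r0, r0 + block_h):
                for c in range(c0, c0 + block_w):
                    hist[code_map[r][c]] += 1
            feature.extend(count / size for count in hist)
    return feature


def histogram_intersection(f1, f2):
    """Similarity between two block-histogram features (higher = more similar)."""
    return sum(min(a, b) for a, b in zip(f1, f2))


if __name__ == "__main__":
    code_map = [[0, 1, 2, 2],
                [1, 1, 3, 2]]  # toy 2x4 map with 4 possible codes
    f = block_histograms(code_map, block_h=2, block_w=2, num_codes=4)
    print(f)  # [0.25, 0.75, 0.0, 0.0, 0.0, 0.0, 0.75, 0.25]
    print(histogram_intersection(f, f))  # 2.0
```

Computing histograms per block, rather than over the whole image, keeps coarse spatial layout in the descriptor, which matters for vein patterns whose discriminative structure is location dependent.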
Pinar Santemiz, Luuk J. Spreeuwers, Raymond N. J. Veldhuis
Face recognition from side-view poses is a considerable challenge in automatic face recognition. Pose variation up to the side view changes both appearance and visibility, since only one eye is visible in side-view poses. Although traditionally overlooked, side-view poses have been brought to the forefront of research attention by recent advances in deep learning. This survey comprehensively investigates methods that address pose variations up to the side view and categorizes research efforts into feature-based, image-based, and set-based pose handling. Unlike existing surveys on pose variation, we focus specifically on extreme poses. We report numerous promising innovations in each category and discuss the use of, and challenges associated with, side-view images. Furthermore, we introduce current datasets and benchmarks, evaluate the performance of diverse methods, and examine their specific constraints. Notably, although feature-based methods are currently the state of the art, our observations suggest that cross-dataset evaluations, which only a few researchers have attempted, yield worse results. Consequently, matching arbitrary poses in uncontrolled settings remains an open challenge.
A Survey on Automatic Face Recognition Using Side-View Face Images. IET Biometrics, 2024(1), published 2024-08-08. DOI: 10.1049/2024/7886911. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/7886911
In recent years, fingerprint authentication has been widely adopted in diverse identification systems, including smartphones, wearable devices, and attendance machines. Nonetheless, these systems are vulnerable to spoofing attacks with fake fingerprints, posing significant privacy risks. Consequently, fingerprint presentation attack detection (PAD) strategies have been proposed to ensure the security of these systems. Most previous work has concentrated on building deep learning frameworks that improve PAD performance by augmenting fingerprint samples, while little attention has been paid to exploiting the fundamental difference between live and fake fingerprints to optimize feature extractors. This paper proposes a new fingerprint liveness detection method based on a Siamese attention residual convolutional neural network (Res-CNN) that offers an interpretable perspective on this challenge. To exploit the difference in ridge continuity features (RCFs) between live and fake fingerprints, a Gabor filter is first used to enhance the texture details of the fingerprint ridges, and an attention Res-CNN model is then constructed to extract the RCFs of live and fake fingerprints. The model mitigates the performance degradation caused by vanishing gradients. Furthermore, to highlight the difference in RCFs, a Siamese attention residual network is devised, and a ridge continuity amplification loss function is designed to optimize training. Finally, the RCF parameters are transferred to the model, with transfer learning used to aid their acquisition, thereby ensuring the model's interpretability. Experiments on three publicly available fingerprint datasets demonstrate the superiority of the proposed method in both true detection rate and average classification error rate. Moreover, the method performs strongly in PAD tasks, including cross-material and cross-sensor experiments. Additionally, we use Gradient-weighted Class Activation Mapping to generate heatmaps that visualize the interpretability of our model, providing compelling visual validation.
An Interpretable Siamese Attention Res-CNN for Fingerprint Spoofing Detection. Chengsheng Yuan, Zhenyu Xu, Xinting Li, Zhili Zhou, Junhao Huang, Ping Guo. IET Biometrics, 2024(1), published 2024-07-16. DOI: 10.1049/2024/6630173. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/6630173
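The abstract reports results in terms of the average classification error rate without defining it. Assuming the definition commonly used in the LivDet fingerprint liveness competitions, ACE = (Ferrlive + Ferrfake) / 2, where Ferrlive is the fraction of live fingerprints misclassified as fake and Ferrfake is the fraction of fake fingerprints misclassified as live, it can be computed from hard decisions as sketched below. The label convention (1 = live, 0 = fake) is an assumption.

```python
def liveness_errors(labels, predictions):
    """Return (ferr_live, ferr_fake, ace) as fractions.

    labels, predictions: sequences of hard decisions, 1 = live, 0 = fake.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    live = [p for y, p in zip(labels, predictions) if y == 1]
    fake = [p for y, p in zip(labels, predictions) if y == 0]
    if not live or not fake:
        raise ValueError("need at least one live and one fake sample")
    ferr_live = sum(1 for p in live if p != 1) / len(live)  # live rejected as fake
    ferr_fake = sum(1 for p in fake if p != 0) / len(fake)  # fake accepted as live
    return ferr_live, ferr_fake, (ferr_live + ferr_fake) / 2


if __name__ == "__main__":
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    predictions = [1, 1, 1, 0, 0, 0, 1, 1]
    print(liveness_errors(labels, predictions))  # (0.25, 0.5, 0.375)
```

Averaging the two per-class error rates, rather than computing overall error, keeps the metric from being dominated by whichever class has more samples.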
We propose original semantic labels for detailed face parsing that improve the accuracy of face recognition by focusing on individual parts of a face. The part labels used in conventional face parsing are defined based on biological features, so a single label is assigned to a large region such as the skin. Our semantic labels are defined by splitting large regions according to the structure of the face and by distinguishing the left and right sides of all parts, to account for head pose changes, occlusion, and other factors. Leveraging the ability to assign detailed part labels to face images, we propose a novel data augmentation method based on detailed face parsing, called Face Semantic Erasing (FSErasing), to improve face recognition performance. FSErasing randomly masks a part of the face image based on the detailed part labels, so erasing-type data augmentation can be applied in a way that respects the characteristics of the face. Experiments on public face image datasets show that FSErasing is effective in improving the performance of face recognition and face attribute estimation. In face recognition, adding FSErasing when training ResNet-34 with Softmax on CelebA improves the average accuracy by 0.354 points and reduces the average equal error rate (EER) by 0.312 points; with ArcFace, the average accuracy and EER improve by 0.752 and 0.802 points, respectively. For ResNet-50 trained with Softmax on CASIA-WebFace, the average accuracy improves by 0.442 points and the average EER decreases by 0.452 points; with ArcFace, the average accuracy and EER improve by 0.228 and 0.500 points, respectively. In face attribute estimation, adding FSErasing as a data augmentation method when training on CelebA improves estimation accuracy by 0.54 points. We also apply our detailed face parsing model to visualize face recognition models and demonstrate that it offers higher explainability than general visualization methods.
FSErasing: Improving Face Recognition with Data Augmentation Using Face Parsing. Hiroya Kawai, Koichi Ito, Hwann-Tzong Chen, Takafumi Aoki. IET Biometrics, 2024(1), published 2024-06-12. DOI: 10.1049/2024/6663315. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/6663315
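A minimal sketch of the erasing step, assuming the face image and its detailed parsing map are given as equally sized 2D lists: with probability p, one eligible part label present in the map is chosen at random and all of its pixels are overwritten with a fill value. The label IDs, the fill value, and the application probability p are hypothetical; the abstract does not state how FSErasing selects parts or fills them.

```python
import random


def fs_erase(image, label_map, erasable_labels, p=0.5, fill=0, rng=random):
    """Erase one randomly chosen face part with probability p.

    image, label_map: 2D lists of the same shape; label_map holds part IDs.
    erasable_labels: part IDs that may be erased (hypothetical IDs).
    rng: anything with random() and choice(), e.g. random.Random(seed).
    Returns (new_image, erased_label); erased_label is None if nothing was
    erased. The input image is never modified.
    """
    if rng.random() >= p:
        return [row[:] for row in image], None
    present = sorted({lab for row in label_map for lab in row} & set(erasable_labels))
    if not present:
        return [row[:] for row in image], None
    target = rng.choice(present)
    erased = [
        [fill if lab == target else px for px, lab in zip(img_row, lab_row)]
        for img_row, lab_row in zip(image, label_map)
    ]
    return erased, target


if __name__ == "__main__":
    image = [[5, 6], [7, 8]]
    labels = [[1, 2], [2, 3]]  # hypothetical part IDs, e.g. 2 = left cheek
    print(fs_erase(image, labels, erasable_labels=[2], p=1.0))  # ([[5, 0], [0, 8]], 2)
```

Unlike rectangle-based random erasing, the masked region here follows a semantic part boundary, which is the property the abstract emphasizes.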
Huimin She, Yongjian Hu, Beibei Liu, Chang-Tsun Li
Identity-based Deepfake detection methods have the potential to improve model generalization, robustness, and interpretability. However, current identity-based methods either require a reference or can detect only face replacement, not face reenactment. In this paper, we propose a novel Deepfake video detection approach based on identity anomalies. We observe two types of identity anomaly: the inconsistency between the clip-level static ID (facial appearance) and the clip-level dynamic ID (facial behavior), and the temporal inconsistency of image-level static IDs. Because both types of anomaly can be detected through self-consistency and do not depend on the manipulation type, our method is reference-free and manipulation-independent. Specifically, our detection network consists of two branches: a static–dynamic ID discrepancy detection branch for the inconsistency between the dynamic and static IDs, and a temporal static ID anomaly detection branch for temporal anomalies of the static ID. The outputs of the two branches are combined by weighted averaging to obtain the final detection result. We also design two loss functions, a static–dynamic ID matching loss and a dynamic ID constraint loss, to enhance the representation and discriminability of the dynamic ID. We conduct experiments on four benchmark datasets and compare our method with state-of-the-art methods. The results show that our method detects not only face replacement but also face reenactment and outperforms state-of-the-art methods on unseen datasets. It is also more robust to compression, and the identity-based features provide a good explanation of the detection results.
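The detection network in the paper is learned end to end, so the following is only a hand-crafted proxy, under stated assumptions, for two ideas in the abstract. Temporal static ID inconsistency is approximated as the mean cosine distance between consecutive per-frame identity embeddings (assumed to come from some face recognition model), and the two branch outputs are fused by a weighted average whose default weight of 0.5 is illustrative rather than the authors' value.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("embeddings must be non-zero")
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def temporal_id_inconsistency(frame_embeddings):
    """Mean cosine distance (1 - similarity) between consecutive frame-level
    identity embeddings; 0 means the identity never drifts."""
    if len(frame_embeddings) < 2:
        raise ValueError("need at least two frames")
    distances = [
        1 - cosine_similarity(a, b)
        for a, b in zip(frame_embeddings, frame_embeddings[1:])
    ]
    return sum(distances) / len(distances)


def fuse_branches(score_static_dynamic, score_temporal, weight=0.5):
    """Weighted average of the two branch scores (higher = more likely fake)."""
    if not 0 <= weight <= 1:
        raise ValueError("weight must be in [0, 1]")
    return weight * score_static_dynamic + (1 - weight) * score_temporal


if __name__ == "__main__":
    frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy 2-D "identity" embeddings
    temporal = temporal_id_inconsistency(frames)
    print(temporal)                                   # 0.5
    print(fuse_branches(1.0, temporal, weight=0.75))  # 0.875
```

Because the score depends only on self-consistency across frames, it needs no reference video of the claimed identity, which mirrors the reference-free property the abstract describes.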
{"title":"Exploring Static–Dynamic ID Matching and Temporal Static ID Inconsistency for Generalizable Deepfake Detection","authors":"Huimin She, Yongjian Hu, Beibei Liu, Chang-Tsun Li","doi":"10.1049/2024/2280143","DOIUrl":"10.1049/2024/2280143","url":null,"abstract":"<p>Identity-based Deepfake detection methods have the potential to improve the generalization, robustness, and interpretability of the model. However, current identity-based methods either require a reference or can only be used to detect face replacement but not face reenactment. In this paper, we propose a novel Deepfake video detection approach based on identity anomalies. We observe two types of identity anomalies: the inconsistency between clip-level static ID (facial appearance) and clip-level dynamic ID (facial behavior) and the temporal inconsistency of image-level static IDs. Since these two types of anomalies can be detected through self-consistency and do not depend on the manipulation type, our method is a reference-free and manipulation-independent approach. Specifically, our detection network consists of two branches: the static–dynamic ID discrepancy detection branch for the inconsistency between dynamic and static ID and the temporal static ID anomaly detection branch for the temporal anomaly of static ID. We combine the outputs of the two branches by weighted averaging to obtain the final detection result. We also designed two loss functions: the static–dynamic ID matching loss and the dynamic ID constraint loss, to enhance the representation and discriminability of dynamic ID. We conduct experiments on four benchmark datasets and compare our method with the state-of-the-art methods. Results show that our method can detect not only face replacement but also face reenactment, and also has better detection performance over the state-of-the-art methods on unknown datasets. It also has superior robustness against compression. 
Identity-based features provide a good explanation of the detection results.</p>","PeriodicalId":48821,"journal":{"name":"IET Biometrics","volume":"2024 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/2280143","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141298409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
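The abstract states only that the outputs of the two branches are combined by weighted averaging. A minimal sketch of that fusion step follows, assuming each branch emits a clip-level fake probability in [0, 1]; the function names, the weight `w`, and the decision `threshold` are hypothetical, since the abstract does not report the values used.

```python
def fuse_branch_scores(p_discrepancy: float, p_temporal: float, w: float = 0.5) -> float:
    """Weighted average of the two branch outputs (hypothetical weight w).

    p_discrepancy: assumed fake probability from the static-dynamic ID discrepancy branch.
    p_temporal:    assumed fake probability from the temporal static ID anomaly branch.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * p_discrepancy + (1.0 - w) * p_temporal


def is_fake(p_discrepancy: float, p_temporal: float,
            w: float = 0.5, threshold: float = 0.5) -> bool:
    """Label a clip as fake when the fused score reaches a hypothetical threshold."""
    return fuse_branch_scores(p_discrepancy, p_temporal, w) >= threshold
```

For example, `is_fake(0.9, 0.7)` fuses to 0.8 with equal weights and returns `True`. Keeping the weight explicit makes it easy to tune it on a validation set instead of fixing it by hand.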
Hengnian Qi, Gang Zeng, Keke Jia, Chu Zhang, Xiaoping Wu, Mengxia Li, Qing Lang, Lingxuan Wang
The quality of people’s lives is closely related to their emotional state. Positive emotions can boost confidence and help overcome difficulties, while negative emotions can harm both physical and mental health. Research has shown that people’s handwriting is associated with their emotions. In this study, audio-visual media were used to induce emotions, and a dot-matrix digital pen was used to collect neutral text written by participants in three emotional states: calm, happy, and sad. To address the challenge of limited samples, a novel conditional tabular generative adversarial network (CTAB-GAN) was used to increase the number of task samples, which improved the recognition accuracy on task samples by 4.18%. TabNet (a neural network designed for tabular data) combined with SimAM (a simple, parameter-free attention module) was compared with the original TabNet and with traditional machine learning models; incorporating the SimAM attention mechanism improved classification accuracy by 1.35%. Experimental results revealed significant differences between negative (sad) and nonnegative (calm and happy) emotions, which were distinguished with a recognition accuracy of 80.67%. Overall, this study demonstrates the feasibility of handwriting-based emotion recognition with the assistance of CTAB-GAN and SimAM-TabNet, and provides guidance for further research on emotion recognition and other handwriting-based applications.
{"title":"Emotion Recognition Based on Handwriting Using Generative Adversarial Networks and Deep Learning","authors":"Hengnian Qi, Gang Zeng, Keke Jia, Chu Zhang, Xiaoping Wu, Mengxia Li, Qing Lang, Lingxuan Wang","doi":"10.1049/2024/5351588","DOIUrl":"10.1049/2024/5351588","url":null,"abstract":"<p>The quality of people’s lives is closely related to their emotional state. Positive emotions can boost confidence and help overcome difficulties, while negative emotions can harm both physical and mental health. Research has shown that people’s handwriting is associated with their emotions. In this study, audio-visual media were used to induce emotions, and a dot-matrix digital pen was used to collect neutral text data written by participants in three emotional states: calm, happy, and sad. To address the challenge of limited samples, a novel conditional table generative adversarial network called conditional tabular-generative adversarial network (CTAB-GAN) was used to increase the number of task samples, and the recognition accuracy of task samples improved by 4.18%. The TabNet (a neural network designed for tabular data) with SimAM (a simple, parameter-free attention module) was employed and compared with the original TabNet and traditional machine learning models; the incorporation of the SimAm attention mechanism led to a 1.35% improvement in classification accuracy. Experimental results revealed significant differences between negative (sad) and nonnegative (calm and happy) emotions, with a recognition accuracy of 80.67%. Overall, this study demonstrated the feasibility of emotion recognition based on handwriting with the assistance of CTAB-GAN and SimAm-TabNet. 
It provides guidance for further research on emotion recognition or other handwriting-based applications.</p>","PeriodicalId":48821,"journal":{"name":"IET Biometrics","volume":"2024 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/5351588","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141246105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
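SimAM is described above only as a simple, parameter-free attention module. As a hedged illustration, the sketch below follows the commonly used SimAM formulation (an energy-based weight computed from each element's squared deviation from the mean, passed through a sigmoid), but applies it along the last axis of a 2-D `(batch, n_features)` array, on the assumption that a tabular model attends over feature columns. How the authors actually attach SimAM to TabNet is not stated in the abstract; the names `simam_weights` and `simam` and the regularizer `e_lambda=1e-4` are illustrative choices.

```python
import numpy as np


def simam_weights(x: np.ndarray, e_lambda: float = 1e-4) -> np.ndarray:
    """Parameter-free SimAM-style attention weights along the last axis.

    Assumption: x has shape (batch, n_features). The original SimAM operates
    on the spatial positions of each channel of a CNN feature map.
    """
    if x.shape[-1] < 2:
        raise ValueError("need at least two features along the last axis")
    n = x.shape[-1] - 1                               # number of "other" elements
    d = (x - x.mean(axis=-1, keepdims=True)) ** 2     # squared deviation from mean
    v = d.sum(axis=-1, keepdims=True) / n             # variance estimate
    e_inv = d / (4.0 * (v + e_lambda)) + 0.5          # inverse of the minimal energy
    return 1.0 / (1.0 + np.exp(-e_inv))               # sigmoid -> weights in (0, 1)


def simam(x: np.ndarray, e_lambda: float = 1e-4) -> np.ndarray:
    """Rescale each feature by its SimAM weight; there are no trainable parameters."""
    return x * simam_weights(x, e_lambda)


features = np.array([[0.0, 0.0, 0.0, 0.0, 10.0]])
print(simam_weights(features))  # the outlying last feature receives the largest weight
```

Because the weights depend only on feature statistics, the module adds no parameters to the model. Elements that deviate more from the mean get larger weights.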
Tuğçe Arıcan, Raymond Veldhuis, Luuk Spreeuwers, Loïc Bergeron, Christoph Busch, Ehsaneddin Jalilian, Christof Kauba, Simon Kirchgasser, Sébastien Marcel, Bernhard Prommegger, Kiran Raja, Raghavendra Ramachandra, Andreas Uhl
Finger vein recognition is gaining popularity in the field of biometrics, yet the inter-operability of finger vein patterns has received limited attention. This study aims to fill this gap by introducing a cross-device finger vein dataset and evaluating the performance of finger vein recognition across devices using a classical method, a convolutional neural network, and our proposed patch-based convolutional auto-encoder (CAE). The findings emphasise that standardisation of finger vein recognition, similar to that for fingerprints or irises, is crucial for achieving inter-operability. Despite the inherent challenges of cross-device recognition, the proposed CAE architecture shows promising results in finger vein recognition, particularly for cross-device comparisons.
{"title":"A Comparative Study of Cross-Device Finger Vein Recognition Using Classical and Deep Learning Approaches","authors":"Tuğçe Arıcan, Raymond Veldhuis, Luuk Spreeuwers, Loïc Bergeron, Christoph Busch, Ehsaneddin Jalilian, Christof Kauba, Simon Kirchgasser, Sébastien Marcel, Bernhard Prommegger, Kiran Raja, Raghavendra Ramachandra, Andreas Uhl","doi":"10.1049/2024/3236602","DOIUrl":"10.1049/2024/3236602","url":null,"abstract":"<p>Finger vein recognition is gaining popularity in the field of biometrics, yet the inter-operability of finger vein patterns has received limited attention. This study aims to fill this gap by introducing a cross-device finger vein dataset and evaluating the performance of finger vein recognition across devices using a classical method, a convolutional neural network, and our proposed patch-based convolutional auto-encoder (CAE). The findings emphasise the importance of standardisation of finger vein recognition, similar to that of fingerprints or irises, crucial for achieving inter-operability. Despite the inherent challenges of cross-device recognition, the proposed CAE architecture in this study demonstrates promising results in finger vein recognition, particularly in the context of cross-device comparisons.</p>","PeriodicalId":48821,"journal":{"name":"IET Biometrics","volume":"2024 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/3236602","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140381478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
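To make the idea of a cross-device comparison concrete, the following hedged sketch shows one way to build genuine and impostor comparison lists when enrolment samples come from one capture device and probe samples from another. The `Sample` record, its fields, and `cross_device_pairs` are hypothetical; the dataset's actual evaluation protocol is not described in the abstract.

```python
from itertools import product
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    subject: str    # finger identity (hypothetical field)
    device: str     # capture device name (hypothetical field)
    sample_id: str  # unique sample identifier (hypothetical field)


Pair = Tuple[str, str]


def cross_device_pairs(samples: List[Sample], enrol_device: str,
                       probe_device: str) -> Tuple[List[Pair], List[Pair]]:
    """Pair every enrolment sample from one device with every probe from another.

    Same-subject pairs are genuine comparisons; different-subject pairs are impostors.
    """
    enrol = [s for s in samples if s.device == enrol_device]
    probe = [s for s in samples if s.device == probe_device]
    genuine: List[Pair] = []
    impostor: List[Pair] = []
    for e, p in product(enrol, probe):
        target = genuine if e.subject == p.subject else impostor
        target.append((e.sample_id, p.sample_id))
    return genuine, impostor
```

The resulting pair lists can then be scored by any of the compared recognisers (classical, CNN, or CAE), so that same-device and cross-device results are computed on a consistent set of comparisons.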