Junze Zheng, Xinping Gao, Ajian Liu, Haocheng Yuan, Jun Wan, Yanyan Liang, Jiankang Deng, Sergio Escalera, Hugo Jair Escalante, Zhen Lei, Isabelle Guyon, Du Zhang
Face antispoofing (FAS) technologies play a pivotal role in safeguarding face recognition (FR) systems against potential security loopholes. The biometric community has witnessed significant advances lately, largely due to the exceptional performance of deep learning architectures and the availability of substantial datasets. Despite this progress, FR systems remain susceptible to both physical and digital forgery attacks, yet most existing detection methods do not address both types of threats concurrently. To bridge this gap and foster the development of a comprehensive detection framework, we have compiled a unified dataset named UniAttackData. This dataset incorporates both physical and digital spoofing attacks while maintaining identity consistency, covering 1800 participants, each subjected to two different physical attacks (PAs) and 12 different digital attacks (DAs), for a total of 29,706 video samples. Based on this new resource, we organized the Chalearn unified physical–digital face attack detection challenge to promote research on joint antispoofing. The challenge drew 136 teams during the development phase, with 13 teams advancing to the final round. The organizing team revalidated and re-executed the submitted code to determine the final rankings. This paper summarizes the challenge, covering the dataset, the protocol definition, the evaluation metrics, and the competition results. We also discuss the top-ranked algorithms and the research insights the competition offers for attack detection.
{"title":"Unified Physical–Digital Face Attack Detection Challenge: A Review","authors":"Junze Zheng, Xinping Gao, Ajian Liu, Haocheng Yuan, Jun Wan, Yanyan Liang, Jiankang Deng, Sergio Escalera, Hugo Jair Escalante, Zhen Lei, Isabelle Guyon, Du Zhang","doi":"10.1049/bme2/9653627","DOIUrl":"https://doi.org/10.1049/bme2/9653627","url":null,"abstract":"<p>Face antispoofing (FAS) technologies play a pivotal role in safeguarding face recognition (FR) systems against potential security loopholes. The biometric community has witnessed significant advancements lately, largely due to the exceptional performance of deep learning architectures and the abundance of substantial datasets. Despite these progress, FR systems remain susceptible to both physical and digital forgery attacks. However, most existing detection methods do not address both types of threats concurrently. To bridge this gap and foster the development of a comprehensive detection framework, we have compiled a unified dataset named UniAttackData. This dataset incorporates both physical and digital spoofing attacks while maintaining identity consistency, encompassing 1800 participants each subjected to two different physical attacks (PAs) and 12 different digital attacks (DAs), respectively. This effort has resulted in a comprehensive collection of 29,706 video samples. We organized the Chalearn FAS face attack detection challenge based on this novel resource to boost research aiming to promote joint antispoofing efforts. The Chalearn unified antispoofing attack detection challenge drew 136 teams during the development phase, with 13 teams advancing to the final round. The organizing team revalidated and re-executed the submitted code to determine the final rankings. This paper provides a summary of the challenge, covering the dataset used, the protocol definition, the evaluation metrics, and the competition results. Additionally, we discuss the top-ranked algorithms and the research insights offered by the competition for attack detection.</p>","PeriodicalId":48821,"journal":{"name":"IET Biometrics","volume":"2026 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2026-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/bme2/9653627","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146139148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Facial expression recognition is vital in pattern recognition and affective computing. With the advancement of deep learning, its performance has improved, yet challenges remain in nonlaboratory environments due to occlusion, poor lighting, and varying head poses. This study explores a robust facial expression recognition approach using a CNN-based model integrated with key point localization techniques. Instead of relying on a dense set of landmarks, the proposed method focuses on fewer but more informative expression key points. Each point is analyzed for local shape features, and contour consistency is verified using indexing along the normal direction. This strategy enhances robustness while reducing computational complexity. Specifically, the hybrid active shape model (ASM) + structure method significantly lowers the processing load compared to the traditional ASM approach. Experimental results demonstrate a 3.02% improvement in recognition accuracy over one-to-many SVM classifiers when dealing with clear facial images. Furthermore, the system shows strong resilience to partial occlusions and maintains real-time performance, making it suitable for real-world applications. The proposed framework highlights the importance of selecting effective key points and optimizing feature extraction to enhance both accuracy and efficiency in facial expression recognition tasks under challenging conditions.
{"title":"Robustness Analysis of Distributed CNN Model Training in Expression Recognition","authors":"Jun Li","doi":"10.1049/bme2/4107824","DOIUrl":"https://doi.org/10.1049/bme2/4107824","url":null,"abstract":"<p>Facial expression recognition is vital in pattern recognition and affective computing. With the advancement of deep learning, its performance has improved, yet challenges remain in nonlaboratory environments due to occlusion, poor lighting, and varying head poses. This study explores a robust facial expression recognition approach using a CNN-based model integrated with key point localization techniques. Instead of relying on a dense set of landmarks, the proposed method focuses on fewer but more informative expression key points. Each point is analyzed for local shape features, and contour consistency is verified using indexing along the normal direction. This strategy enhances robustness while reducing computational complexity. Specifically, the hybrid active shape model (ASM) + structure method significantly lowers the processing load compared to the traditional ASM approach. Experimental results demonstrate a 3.02% improvement in recognition accuracy over one-to-many SVM classifiers when dealing with clear facial images. Furthermore, the system shows strong resilience to partial occlusions and maintains real-time performance, making it suitable for real-world applications. The proposed framework highlights the importance of selecting effective key points and optimizing feature extraction to enhance both accuracy and efficiency in facial expression recognition tasks under challenging conditions.</p>","PeriodicalId":48821,"journal":{"name":"IET Biometrics","volume":"2026 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2026-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/bme2/4107824","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146148306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep learning has significantly improved the performance of fingerprint liveness detection, but susceptibility to adversarial attacks remains a critical security challenge. Existing input transformation–based defense methods, including JPEG compression, total variance minimization (TVM), high-level representation guided denoiser (HGD), and Defed, are typically designed for specific attacks, resulting in limited generalization across diverse adversarial scenarios. Experimental analysis indicates that, among these four input transformation–based defenses, Defed achieves the best overall performance when evaluated against both the momentum iterative fast gradient sign method (MI-FGSM) and DeepFool attacks; however, while it is strongly robust against MI-FGSM, its defense against DeepFool remains insufficient. To address this issue, an improved version of Defed is proposed that integrates a learnable Gaussian noise module into the core structure to adaptively suppress adversarial perturbations and employs 1 × 1 convolutions to allow cross-channel information interaction, thereby enhancing feature consistency and overall robustness. Experimental results on the LivDet 2015 dataset demonstrate that the defense success rate against DeepFool attacks increases by 3%–5% while strong robustness against MI-FGSM attacks is maintained, substantially improving the security and reliability of fingerprint liveness detection systems.
{"title":"Introducing Learnable Gaussian Noise Into Defed for Enhanced Defense Against Adversarial Attacks in Fingerprint Liveness Detection","authors":"Shuifa Sun, Shaohua Hu, Yifei Wang, Wanyi Zheng, Jianming Lin, Sani M. Abdullahi, Jian Zhang","doi":"10.1049/bme2/5664546","DOIUrl":"https://doi.org/10.1049/bme2/5664546","url":null,"abstract":"<p>Deep learning has significantly improved the performance of fingerprint liveness detection, while susceptibility to adversarial attacks remains a critical security challenge. Existing input transformation–based defense methods, including JPEG compression, total variance minimization (TVM), high-level representation guided denoiser (HGD), and Defed, are typically designed for specific attacks, resulting in limited generalization across diverse adversarial scenarios. Experimental analysis indicates that among the four defense methods based on input transformation, Defed achieves the best overall performance when evaluated against both momentum iterative fast gradient sign method (MI-FGSM) and DeepFool attacks. However, Defed exhibits strong robustness against MI-FGSM attacks but demonstrates insufficient defense effectiveness against DeepFool attacks. To address this issue, an improved method of Defed has been proposed by integrating a learnable Gaussian noise module into the core structure to enable adaptive suppression of adversarial perturbations, and by employing 1 × 1 convolutions to allow cross-channel information interaction, thereby enhancing feature consistency and overall robustness. Experimental results on the LivDet 2015 dataset demonstrate that the defense success rate against DeepFool attacks has increased by 3%–5%, while strong robustness against MI-FGSM attacks has been maintained, substantially improving the security and reliability of fingerprint liveness detection systems.</p>","PeriodicalId":48821,"journal":{"name":"IET Biometrics","volume":"2026 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2026-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/bme2/5664546","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146002409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Syed Konain Abbas, Sandip Purnapatra, M. G. Sarwar Murshed, Conor Miller-Lynch, Lambert Igene, Soumyabrata Dey, Stephanie Schuckers, Faraz Hussain
Large fingerprint datasets, while important for training and evaluation, are time-consuming and expensive to collect and require strict privacy measures. Researchers are exploring the use of synthetic fingerprint data to address these issues. This article presents a novel approach for generating synthetic fingerprint images (both spoof and live), addressing concerns related to privacy, cost, and accessibility in biometric data collection. Our approach utilizes conditional StyleGAN2-ADA and StyleGAN3 architectures to produce high-resolution synthetic live fingerprints, conditioned on specific finger identities (thumb through little finger). Additionally, we employ CycleGANs to translate these into realistic spoof fingerprints, simulating a variety of presentation attack materials (e.g., EcoFlex, Play-Doh). These synthetic spoof fingerprints are crucial for developing robust spoof detection systems. Through these generative models, we created two synthetic datasets (DB2 and DB3), each containing 1500 fingerprint images of all 10 fingers with multiple impressions per finger, and including corresponding spoofs in eight material types. The results indicate robust performance: our StyleGAN3 model achieves a Fréchet inception distance (FID) as low as 5, and the generated fingerprints achieve a true acceptance rate (TAR) of 99.47% at a 0.01% false acceptance rate (FAR). The StyleGAN2-ADA model achieved a TAR of 98.67% at the same 0.01% FAR. We assess fingerprint quality using standard metrics (NFIQ2, MINDTCT), and matching experiments show no significant evidence of identity leakage, confirming the strong privacy-preserving properties of our synthetic datasets.
{"title":"Conditional Synthetic Live and Spoof Fingerprint Generation","authors":"Syed Konain Abbas, Sandip Purnapatra, M. G. Sarwar Murshed, Conor Miller-Lynch, Lambert Igene, Soumyabrata Dey, Stephanie Schuckers, Faraz Hussain","doi":"10.1049/bme2/7736489","DOIUrl":"https://doi.org/10.1049/bme2/7736489","url":null,"abstract":"<p>Large fingerprint datasets, while important for training and evaluation, are time-consuming and expensive to collect and require strict privacy measures. Researchers are exploring the use of synthetic fingerprint data to address these issues. This article presents a novel approach for generating synthetic fingerprint images (both spoof and live), addressing concerns related to privacy, cost, and accessibility in biometric data collection. Our approach utilizes conditional StyleGAN2-ADA and StyleGAN3 architectures to produce high-resolution synthetic live fingerprints, conditioned on specific finger identities (thumb through little finger). Additionally, we employ CycleGANs to translate these into realistic spoof fingerprints, simulating a variety of presentation attack materials (e.g., EcoFlex, Play-Doh). These synthetic spoof fingerprints are crucial for developing robust spoof detection systems. Through these generative models, we created two synthetic datasets (DB2 and DB3), each containing 1500 fingerprint images of all 10 fingers with multiple impressions per finger, and including corresponding spoofs in eight material types. The results indicate robust performance: our StyleGAN3 model achieves a Fréchet inception distance (FID) as low as 5, and the generated fingerprints achieve a true acceptance rate (TAR) of 99.47% at a 0.01% false acceptance rate (FAR). The StyleGAN2-ADA model achieved a TAR of 98.67% at the same 0.01% FAR. We assess fingerprint quality using standard metrics (NFIQ2, MINDTCT), and notably, matching experiments confirm strong privacy preservation, with no significant evidence of identity leakage, confirming the strong privacy-preserving properties of our synthetic datasets.</p>","PeriodicalId":48821,"journal":{"name":"IET Biometrics","volume":"2026 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2026-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/bme2/7736489","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145964221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Roberto Gallardo-Cava, David Ortega-DelCampo, Daniel Palacios-Alonso, Javier M. Moguerza, Cristina Conde, Enrique Cabello
This study introduces a reinforcement training framework for face recognition systems (FRSs) that leverages facial morphing techniques to generate counterfactual visual instances for model enhancement. Two complementary morphing strategies were employed: a geometric approach based on Delaunay–Voronoi triangulation (DVT-Morph) and a generative approach using latent diffusion and autoencoder-based models (diffusion-based morphing [MorDIFF]). The generated morphs act as controlled counterfactuals, representing minimally modified facial images that induce changes in FRS verification decisions. The proposed method integrates these counterfactuals into the training process of two state-of-the-art recognition systems, ArcFace and MagFace, to strengthen their decision boundaries and improve their robustness, calibration, and explainability. By combining morphing-based counterfactual generation with eXplainable Artificial Intelligence (XAI) techniques, the framework enables a more interpretable embedding space and increased resilience against morphing and adversarial perturbations. The experimental results demonstrate that the inclusion of morph-based counterfactuals significantly enhances the verification accuracy and decision transparency of modern FRSs. Moreover, the methodology is model- and morphing-agnostic and can be applied to any FRS architecture, regardless of the morphing generation technique.
{"title":"Reinforcement Training of Face Recognition Systems Using Morphing and XAI Methods","authors":"Roberto Gallardo-Cava, David Ortega-DelCampo, Daniel Palacios-Alonso, Javier M. Moguerza, Cristina Conde, Enrique Cabello","doi":"10.1049/bme2/7897011","DOIUrl":"https://doi.org/10.1049/bme2/7897011","url":null,"abstract":"<p>This study introduces a reinforcement training framework for face recognition systems (FRSs) that leverages facial morphing techniques to generate counterfactual visual instances for model enhancement. Two complementary morphing strategies were employed: a geometric approach based on Delaunay–Voronoi triangulation (DVT-Morph) and a generative approach using latent diffusion and autoencoder-based models (diffusion-based morphing [MorDIFF]). The generated morphs act as controlled counterfactuals, representing minimally modified facial images that induce changes in FRS verification decisions. The proposed method integrates these counterfactuals into the training process of two state-of-the-art recognition systems, ArcFace and MagFace, to strengthen their decision boundaries and improve their robustness, calibration, and explainability. By combining morphing-based counterfactual generation with eXplainable Artificial Intelligence (XAI) techniques, the framework enables a more interpretable embedding space and increased resilience against morphing and adversarial perturbations. The experimental results demonstrate that the inclusion of morph-based counterfactuals significantly enhances the verification accuracy and decision transparency of modern FRSs. Moreover, the methodology is model- and morphing-agnostic and can be applied to any FRS architecture, regardless of the morphing generation technique.</p>","PeriodicalId":48821,"journal":{"name":"IET Biometrics","volume":"2026 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/bme2/7897011","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145909230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mounting cybersecurity threats make robust identity authentication systems imperative for safeguarding sensitive user data. Conventional biometric authentication methods, such as fingerprinting and facial recognition, are vulnerable to spoofing attacks. In contrast, electrocardiogram (ECG) signals offer distinct advantages as dynamic, “liveness”-assured biomarkers exhibiting individual specificity. This study proposes a novel fusion network model, the convolutional neural network (CNN)-transformer fusion network (CTFN), to achieve high-precision ECG-based identity authentication by synergizing local feature extraction and global signal correlation analysis. The proposed framework integrates a multistage enhanced CNN to capture fine-grained local patterns in ECG morphology and a transformer encoder to model long-range dependencies in heartbeat sequences. An adaptive weighting mechanism dynamically optimizes the contributions of both modules during feature fusion. The efficacy of CTFN was evaluated in three critical real-world scenarios, single/multi-heartbeat authentication, cross-temporal consistency, and emotional variability resistance, using 283 subjects from four public ECG databases: CYBHi, PTB, ECG-ID, and MIT-BIH. On the CYBHi dataset, CTFN achieved state-of-the-art recognition accuracies of 98.46%, 80.95%, and 90.76% in these three scenarios, respectively. Notably, the model attained 100% authentication accuracy using only six heartbeats, a 25% reduction in input requirements compared to prior work, while maintaining robust performance against physiological variations induced by emotional states or temporal gaps. These results demonstrate that CTFN significantly advances the practicality of ECG biometrics by balancing high accuracy with minimal data acquisition demands, offering a scalable and spoof-resistant solution for secure authentication systems.
{"title":"CTFN: Multistage CNN-Transformer Fusion Network for ECG Authentication","authors":"Heng Jia, Zhidong Zhao, Yefei Zhang, Xianfei Zhang, Yanjun Deng, Yongguang Wang, Hao Wang, Pengfei Jiao","doi":"10.1049/bme2/8757767","DOIUrl":"https://doi.org/10.1049/bme2/8757767","url":null,"abstract":"<p>In the face of the mounting challenges posed by cybersecurity threats, there is an imperative for the development of robust identity authentication systems to safeguard sensitive user data. Conventional biometric authentication methods, such as fingerprinting and facial recognition, are vulnerable to spoofing attacks. In contrast, electrocardiogram (ECG) signals offer distinct advantages as dynamic, “liveness”-assured biomarkers, exhibiting individual specificity. This study proposes a novel fusion network model, the convolutional neural network (CNN)-transformer fusion network (CTFN), to achieve high-precision ECG-based identity authentication by synergizing local feature extraction and global signal correlation analysis. The proposed framework integrates a multistage enhanced CNN to capture fine-grained local patterns in ECG morphology and a transformer encoder to model long-range dependencies in heartbeat sequences. An adaptive weighting mechanism dynamically optimizes the contributions of both modules during feature fusion. The efficacy of CTFN was evaluated in three critical real-world scenarios: single/multi-heartbeat authentication, cross-temporal consistency, and emotional variability resistance. The evaluation was conducted on 283 subjects from four public ECG databases: CYBHi, PTB, ECG-ID, and MIT-BIH. The CYBHi dataset revealed that CTFN exhibited a state-of-the-art recognition accuracy of 98.46%, 80.95%, and 90.76%, respectively, signifying its remarkable performance. Notably, the model attained a 100% authentication accuracy rate using only six heartbeats. This represents a 25% decrease in input requirements when compared to prior works, while concurrently maintaining its robust performance against physiological variations induced by emotional states or temporal gaps. These results demonstrate that CTFN significantly advances the practicality of ECG biometrics by balancing high accuracy with minimal data acquisition demands, offering a scalable and spoof-resistant solution for secure authentication systems.</p>","PeriodicalId":48821,"journal":{"name":"IET Biometrics","volume":"2025 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/bme2/8757767","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145905238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The last few years have witnessed an exponential rise in smart infrastructures, including internet of things (IoT)-driven human–machine interfaces (HMIs), automated teller machines (ATMs), smart homes, data access control interfaces, and various other e-infrastructures that demand robust person authentication systems. However, guaranteeing scalability with minimum latency and power consumption in such applications remains a challenge. Unlike cryptographic methods, biometric authentication methods are more efficient, particularly in terms of non-repudiation, low computational cost, low complexity, and minimum latency. The world has witnessed cyber breaches caused by mimicking tools and techniques, and standalone biometrics can be prone to false positives in large application environments. This indicates the need for a multimetric biometric solution to guarantee robust and reliable person authentication and verification. To address these demands, this paper proposes a novel deep spatiotextural (DST) feature learning-driven multimodal biometric system. Unlike traditional biometric solutions, we use iris and fingerprint images together to achieve robust person authentication. Both biometric input images (i.e., fingerprint and iris) are first preprocessed with intensity and histogram equalization, Z-score normalization, and resizing. Subsequently, a firefly heuristic-driven fuzzy C-means (FCM) clustering (FFCM) algorithm is developed to segment the region of interest (ROI) from the input fingerprint and iris images. The segmented ROI-specific color images are processed for DST feature extraction using gray level co-occurrence matrices (GLCMs) and a ResNet101 deep network. The extracted DST features undergo feature-level fusion, and the resulting composite feature vector is classified with a random forest (RF) ensemble classifier. The simulation results confirmed a (user) verification accuracy of 99.74%, precision of 98.86%, recall of 98.49%, and F-measure of 98.67%, signifying its superiority over other state-of-the-art methods. The feature learning robustness over the targeted multimetric biometrics confirms its suitability for real-world person authentication tasks.
{"title":"Deep Spatiotextural Feature Learning-Driven Multimetric Biometric Authentication System for Strategic Smart Infrastructures: An Iris–Fingerprint Multimodality Solution","authors":"Chethana J., Ravi J.","doi":"10.1049/bme2/9919250","DOIUrl":"10.1049/bme2/9919250","url":null,"abstract":"<p>The last few years have witnessed an exponential rise in smart infrastructures including internet of things (IoT)-driven human–machine interface (HMI), automatic tailoring machine (ATM), smart home, data access control interfaces, and varied other e-infrastructures that demand robust person authentication systems. However, guaranteeing scalability with minimum latency and power consumption in the aforesaid applications remains a challenge. Unlike cryptographic methods, the biometric authentication methods seem to be more efficient, especially in terms of their ability toward non-repudiation, low-computational cost, low complexity, and minimum latency. The world has witnessed cyber breaches due to mimicking tools or techniques. In addition, standalone biometrics might be prone to false positives over a large application environment. It indicates the need of a multi-metric biometrics solution to guarantee robust and reliable personal authentication and verification task. To cope up the aforesaid demands, in this paper, a novel deep spatiotextural (DST) feature learning-driven multimodal biometric is proposed. Unlike traditional biometric solutions, we applied iris and fingerprint images altogether to achieve a robust person authentication solution. Here, the both biometrics input images (i.e., fingerprint and iris) were processed for preprocessing tasks such asintensity and histogram equalization, and <i>Z</i>-score normalization and resizing. Subsequently, firefly heuristic-driven fuzzy <i>C</i>-means (FCMs) clustering (FFCM) algorithm is developed to segment region-of-interest (ROI) from the input fingerprint and iris images. The segmented ROI-specific color images were processed for the DST feature extraction by using gray level co-occurrence metrics (GLCMs) and ResNet101 deep network. The extracted DST features were processed for feature-level fusion, and thus, the composite feature vector obtained was processed for multiclass classification by using random forest (RF) ensemble classifier. The simulation results confirmed (user) verification accuracy of 99.74%, 98.86%, recall 98.49%, and <i>F</i>-measure 98.67%, signifying its superiority over other state-of-the-arts. The feature learning robustness over the targeted multimetric biometrics confirms its suitability for real-world person authentication tasks.</p>","PeriodicalId":48821,"journal":{"name":"IET Biometrics","volume":"2025 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/bme2/9919250","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145824695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yongze Li, Jing Gu, Ning Li, Bo-Han Li, Xiaoyuan Yu, Zhiyao Liang, Bin Li, Zhen Lei
Unified face attack detection (UAD), which simultaneously addresses physical presentation attacks (PAs) and digital forgery attacks (DAs) with a vision-language model, remains challenging due to the difficulty of effectively separating live and fake cues. The challenges mainly arise from two aspects: (1) text prompts are insufficiently aligned with visual features across layers, and (2) patch tokens containing live and fake cues often overlap, leading to ambiguous attribution and small decision margins. To address these problems, we propose a novel Layer-wise Cue Alignment framework (LCA) that leverages textual features to extract both layer-wise and global cues from patch tokens, and we further introduce a new training strategy to improve the separation of live and fake cues. Specifically, the layer-wise prompts are obtained by the cue matching block (CMB), which matches textual features with patch embeddings at each transformer layer, and the layer-level cues are injected into the visual features of each layer and further aggregated by the cue fusion block (CFB) to form comprehensive prompts that enhance the overall visual representation. Moreover, we design a complementary supervision mechanism (CSM) that suppresses forgery cues in live faces while enforcing mutual exclusivity between live and fake cues in attack samples to improve the reliability of cue separation. Extensive experiments on multiple benchmarks demonstrate that our framework achieves state-of-the-art performance on most protocols of the datasets.
{"title":"Layer-Wise Cue Alignment for Unified Face Attack Detection With Vision-Language Model","authors":"Yongze Li, Jing Gu, Ning Li, Bo-Han Li, Xiaoyuan Yu, Zhiyao Liang, Bin Li, Zhen Lei","doi":"10.1049/bme2/3954107","DOIUrl":"https://doi.org/10.1049/bme2/3954107","url":null,"abstract":"<p>Unified face attack detection (UAD), which simultaneously addresses physical presentation attacks (PAs) and digital forgery attacks (DAs) with a vision-language model, remains challenging due to the difficulty of effectively separating live and fake cues. The challenges mainly arise from two aspects: (1) text prompts are insufficiently aligned with visual features across layers, and (2) patch tokens containing live and fake cues often overlap, leading to ambiguous attribution and small decision margins. To address these problems, we propose a novel Layer-wise Cue Alignment framework (LCA) that leverages textual features to extract both layer-wise and global cues from patch tokens, and we further introduce a new training strategy to improve the separation of live and fake cues. Specifically, the layer-wise prompts are obtained by the cue matching block (CMB), which matches textual features with patch embeddings at each transformer layer, and the layer-level cues are injected into the visual features of each layer and further aggregated by the cue fusion block (CFB) to form comprehensive prompts that enhance the overall visual representation. Moreover, we design a complementary supervision mechanism (CSM) that suppresses forgery cues in live faces while enforcing mutual exclusivity between live and fake cues in attack samples to improve the reliability of cue separation. Extensive experiments on multiple benchmarks demonstrate that our framework achieves state-of-the-art performance on most protocols of the datasets.</p>","PeriodicalId":48821,"journal":{"name":"IET Biometrics","volume":"2025 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/bme2/3954107","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145845946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recently, electrocardiogram (ECG) signals have garnered significant attention in the field of identity authentication. In this setting, the system must determine whether an ECG signal collected by a wearable smart device belongs to an enrolled user. Because the computational resources of smart devices are constrained in practical scenarios, it is essential to reduce the complexity of the method to lower the computational load. To maintain the accuracy (ACC) of identity authentication, most research efforts rely on both R-wave extraction and segmentation before authentication. Moreover, many methods require the model to be retrained whenever new users are enrolled, leading to performance degradation and wasted training resources. Hence, we propose a simple yet effective ECG identity authentication method that applies blind segmentation and requires no retraining, greatly simplifying the authentication process. To reduce the equal error rate (EER) during the verification phase, a combination of AAM-softmax and triplet losses is employed, together with hard negative mining within each batch. Extensive experiments demonstrate that our method outperforms competitors by a large margin, e.g., achieving a 0.40% EER on the large-scale autonomic dataset. Among models of comparable parameter size, our approach also demonstrates markedly higher computational efficiency on both CPU and GPU platforms. The source code has been publicly released and is available at: https://github.com/DanMerry/LowEER.
{"title":"Simple yet Effective ECG Identity Authentication With Low EER and Without Retraining","authors":"Mingyu Dong, Zhidong Zhao, Yefei Zhang, Yanjun Deng, Hao Wang, Zhe Ye","doi":"10.1049/bme2/7968221","DOIUrl":"10.1049/bme2/7968221","url":null,"abstract":"<p>Recently, electrocardiogram (ECG) signals have garnered significant attention in the field of identity authentication. For identity authentication, the ECG signals collected by wearing smart devices need to be determined whether the signal belongs to an enrolled one. Constrained by the computational efficiency of smart devices in practical scenarios, it is essential to reduce the complexity of the method to lower the computational load. To maintain the accuracy (ACC) of identity authentication, most research efforts rely on both R-wave extraction and segmentation for subsequent authentication. Moreover, many methods constantly require the model to be retrained during the user enrollment stage, leading to performance degradation and waste of training resources. Hence, we propose a simple yet effective ECG identity authentication method that applies blind segmentation and is free from retraining, which greatly simplifies the authentication process. To mitigate the equal error rate (EER) during the verification phase, a combination of AAM-softmax and triplet losses is employed, along with the incorporation of the hard negative mining within batch samples. Extensive experiments demonstrate that our method outperforms competitors by a large margin, e.g., achieving 0.40% EER on the large-scale autonomic dataset. Within models of comparable parameter sizes, our approach demonstrates markedly higher computational efficiency on both CPU and GPU platforms. The source code has been publicly released and is available at: https://github.com/DanMerry/LowEER.</p>","PeriodicalId":48821,"journal":{"name":"IET Biometrics","volume":"2025 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/bme2/7968221","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145739916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As security threats continue to evolve, multimodal biometric recognition systems (MBRSs) have emerged as robust solutions for reliable user authentication. To the best of our knowledge, this study presents the first systematic literature review (SLR) specifically focused on MBRS based on physiological traits, combining traditional image processing techniques (e.g., Gabor filters and edge detection) with artificial intelligence (AI) methods. These include machine learning (ML) approaches (e.g., Softmax classifier and linear discriminant analysis), deep learning (DL) models (e.g., convolutional neural networks [CNNs]), and metaheuristic optimization algorithms (e.g., firefly algorithm, gray wolf optimizer [GWO], and GwPeSOA). We analyze and compare the frequency and effectiveness of various fusion levels (sensor, feature, score, and decision) employed in the literature. Our review synthesizes findings from 29 peer-reviewed studies, highlights commonly used biometric traits and databases (e.g., CASIA and IITD), and categorizes the fusion techniques applied at each stage of the biometric pipeline, from preprocessing and feature extraction to decision-making. Results show that score-level fusion remains the most widely adopted approach. Multimodal systems combining multiple physiological traits (e.g., face, iris, and finger vein) demonstrate significant performance gains, with some studies reporting accuracies reaching 100%. Importantly, no prior review has provided such an integrative perspective combining handcrafted techniques with diverse AI–based approaches across multiple fusion levels. This comprehensive synthesis is intended to guide future research toward more practical, scalable, and accurate multimodal biometric systems.
{"title":"Multimodal Biometrics: A Review of Handcrafted and AI–Based Fusion Approaches","authors":"Hind Es-Sobbahi, Mohamed Radouane, Khalid Nafil","doi":"10.1049/bme2/5055434","DOIUrl":"https://doi.org/10.1049/bme2/5055434","url":null,"abstract":"<p>As security threats continue to evolve, multimodal biometric recognition systems (MBRSs) have emerged as robust solutions for reliable user authentication. To the best of our knowledge, this study presents the first systematic literature review (SLR) specifically focused on MBRS based on physiological traits, combining traditional image processing techniques (e.g., Gabor filters and edge detection) with artificial intelligence (AI) methods. These include machine learning (ML) approaches (e.g., Softmax classifier and linear discriminant analysis), deep learning (DL) models (e.g., convolutional neural networks [CNNs]), and metaheuristic optimization algorithms (e.g., firefly algorithm, gray wolf optimizer [GWO], and GwPeSOA). We analyze and compare the frequency and effectiveness of various fusion levels (sensor, feature, score, and decision) employed in the literature. Our review synthesizes findings from 29 peer-reviewed studies, highlights commonly used biometric traits and databases (e.g., CASIA and IITD), and categorizes the fusion techniques applied at each stage of the biometric pipeline, from preprocessing and feature extraction to decision-making. Results show that score-level fusion remains the most widely adopted approach. Multimodal systems combining multiple physiological traits (e.g., face, iris, and finger vein) demonstrate significant performance gains, with some studies reporting accuracies reaching 100%. Importantly, no prior review has provided such an integrative perspective combining handcrafted techniques with diverse AI–based approaches across multiple fusion levels. This comprehensive synthesis is intended to guide future research toward more practical, scalable, and accurate multimodal biometric systems.</p>","PeriodicalId":48821,"journal":{"name":"IET Biometrics","volume":"2025 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/bme2/5055434","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145406982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}