Pub Date : 2026-01-26  DOI: 10.1109/TBIOM.2026.3652264
IEEE Transactions on Biometrics, Behavior, and Identity Science Information for Authors
IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 8, no. 1, pp. C3-C3
Pub Date : 2026-01-26  DOI: 10.1109/TBIOM.2026.3652243
IEEE Transactions on Biometrics, Behavior, and Identity Science Publication Information
IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 8, no. 1, pp. C2-C2
Pub Date : 2025-10-27  DOI: 10.1109/TBIOM.2025.3625986
PrIdentity: Generalizable Privacy-Preserving Adversarial Perturbations for Anonymizing Facial Identity
Saheb Chhabra;Kartik Thakral;Richa Singh;Mayank Vatsa
With the rapid proliferation of face recognition systems, the risk of privacy leakage from facial images has become a pressing concern. Applications such as Find Face and Social Mapper can readily expose an individual’s identity without consent. Existing anonymization approaches partially address the problem: synthesis and fusion-based methods suppress identity but often distort facial attributes, reducing data utility, while adversarial perturbation methods improve generalizability across recognition models but rely on fixed $\mathcal{L}_{1}$ or $\mathcal{L}_{2}$ norms, leading to underfitting or overfitting. As a result, no single method jointly satisfies the three essential properties of effective anonymization: privacy, data utility, and generalizability. To address these limitations, we present a novel Privacy-Preserving Identity Anonymization (PrIdentity) algorithm that anonymizes the identity of a given image while preserving privacy. Our approach learns adversarial perturbations through an $\mathcal{L}_{p}$-norm-based regularization technique, maintaining a balance between privacy and data utility. Furthermore, we ensure the anonymized images generalize effectively across different unseen face recognition models. To the best of our knowledge, this is the first work to introduce a learnable $p$ parameter in the $\mathcal{L}_{p}$ norm for privacy preservation. We evaluate PrIdentity on the LFW, CelebA, and CelebA-HQ datasets across multiple face recognition architectures, complemented by a user study on both original and anonymized images. The results demonstrate that our algorithm effectively conceals identities while preserving visual appearance, achieving state-of-the-art performance in identity anonymization. We also carry out bounding box distance prediction experiments to validate data utility, attaining a state-of-the-art Euclidean distance of 2.65, which is 1.17 lower than the second-best method.
IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 8, no. 1, pp. 137-151
Pub Date : 2025-10-10  DOI: 10.1109/TBIOM.2025.3618315
2025 Index IEEE Transactions on Biometrics, Behavior, and Identity Science
IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 7, no. 4, pp. 953-970
Pub Date : 2025-10-03  DOI: 10.1109/TBIOM.2025.3617011
Person Recognition in Aerial Surveillance: A Decade Survey
K. Nguyen;Feng Liu;C. Fookes;S. Sridharan;Xiaoming Liu;Arun Ross
The rapid emergence of airborne platforms and imaging sensors is enabling new forms of aerial surveillance due to their unprecedented advantages in scale, mobility, deployment, and covert observation capabilities. This article provides a comprehensive overview of 150+ papers from the last 10 years on human-centric aerial surveillance tasks, from a computer vision and machine learning perspective. It aims to provide readers with an in-depth, systematic review and technical analysis of the current state of aerial surveillance using drones, UAVs, and other airborne platforms. The objects of interest are humans, who are to be detected, identified, and re-identified. More specifically, for each of these tasks, we first identify the unique challenges of performing the task in an aerial setting compared to the popular ground-based setting, and subsequently compile and analyze the aerial datasets publicly available for it. Most importantly, we delve deep into the approaches in the aerial surveillance literature, focusing on how they presently address aerial challenges and on techniques for improvement. We conclude the paper by discussing the gaps and open research questions to inform future research avenues.
{"title":"Person Recognition in Aerial Surveillance: A Decade Survey","authors":"K. Nguyen;Feng Liu;C. Fookes;S. Sridharan;Xiaoming Liu;Arun Ross","doi":"10.1109/TBIOM.2025.3617011","DOIUrl":"https://doi.org/10.1109/TBIOM.2025.3617011","url":null,"abstract":"The rapid emergence of airborne platforms and imaging sensors is enabling new forms of aerial surveillance due to their unprecedented advantages in scale, mobility, deployment, and covert observation capabilities. This article provides a comprehensive overview of 150+ papers over the last 10 years of human-centric aerial surveillance tasks from a computer vision and machine learning perspective. It aims to provide readers with an in-depth systematic review and technical analysis of the current state of aerial surveillance tasks using drones, UAVs, and other airborne platforms. The object of interest is humans, where human subjects are to be detected, identified, and re-identified. More specifically, for each of these tasks, we first identify unique challenges in performing these tasks in an aerial setting compared to the popular ground-based setting and subsequently compile and analyze aerial datasets publicly available for each task. Most importantly, we delve deep into the approaches in the aerial surveillance literature with a focus on investigating how they presently address aerial challenges and techniques for improvement. We conclude the paper by discussing the gaps and open research questions to inform future research avenues.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"8 1","pages":"3-19"},"PeriodicalIF":5.0,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146045303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-10-01  DOI: 10.1109/TBIOM.2025.3616651
Gradient-Reversed Domain-Generalizable Multi-Modal Cross-Attention Network for Robust Face Anti-Spoofing
Koyya Deepthi Krishna Yadav;Ilaiah Kavati;Ramalingaswamy Cheruku
Face recognition systems are increasingly vulnerable to presentation attacks, wherein adversaries employ artifacts such as printed photographs, replayed videos, and 3D masks to impersonate genuine users. Despite significant progress in face anti-spoofing, most existing methods exhibit poor generalization when confronted with unseen attack types or domain shifts such as changes in sensors, environments, or acquisition protocols, thereby limiting their robustness in real-world applications. We propose the Gradient-Reversed Domain-Generalizable Multi-Modal Cross-Attention Network (GR-DXNet), which enhances resilience against previously unseen attacks and enables seamless adaptation across diverse domains. GR-DXNet employs dual-modality learning by fusing RGB frames, which capture fine-grained texture, with depth maps, which reveal 3D structural cues. Temporal Convolutional Networks (TCNs) are integrated to model motion-based inconsistencies, improving the detection of dynamic, emerging spoof patterns. To enhance cross-modal representation, a query-key-value-based cross-attention mechanism is introduced, enabling effective alignment and fusion of RGB and depth features. Furthermore, a post-fusion Gradient Reversal Layer (GRL) adversarially aligns cross-modal embeddings to suppress domain-specific bias without handcrafted augmentations or complex disentanglement, encouraging the model to learn domain-invariant features and strengthening generalization to unseen domains. Extensive evaluations on benchmark datasets under both intra- and cross-dataset protocols show that GR-DXNet offers a reliable solution for real-world face spoof detection.
IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 8, no. 1, pp. 84-98
Pub Date : 2025-09-30  DOI: 10.1109/TBIOM.2025.3615961
Voice2Visage: Deciphering Faces From Voices
Wuyang Chen;Kele Xu;Yanjie Sun;Yong Dou;Huaimin Wang
The human voice carries valuable cues about an individual’s identity and emotions. A more intriguing question emerges: can one’s facial appearance be deduced from their voice alone? Existing efforts have primarily focused on exploring the relationship between natural audio and visual data, with limited attention given to the specific biometric domain of speaker voice and face correlation. This study seeks to model the facial-related information embedded within the voice and ultimately predict an unknown person’s appearance solely based on their unheard voice. This task presents several challenges: first, while natural sounds exhibit significant variability, human voices often share similar frequencies, complicating the establishment of mappings between them. Second, generating faces from a voice is an ill-posed problem, as details such as makeup and head pose cannot be inferred from the voice alone. In this article, we introduce a novel framework named Voice2Visage, designed to tackle this task by leveraging self-supervised cross-modal and intra-modal learning to predict the face corresponding to an input voice. To ensure the feasibility of our method, we optimize existing algorithms for automated dataset collection. Additionally, we systematically design experiments to test the usability and stability of commonly used quantitative metrics in the field of facial identity comparison. The results validate the close semantic association between the generated face and the reference one, showcasing its reliability. Our work provides a fresh perspective on exploring the depth of physiological characteristics concealed within human voices and the intricate interplay between appearance and voice. Our code is available at https://github.com/colaudiolab/Voice2Visage.
{"title":"Voice2Visage: Deciphering Faces From Voices","authors":"Wuyang Chen;Kele Xu;Yanjie Sun;Yong Dou;Huaimin Wang","doi":"10.1109/TBIOM.2025.3615961","DOIUrl":"https://doi.org/10.1109/TBIOM.2025.3615961","url":null,"abstract":"The human voice carries valuable cues about an individual’s identity and emotions. A more intriguing question emerges: can one’s facial appearance be deduced from their voice alone? Existing efforts have primarily focused on exploring the relationship between natural audio and visual data, with limited attention given to the specific biometric domain of speaker-voice and face correlation. This study seeks to model the facial-related information embedded within the voice and ultimately predict an unknown person’s appearance solely based on their unheard voice. This task presents several challenges: firstly, while natural sounds exhibit significant variability, human voices often share similar frequencies, complicating the establishment of mappings between them. Secondly, generating faces from a voice presents an ill-posed problem, as details such as makeup and heap pose cannot be inferred from voice alone. In this article, we introduce a novel framework named Voice2Visage, designed to tackle this task by leveraging self-supervised cross-modal and intra-modal learning to predict faces corresponding to input voice. To ensure the feasibility of our method, we optimize existing algorithms in automated dataset collection. Additionally, we systematically design experiments to test the usability and stability of commonly used quantitative metrics in the field of facial identity comparison. The results validate the close semantic association between the generated face and the reference one, showcasing its reliability. Our work provides a fresh perspective on exploring the depth of physiological characteristics concealed within human voices and the intricate interplay between appearance and voice. Our code is available at <uri>https://github.com/colaudiolab/Voice2Visage</uri>.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"8 1","pages":"111-121"},"PeriodicalIF":5.0,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146045358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-09-26  DOI: 10.1109/TBIOM.2025.3614578
Structural Consistency for Face Forgery Detection via Frequency Domain Enhancement and Self-Predictive Learning
Lifang Zhou;Miaomiao Chen;Hangsheng Ruan
Existing face forgery detection methods often overfit to known forgery patterns, resulting in limited generalization to unseen manipulations. To address this issue, we propose a Structural Consistency method for Face Forgery Detection via Frequency Domain Enhancement and Self-predictive Learning, which embeds frequency-domain features into the spatial representation and leverages the structural consistency inherent in genuine facial contexts to provide more discriminative cues for forgery identification. Specifically, we design a data augmentation module to extract the frequency information through the Discrete Cosine Transform (DCT) and enhance it using a Frequency Domain Enhancement Module (FEM) to capture subtle forgery artifacts. Furthermore, we design a Self-Prediction Learning Module (SPLM) that reconstructs the occluded central region of a face by exploiting the structural consistency of real facial features. To further guide the learning process, we define a self-predictive reconstruction loss that minimizes the prediction error in the occluded region and helps reinforce structural consistency. Moreover, we propose a Reconstruction Difference Guidance (RDG) module, which explicitly emphasizes potential forgery regions by computing pixel-wise discrepancies between the reconstructed image and the original input. This process produces an attention map that guides the classifier to focus on semantically inconsistent or anomalous regions. Experimental results demonstrate that our method achieves superior generalization and robustness across diverse datasets.
{"title":"Structural Consistency for Face Forgery Detection via Frequency Domain Enhancement and Self-Predictive Learning","authors":"Lifang Zhou;Miaomiao Chen;Hangsheng Ruan","doi":"10.1109/TBIOM.2025.3614578","DOIUrl":"https://doi.org/10.1109/TBIOM.2025.3614578","url":null,"abstract":"Existing face forgery detection methods often overfit to known forgery patterns, resulting in limited generalization to unseen manipulations. To address this issue, we propose a Structural Consistency method for Face Forgery Detection via Frequency Domain Enhancement and Self-predictive Learning, which embeds frequency-domain features into the spatial representation and leverages the structural consistency inherent in genuine facial contexts to provide more discriminative cues for forgery identification. Specifically, we design a data augmentation module to extract the frequency information through the Discrete Cosine Transform (DCT) and enhance it using a Frequency Domain Enhancement Module (FEM) to capture subtle forgery artifacts. Furthermore, we design a Self-Prediction Learning Module (SPLM) that reconstructs the occluded central region of a face by exploiting the structural consistency of real facial features. To further guide the learning process, we define a self-predictive reconstruction loss that minimizes the prediction error in the occluded region and helps reinforce structural consistency. Moreover, we propose a Reconstruction Difference Guidance (RDG) module, which explicitly emphasizes potential forgery regions by computing pixel-wise discrepancies between the reconstructed image and the original input. This process produces an attention map that guides the classifier to focus on semantically inconsistent or anomalous regions. Experimental results demonstrate that our method achieves superior generalization and robustness across diverse datasets.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"8 1","pages":"99-110"},"PeriodicalIF":5.0,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146045300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-09-25  DOI: 10.1109/TBIOM.2025.3607046
IEEE Transactions on Biometrics, Behavior, and Identity Science Information for Authors
IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 7, no. 4, pp. C3-C3
Pub Date : 2025-09-23  DOI: 10.1109/TBIOM.2025.3613586
GaitMspT: A Novel Multi-Scale and Multi-Perspective Temporal Learning Network for Gait Recognition in the Wild
Hanlin Li;Wanquan Liu;Chenqiang Gao;Ping Wang;Huafeng Wang
Gait recognition, a promising biometric technique, faces significant challenges in unconstrained in-the-wild scenarios. While spatial modeling has progressed, existing state-of-the-art methods fundamentally struggle with temporal variations due to their reliance on strategies developed for constrained environments, limiting their effectiveness in diverse real-world conditions. To overcome this critical bottleneck, we propose GaitMspT, a novel Multi-scale and Multi-perspective Temporal Learning Network engineered for robust unconstrained gait recognition. GaitMspT introduces two key modules: a Multi-scale Temporal Extraction (MsTE) module that captures diverse temporal features across three distinct scales, effectively mitigating issues like gait contour occlusion; and a Multi-perspective Spatial-Temporal Extraction (MpSTE) module that extracts nuanced horizontal and vertical gait variations, emphasizing salient components. Their synergistic integration endows our network with significantly enhanced temporal modeling capabilities. Extensive experiments on four prominent in-the-wild gait datasets (Gait3D, GREW, CCPG, and SUSTech1K) unequivocally demonstrate that GaitMspT substantially outperforms existing state-of-the-art methods, achieving superior recognition accuracy while maintaining an excellent balance between performance and computational complexity.
{"title":"GaitMspT: A Novel Multi-Scale and Multi-Perspective Temporal Learning Network for Gait Recognition in the Wild","authors":"Hanlin Li;Wanquan Liu;Chenqiang Gao;Ping Wang;Huafeng Wang","doi":"10.1109/TBIOM.2025.3613586","DOIUrl":"https://doi.org/10.1109/TBIOM.2025.3613586","url":null,"abstract":"Gait recognition, a promising biometric technique, faces significant challenges in unconstrained in-the-wild scenarios. While spatial modeling has progressed, existing state-of-the-art methods fundamentally struggle with temporal variations due to their reliance on strategies developed for constrained environments, limiting their effectiveness in diverse real-world conditions. To overcome this critical bottleneck, we propose GaitMspT, a novel Multi-scale and Multi-perspective Temporal Learning Network engineered for robust unconstrained gait recognition. GaitMspT introduces two key modules: a Multi-scale Temporal Extraction (MsTE) module that captures diverse temporal features across three distinct scales, effectively mitigating issues like gait contour occlusion; and a Multi-perspective Spatial-Temporal Extraction (MpSTE) module that extracts nuanced horizontal and vertical gait variations, emphasizing salient components. Their synergistic integration endows our network with significantly enhanced temporal modeling capabilities. Extensive experiments on four prominent in-the-wild gait datasets (Gait3D, GREW, CCPG, and SUSTech1K) unequivocally demonstrate that GaitMspT substantially outperforms existing state-of-the-art methods, achieving superior recognition accuracy while maintaining an excellent balance between performance and computational complexity.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"8 1","pages":"71-83"},"PeriodicalIF":5.0,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146045310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}