We propose original semantic labels for detailed face parsing to improve the accuracy of face recognition by focusing on individual parts of the face. The part labels used in conventional face parsing are defined by biological features, so a single label covers a large region such as the skin. Our semantic labels split large-area parts according to the structure of the face and distinguish the left and right sides of every part, which accounts for head pose changes, occlusion, and other factors. Building on this ability to assign detailed part labels to face images, we propose a novel data augmentation method called Face Semantic Erasing (FSErasing) to improve the performance of face recognition. FSErasing randomly masks a part of the face image according to the detailed part labels, so erasing-type data augmentation can be applied in a way that respects the characteristics of the face. Through experiments using public face image datasets, we demonstrate that FSErasing is effective for improving the performance of face recognition and face attribute estimation. In face recognition, adding FSErasing when training ResNet-34 with Softmax on CelebA improves the average accuracy by 0.354 points and the average equal error rate (EER) by 0.312 points; with ArcFace, the average accuracy and EER improve by 0.752 and 0.802 points, respectively. For ResNet-50 trained with Softmax on CASIA-WebFace, the average accuracy improves by 0.442 points and the average EER by 0.452 points; with ArcFace, the average accuracy and EER improve by 0.228 points and 0.500 points, respectively. In face attribute estimation, adding FSErasing as a data augmentation method when training on CelebA improves the estimation accuracy by 0.54 points. We also apply our detailed face parsing model to visualize face recognition models and demonstrate that it offers higher explainability than general visualization methods.
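The erasing step described in the abstract can be illustrated with a minimal sketch. It assumes the face parsing model has already produced a per-pixel integer label map; the function name `fs_erase`, the probability `p`, the background label `0`, and the constant fill value are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np


def fs_erase(image, labels, rng, p=0.5, background=0, fill=0):
    """Mask one randomly chosen face part (illustrative sketch only).

    image:  H x W x C array holding the face image
    labels: H x W integer array, one part label per pixel (hypothetical ids)
    rng:    numpy.random.Generator used for all random choices
    p:      probability of applying the erasing to this sample (assumed)
    """
    if rng.random() >= p:
        return image.copy()  # leave this sample unchanged

    # Candidate parts are all labels present in the map except the background.
    parts = np.unique(labels)
    parts = parts[parts != background]
    if parts.size == 0:
        return image.copy()  # nothing to erase

    chosen = rng.choice(parts)
    out = image.copy()
    out[labels == chosen] = fill  # overwrite every pixel of the chosen part
    return out
```

Because the mask follows the part boundaries in `labels` rather than a random rectangle, the erased region always corresponds to one semantic part, which is the property the abstract attributes to FSErasing.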
{"title":"FSErasing: Improving Face Recognition with Data Augmentation Using Face Parsing","authors":"Hiroya Kawai, Koichi Ito, Hwann-Tzong Chen, Takafumi Aoki","doi":"10.1049/2024/6663315","DOIUrl":"10.1049/2024/6663315","url":null,"abstract":"<p>We propose original semantic labels for detailed face parsing to improve the accuracy of face recognition by focusing on parts in a face. The part labels used in conventional face parsing are defined based on biological features, and thus, one label is given to a large region, such as skin. Our semantic labels are defined by separating parts with large areas based on the structure of the face and considering the left and right sides for all parts to consider head pose changes, occlusion, and other factors. By utilizing the capability of assigning detailed part labels to face images, we propose a novel data augmentation method based on detailed face parsing called Face Semantic Erasing (FSErasing) to improve the performance of face recognition. FSErasing is to randomly mask a part of the face image based on the detailed part labels, and therefore, we can apply erasing-type data augmentation to the face image that considers the characteristics of the face. Through experiments using public face image datasets, we demonstrate that FSErasing is effective for improving the performance of face recognition and face attribute estimation. In face recognition, adding FSErasing in training ResNet-34 with Softmax using CelebA improves the average accuracy by 0.354 points and the average equal error rate (EER) by 0.312 points, and with ArcFace, the average accuracy and EER improve by 0.752 and 0.802 points, respectively. ResNet-50 with Softmax using CASIA-WebFace improves the average accuracy by 0.442 points and the average EER by 0.452 points, and with ArcFace, the average accuracy and EER improve by 0.228 points and 0.500 points, respectively. 
In face attribute estimation, adding FSErasing as a data augmentation method in training with CelebA improves the estimation accuracy by 0.54 points. We also apply our detailed face parsing model to visualize face recognition models and demonstrate its higher explainability than general visualization methods.</p>","PeriodicalId":48821,"journal":{"name":"IET Biometrics","volume":"2024 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/6663315","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141308918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Huimin She, Yongjian Hu, Beibei Liu, Chang-Tsun Li
Identity-based Deepfake detection methods have the potential to improve the generalization, robustness, and interpretability of the model. However, current identity-based methods either require a reference or can only detect face replacement, not face reenactment. In this paper, we propose a novel Deepfake video detection approach based on identity anomalies. We observe two types of identity anomalies: the inconsistency between clip-level static ID (facial appearance) and clip-level dynamic ID (facial behavior), and the temporal inconsistency of image-level static IDs. Since both types of anomalies can be detected through self-consistency and do not depend on the manipulation type, our method is reference-free and manipulation-independent. Specifically, our detection network consists of two branches: the static–dynamic ID discrepancy detection branch, which targets the inconsistency between dynamic and static ID, and the temporal static ID anomaly detection branch, which targets the temporal anomaly of static ID. We combine the outputs of the two branches by weighted averaging to obtain the final detection result. We also design two loss functions, the static–dynamic ID matching loss and the dynamic ID constraint loss, to enhance the representation and discriminability of dynamic ID. We conduct experiments on four benchmark datasets and compare our method with state-of-the-art methods. Results show that our method detects both face replacement and face reenactment and outperforms state-of-the-art methods on unknown datasets. It is also more robust against compression. Identity-based features provide a good explanation of the detection results.
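The fusion of the two branch outputs by weighted averaging can be sketched as follows. The abstract does not give the weight or the score range, so the function name `fuse_branch_scores`, the default weight of 0.5, and the assumption that each branch emits a fake-probability in [0, 1] are all illustrative.

```python
def fuse_branch_scores(score_static_dynamic, score_temporal, weight=0.5):
    """Weighted average of two branch scores (illustrative sketch only).

    score_static_dynamic: output of the static-dynamic ID discrepancy branch
    score_temporal:       output of the temporal static ID anomaly branch
    weight:               share given to the first branch (assumed value)
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * score_static_dynamic + (1.0 - weight) * score_temporal
```

With `weight=1.0` the result depends only on the static–dynamic branch and with `weight=0.0` only on the temporal branch, so the parameter expresses how much each type of identity anomaly contributes to the final decision.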
{"title":"Exploring Static–Dynamic ID Matching and Temporal Static ID Inconsistency for Generalizable Deepfake Detection","authors":"Huimin She, Yongjian Hu, Beibei Liu, Chang-Tsun Li","doi":"10.1049/2024/2280143","DOIUrl":"10.1049/2024/2280143","url":null,"abstract":"<p>Identity-based Deepfake detection methods have the potential to improve the generalization, robustness, and interpretability of the model. However, current identity-based methods either require a reference or can only be used to detect face replacement but not face reenactment. In this paper, we propose a novel Deepfake video detection approach based on identity anomalies. We observe two types of identity anomalies: the inconsistency between clip-level static ID (facial appearance) and clip-level dynamic ID (facial behavior) and the temporal inconsistency of image-level static IDs. Since these two types of anomalies can be detected through self-consistency and do not depend on the manipulation type, our method is a reference-free and manipulation-independent approach. Specifically, our detection network consists of two branches: the static–dynamic ID discrepancy detection branch for the inconsistency between dynamic and static ID and the temporal static ID anomaly detection branch for the temporal anomaly of static ID. We combine the outputs of the two branches by weighted averaging to obtain the final detection result. We also designed two loss functions: the static–dynamic ID matching loss and the dynamic ID constraint loss, to enhance the representation and discriminability of dynamic ID. We conduct experiments on four benchmark datasets and compare our method with the state-of-the-art methods. Results show that our method can detect not only face replacement but also face reenactment, and also has better detection performance over the state-of-the-art methods on unknown datasets. It also has superior robustness against compression. 
Identity-based features provide a good explanation of the detection results.</p>","PeriodicalId":48821,"journal":{"name":"IET Biometrics","volume":"2024 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/2280143","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141298409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hengnian Qi, Gang Zeng, Keke Jia, Chu Zhang, Xiaoping Wu, Mengxia Li, Qing Lang, Lingxuan Wang
The quality of people’s lives is closely related to their emotional state. Positive emotions can boost confidence and help overcome difficulties, while negative emotions can harm both physical and mental health. Research has shown that people’s handwriting is associated with their emotions. In this study, audio-visual media were used to induce emotions, and a dot-matrix digital pen was used to collect neutral text written by participants in three emotional states: calm, happy, and sad. To address the challenge of limited samples, a conditional tabular generative adversarial network (CTAB-GAN) was used to increase the number of task samples, which improved the recognition accuracy on task samples by 4.18%. TabNet (a neural network designed for tabular data) combined with SimAM (a simple, parameter-free attention module) was employed and compared with the original TabNet and traditional machine learning models; incorporating the SimAM attention mechanism led to a 1.35% improvement in classification accuracy. Experimental results revealed significant differences between negative (sad) and nonnegative (calm and happy) emotions, with a recognition accuracy of 80.67%. Overall, this study demonstrated the feasibility of handwriting-based emotion recognition with the assistance of CTAB-GAN and SimAM-TabNet. It provides guidance for further research on emotion recognition and other handwriting-based applications.
{"title":"Emotion Recognition Based on Handwriting Using Generative Adversarial Networks and Deep Learning","authors":"Hengnian Qi, Gang Zeng, Keke Jia, Chu Zhang, Xiaoping Wu, Mengxia Li, Qing Lang, Lingxuan Wang","doi":"10.1049/2024/5351588","DOIUrl":"10.1049/2024/5351588","url":null,"abstract":"<p>The quality of people’s lives is closely related to their emotional state. Positive emotions can boost confidence and help overcome difficulties, while negative emotions can harm both physical and mental health. Research has shown that people’s handwriting is associated with their emotions. In this study, audio-visual media were used to induce emotions, and a dot-matrix digital pen was used to collect neutral text data written by participants in three emotional states: calm, happy, and sad. To address the challenge of limited samples, a novel conditional table generative adversarial network called conditional tabular-generative adversarial network (CTAB-GAN) was used to increase the number of task samples, and the recognition accuracy of task samples improved by 4.18%. The TabNet (a neural network designed for tabular data) with SimAM (a simple, parameter-free attention module) was employed and compared with the original TabNet and traditional machine learning models; the incorporation of the SimAm attention mechanism led to a 1.35% improvement in classification accuracy. Experimental results revealed significant differences between negative (sad) and nonnegative (calm and happy) emotions, with a recognition accuracy of 80.67%. Overall, this study demonstrated the feasibility of emotion recognition based on handwriting with the assistance of CTAB-GAN and SimAm-TabNet. 
It provides guidance for further research on emotion recognition or other handwriting-based applications.</p>","PeriodicalId":48821,"journal":{"name":"IET Biometrics","volume":"2024 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/5351588","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141246105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tuğçe Arıcan, Raymond Veldhuis, Luuk Spreeuwers, Loïc Bergeron, Christoph Busch, Ehsaneddin Jalilian, Christof Kauba, Simon Kirchgasser, Sébastien Marcel, Bernhard Prommegger, Kiran Raja, Raghavendra Ramachandra, Andreas Uhl
Finger vein recognition is gaining popularity in the field of biometrics, yet the inter-operability of finger vein patterns has received limited attention. This study aims to fill this gap by introducing a cross-device finger vein dataset and evaluating the performance of finger vein recognition across devices using a classical method, a convolutional neural network, and our proposed patch-based convolutional auto-encoder (CAE). The findings emphasise the importance of standardising finger vein recognition, as has been done for fingerprints and irises, which is crucial for achieving inter-operability. Despite the inherent challenges of cross-device recognition, the proposed CAE architecture demonstrates promising results in finger vein recognition, particularly in cross-device comparisons.
{"title":"A Comparative Study of Cross-Device Finger Vein Recognition Using Classical and Deep Learning Approaches","authors":"Tuğçe Arıcan, Raymond Veldhuis, Luuk Spreeuwers, Loïc Bergeron, Christoph Busch, Ehsaneddin Jalilian, Christof Kauba, Simon Kirchgasser, Sébastien Marcel, Bernhard Prommegger, Kiran Raja, Raghavendra Ramachandra, Andreas Uhl","doi":"10.1049/2024/3236602","DOIUrl":"10.1049/2024/3236602","url":null,"abstract":"<p>Finger vein recognition is gaining popularity in the field of biometrics, yet the inter-operability of finger vein patterns has received limited attention. This study aims to fill this gap by introducing a cross-device finger vein dataset and evaluating the performance of finger vein recognition across devices using a classical method, a convolutional neural network, and our proposed patch-based convolutional auto-encoder (CAE). The findings emphasise the importance of standardisation of finger vein recognition, similar to that of fingerprints or irises, crucial for achieving inter-operability. Despite the inherent challenges of cross-device recognition, the proposed CAE architecture in this study demonstrates promising results in finger vein recognition, particularly in the context of cross-device comparisons.</p>","PeriodicalId":48821,"journal":{"name":"IET Biometrics","volume":"2024 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/3236602","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140381478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Text-independent speaker verification (TI-SV) is a crucial task in speaker recognition, as it involves verifying an individual’s claimed identity from speech of arbitrary content without any human intervention. The goal of TI-SV is to design a discriminative network that learns deep speaker embeddings capturing speaker idiosyncrasy. In this paper, we propose a deep speaker embedding learning approach based on a hybrid deep neural network (DNN) for TI-SV in FM broadcasting. In addition to acoustic features, phoneme features are introduced as prior knowledge so that both contribute to learning the deep speaker embedding. The hybrid DNN consists of a convolutional neural network architecture for generating acoustic features and a multilayer perceptron architecture for sequentially extracting phoneme features, which represent significant pronunciation attributes. The extracted acoustic and phoneme features are concatenated to form deep embedding descriptors for speaker identity. The hybrid DNN exploits both the complementarity between acoustic and phoneme features and the temporality of phoneme features in a sequence. Our experiments show that the hybrid DNN outperforms existing methods for TI-SV in FM broadcasting.
{"title":"Learning Deep Embedding with Acoustic and Phoneme Features for Speaker Recognition in FM Broadcasting","authors":"Xiao Li, Xiao Chen, Rui Fu, Xiao Hu, Mintong Chen, Kun Niu","doi":"10.1049/2024/6694481","DOIUrl":"10.1049/2024/6694481","url":null,"abstract":"<p>Text-independent speaker verification (TI-SV) is a crucial task in speaker recognition, as it involves verifying an individual’s claimed identity from speech of arbitrary content without any human intervention. The target for TI-SV is to design a discriminative network to learn deep speaker embedding for speaker idiosyncrasy. In this paper, we propose a deep speaker embedding learning approach of a hybrid deep neural network (DNN) for TI-SV in FM broadcasting. Not only acoustic features are utilized, but also phoneme features are introduced as prior knowledge to collectively learn deep speaker embedding. The hybrid DNN consists of a convolutional neural network architecture for generating acoustic features and a multilayer perceptron architecture for extracting phoneme features sequentially, which represent significant pronunciation attributes. The extracted acoustic and phoneme features are concatenated to form deep embedding descriptors for speaker identity. The hybrid DNN demonstrates not only the complementarity between acoustic and phoneme features but also the temporality of phoneme features in a sequence. 
Our experiments show that the hybrid DNN outperforms existing methods and delivers a remarkable performance in FM broadcasting TI-SV.</p>","PeriodicalId":48821,"journal":{"name":"IET Biometrics","volume":"2024 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/6694481","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140220402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jascha Kolberg, Yannik Schäfer, Christian Rathgeb, Christoph Busch
With the rise of deep neural networks, the performance of biometric systems has increased tremendously. Biometric systems for face recognition are now used in everyday life, e.g., in border control, crime prevention, or personal device access control. Although the accuracy of face recognition systems is generally high, they are not without flaws. Many biometric systems have been found to exhibit demographic bias, with the result that different demographic groups are not recognized with the same accuracy. This is especially true for face recognition, where demographic factors such as gender and skin color play a role. While many previous works have reported demographic bias, this work aims to reduce it for biometric face recognition applications. To this end, 12 face recognition systems are benchmarked with regard to biometric recognition performance as well as demographic differentials, i.e., fairness. Subsequently, multiple fusion techniques are applied with the goal of improving fairness compared with single systems. The experimental results show that fairness can be improved for single demographics, e.g., skin color or gender, while improving fairness for demographic subgroups turns out to be more challenging.
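The abstract does not name the fusion techniques or the fairness measure that were used. As a hedged illustration of the general idea, the sketch below shows one common score-level fusion (mean of min-max normalized scores) and a per-group false non-match rate that could serve as a simple demographic differential. All function names, the normalization, and the group labels are assumptions made for this example.

```python
def min_max(scores):
    """Rescale one system's scores to [0, 1]; constant inputs map to 0.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_mean(system_scores):
    """Average the normalized scores of several systems.

    system_scores: one score list per system, all over the same comparisons.
    """
    normalized = [min_max(scores) for scores in system_scores]
    return [sum(column) / len(column) for column in zip(*normalized)]


def fnmr_by_group(mated_scores, groups, threshold):
    """False non-match rate per demographic group.

    A mated comparison counts as a false non-match when its score
    falls below the decision threshold.
    """
    totals, misses = {}, {}
    for score, group in zip(mated_scores, groups):
        totals[group] = totals.get(group, 0) + 1
        misses[group] = misses.get(group, 0) + (1 if score < threshold else 0)
    return {group: misses[group] / totals[group] for group in totals}
```

Comparing the spread of the per-group rates before and after fusion gives a rough indication of whether the fused system treats the groups more evenly than any single system.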
{"title":"On the Potential of Algorithm Fusion for Demographic Bias Mitigation in Face Recognition","authors":"Jascha Kolberg, Yannik Schäfer, Christian Rathgeb, Christoph Busch","doi":"10.1049/2024/1808587","DOIUrl":"10.1049/2024/1808587","url":null,"abstract":"<p>With the rise of deep neural networks, the performance of biometric systems has increased tremendously. Biometric systems for face recognition are now used in everyday life, e.g., border control, crime prevention, or personal device access control. Although the accuracy of face recognition systems is generally high, they are not without flaws. Many biometric systems have been found to exhibit demographic bias, resulting in different demographic groups being not recognized with the same accuracy. This is especially true for facial recognition due to demographic factors, e.g., gender and skin color. While many previous works already reported demographic bias, this work aims to reduce demographic bias for biometric face recognition applications. In this regard, 12 face recognition systems are benchmarked regarding biometric recognition performance as well as demographic differentials, i.e., fairness. Subsequently, multiple fusion techniques are applied with the goal to improve the fairness in contrast to single systems. 
The experimental results show that it is possible to improve the fairness regarding single demographics, e.g., skin color or gender, while improving fairness for demographic subgroups turns out to be more challenging.</p>","PeriodicalId":48821,"journal":{"name":"IET Biometrics","volume":"2024 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/1808587","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140436576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yi Zhao, Xin Jin, Song Gao, Liwen Wu, Shaowen Yao, Qian Jiang
The widespread dissemination of high-fidelity fake faces created by face forgery techniques has caused serious trust concerns and ethical issues in modern society. Consequently, face forgery detection has emerged as a prominent research topic for preventing technology abuse. Although most existing face forgery detectors succeed when evaluated on high-quality faces in intra-dataset scenarios, they often overfit manipulation-specific artifacts and lack robustness to postprocessing operations. In this work, we design a dual-branch collaboration framework that leverages the strengths of the transformer and the CNN to mine multimodal forgery artifacts from both a global and a local perspective. Specifically, a novel adaptive noise trace enhancement module (ANTEM) is proposed to remove high-level face content while amplifying more generalized forgery artifacts in the noise domain. The transformer-based branch then tracks long-range noise features. Meanwhile, considering that subtle forgery artifacts can be described in the frequency domain even under compression, a multilevel frequency-aware module (MFAM) is developed and applied to the CNN-based branch to extract complementary frequency-aware clues. In addition, we incorporate a collaboration strategy involving cross-entropy loss and single center loss, which optimizes the fused features of the two branches to learn more generalized representations. Extensive experiments on various benchmark datasets substantiate the superior generalization and robustness of our framework compared with competing approaches.
{"title":"Face Forgery Detection with Long-Range Noise Features and Multilevel Frequency-Aware Clues","authors":"Yi Zhao, Xin Jin, Song Gao, Liwen Wu, Shaowen Yao, Qian Jiang","doi":"10.1049/2024/6523854","DOIUrl":"10.1049/2024/6523854","url":null,"abstract":"<p>The widespread dissemination of high-fidelity fake faces created by face forgery techniques has caused serious trust concerns and ethical issues in modern society. Consequently, face forgery detection has emerged as a prominent topic of research to prevent technology abuse. Although, most existing face forgery detectors demonstrate success when evaluating high-quality faces under intra-dataset scenarios, they often overfit manipulation-specific artifacts and lack robustness to postprocessing operations. In this work, we design an innovative dual-branch collaboration framework that leverages the strengths of the transformer and CNN to thoroughly dig into the multimodal forgery artifacts from both a global and local perspective. Specifically, a novel adaptive noise trace enhancement module (ANTEM) is proposed to remove high-level face content while amplifying more generalized forgery artifacts in the noise domain. Then, the transformer-based branch can track long-range noise features. Meanwhile, considering that subtle forgery artifacts could be described in the frequency domain even in a compression scenario, a multilevel frequency-aware module (MFAM) is developed and further applied to the CNN-based branch to extract complementary frequency-aware clues. Besides, we incorporate a collaboration strategy involving cross-entropy loss and single center loss to enhance the learning of more generalized representations by optimizing the fusion features of the dual branch. 
Extensive experiments on various benchmark datasets substantiate the superior generalization and robustness of our framework when compared to the competing approaches.</p>","PeriodicalId":48821,"journal":{"name":"IET Biometrics","volume":"2024 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/6523854","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139862462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pesigrihastamadya Normakristagaluh, Geert J. Laanstra, Luuk J. Spreeuwers, Raymond N. J. Veldhuis
This paper studies the impact of illumination direction and bundle width on finger vascular pattern imaging and recognition performance. A qualitative theoretical model is presented to explain the projection of finger blood vessels on the skin. A series of experiments was conducted using a scanner of our own design with illumination from the top or from a single side (left or right), and with narrow or wide beams. A new dataset was collected for the experiments, containing 4,428 NIR images of finger vein patterns captured under well-controlled conditions to minimize position and rotation angle differences between sessions. Top illumination performs well because it is more homogeneous, which makes a larger number of veins visible. Narrower bundles of light do not affect which veins are visible, but they reduce overexposure at the finger boundaries and increase the quality of the vascular pattern images. The narrow beam achieves the best performance with 0% of [email protected]%, and the wide beam consistently results in a higher false nonmatch rate. Comparing left-side with right-side illumination gives the highest error rates because only the veins in the middle of the finger are visible in both images. Images taken under different illumination directions may nevertheless be interoperable, since they show the same vascular pattern, which is in principle the shadow of the vessels projected onto the finger surface. Score and image fusion of right- and left-side illumination results in recognition performance similar to that obtained with top illumination, indicating that the vein patterns are independent of illumination direction. All results of these experiments support the proposed model.
{"title":"The Impact of Illumination on Finger Vascular Pattern Recognition","authors":"Pesigrihastamadya Normakristagaluh, Geert J. Laanstra, Luuk J. Spreeuwers, Raymond N. J. Veldhuis","doi":"10.1049/2024/4413655","DOIUrl":"10.1049/2024/4413655","url":null,"abstract":"<p>This paper studies the impact of illumination direction and bundle width on finger vascular pattern imaging and recognition performance. A qualitative theoretical model is presented to explain the projection of finger blood vessels on the skin. A series of experiments were conducted using a scanner of our design with illumination from the top, a single-direction side (left or right), and narrow or wide beams. A new dataset was collected for the experiments, containing 4,428 NIR images of finger vein patterns captured under well-controlled conditions to minimize position and rotation angle differences between different sessions. Top illumination performs well because of more homogenous, which enhances a larger number of visible veins. Narrower bundles of light do not affect which veins are visible, but they reduce the overexposure at finger boundaries and increase the quality of vascular pattern images. The narrow beam achieves the best performance with 0% of [email protected]%, and the wide beam consistently results in a higher false nonmatch rate. The comparison of left- and right-side illumination has the highest error rates because only the veins in the middle of the finger are visible in both images. Different directional illumination may be interoperable since they produce the same vascular pattern and principally are the projected shadows on the finger surface. Score and image fusion for right- and left-side result in recognition performance similar to that obtained with top illumination, indicating the vein patterns are independent of illumination direction. 
All results of these experiments support the proposed model.</p>","PeriodicalId":48821,"journal":{"name":"IET Biometrics","volume":"2024 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/2024/4413655","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139867791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}