Assessment of automatic speaker verification on lossy transcoded speech
Jozef Polacky, R. Jarina, M. Chmulik
Pub Date: 2016-03-03 | DOI: 10.1109/IWBF.2016.7449679
In this paper, we investigate the effect of lossy speech compression on the text-independent speaker verification task. We evaluated voice biometrics performance over several state-of-the-art speech codecs, including the recently released Enhanced Voice Services (EVS) codec. The tests were performed in both codec-matched and codec-mismatched scenarios. The results show that EVS outperforms the other speech codecs in our tests and that it can be used to generate speaker models that are quite robust to varying compression levels. We also show that if a higher-quality speech codec (EVS, G.711) is included in the training data (mismatched and partially mismatched scenarios), automatic speaker verification (ASV) gives better results than in the matched scenario.
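As a point of reference for how such comparisons are scored, the sketch below computes the equal error rate (EER), the metric conventionally reported for ASV experiments of this kind. It is a generic illustration with made-up placeholder scores, not the paper's evaluation code.

```python
# Hypothetical illustration: computing the equal error rate (EER), the
# standard metric for comparing ASV performance across codec conditions.
# The score arrays below are made-up placeholders, not data from the paper.
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Return the EER and the threshold at which FAR and FRR cross."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    far = np.array([(impostor_scores >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])    # false rejects
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]

# e.g. trials where enrollment and test audio pass through different codecs
genuine = np.random.normal(2.0, 1.0, 1000)    # same-speaker trial scores
impostor = np.random.normal(-1.0, 1.0, 1000)  # different-speaker trial scores
eer, thr = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.3%} at threshold {thr:.2f}")
```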
{"title":"Assessment of automatic speaker verification on lossy transcoded speech","authors":"Jozef Polacky, R. Jarina, M. Chmulik","doi":"10.1109/IWBF.2016.7449679","DOIUrl":"https://doi.org/10.1109/IWBF.2016.7449679","url":null,"abstract":"In this paper, we investigate the effect of lossy speech compression on text-independent speaker verification task. We have evaluated the voice biometrics performance over several state-of-the art speech codecs including recently released Enhanced Voice Services (EVS) codec. The tests were performed in both codec-matched and codec-mismatched scenarios. The test results show that EVS outperforms other speech codecs used in our test and it can be used to generate speaker models that are quite robust to varying compression levels. It was also shown that if a speech codec of higher quality (EVS, G711) is included in training data (mismatched and partially mismatched scenarios), the automatic speaker verification (ASV) gives better results than in the case of matched scenario.","PeriodicalId":282164,"journal":{"name":"2016 4th International Conference on Biometrics and Forensics (IWBF)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130312529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Variable-length template protection based on homomorphic encryption with application to signature biometrics
M. Gomez-Barrero, Julian Fierrez, Javier Galbally
Pub Date: 2016-03-03 | DOI: 10.1109/IWBF.2016.7449672
Any privacy leakage of biometric data poses severe security risks given its sensitive nature. Biometric templates should thus be protected by storing irreversibly transformed or encrypted biometric signals, while preserving the unprotected system's performance. Building on recent work by Zhu et al. on privacy-preserving similarity evaluation of time-series data, we present a new biometric template protection scheme based on homomorphic probabilistic encryption, in which only encrypted data is stored or exchanged. We then apply the proposed scheme to signature verification and show, using a publicly available database and a free implementation of the Paillier cryptosystem, that all requirements described in the ISO/IEC 24745 standard are met with no performance degradation. Moreover, the proposed approach is robust to hill-climbing attacks.
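The additive homomorphism underpinning such schemes can be illustrated with python-paillier ("phe"), one freely available implementation of the Paillier cryptosystem. This is a minimal sketch of the primitive, not the authors' protocol: the client uploads encrypted template values (and their squares), and the server accumulates an encrypted squared Euclidean distance against a plaintext probe without ever decrypting the template.

```python
# Minimal sketch of Paillier's additive homomorphism, using the free
# python-paillier library ('phe'). Illustrates the primitive only; the
# paper's actual protocol and parameters may differ.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

x = [3, 1, 4]  # enrolled template (client side, kept secret)
y = [2, 1, 5]  # probe features (server side, plaintext)

enc_x = [public_key.encrypt(v) for v in x]       # E(x_i)
enc_x2 = [public_key.encrypt(v * v) for v in x]  # E(x_i^2)

# Server: E(sum (x_i - y_i)^2) = sum E(x_i^2) + (-2*y_i)*E(x_i) + y_i^2,
# using only ciphertext addition and multiplication by plaintext scalars.
enc_dist = sum(ex2 + (-2 * yi) * ex + yi * yi
               for ex, ex2, yi in zip(enc_x, enc_x2, y))

print(private_key.decrypt(enc_dist))  # -> 2, i.e. (3-2)^2 + 0 + (4-5)^2
```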
{"title":"Variable-length template protection based on homomorphic encryption with application to signature biometrics","authors":"M. Gomez-Barrero, Julian Fierrez, Javier Galbally","doi":"10.1109/IWBF.2016.7449672","DOIUrl":"https://doi.org/10.1109/IWBF.2016.7449672","url":null,"abstract":"Any privacy leakage of biometric data poses severe security risks given their sensitive nature. Biometric templates should thus be protected, storing irreversibly transformed or encrypted biometric signals, while preserving the unprotected system's performance. Based on the recent developments by Zhu et al. on privacy preserving similarity evaluation of time series data, we present a new biometric template protection scheme based on homomorphic probabilistic encryption, where only encrypted data is stored or exchanged. We then apply the proposed scheme to signature verification and show that all requirements described in the ISO/IEC 24745 standard are met with no performance degradation, using a publicly available database and a free implementation of the Paillier cryptosystem. Moreover, the proposed approach is robust to hill-climbing attacks.","PeriodicalId":282164,"journal":{"name":"2016 4th International Conference on Biometrics and Forensics (IWBF)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123372832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Formant manipulations in voice disguise by mimicry
Rita Singh, D. Gençaga, B. Raj
Pub Date: 2016-03-03 | DOI: 10.1109/IWBF.2016.7449675
The human voice can be disguised in many ways. The purpose of disguise may be to impersonate another person, to conceal the identity of the original speaker, or both. The goal of biometric analysis on disguised voices can likewise be twofold: to determine whether the originator of a disguised voice is a given speaker, or to learn how a speaker's voice can be manipulated so that the extent and type of disguise the speaker can perform can be estimated a priori. Analysis toward the former goal must rely on knowing which characteristics of a person's voice are least affected, or unaffected, by attempted disguise. Analysis toward the latter goal must use knowledge of which sounds are typically most amenable to voluntary variation by the speaker, so that the extent to which a given speaker can successfully disguise their voice can be estimated. This paper attempts to establish a simple methodology for analyzing voice with both goals in mind. We study voice impersonations performed by an expert mimic, focusing specifically on formants and formant-related measurements, to determine the extent and type of formant manipulations performed by the expert at the level of individual phonemes. Expert mimicry is an extreme form of attempted disguise. Our study is presented with the expectation that non-expert attempts at voice disguise by mimicry will fall within the gold standard of manipulation patterns set by an expert mimic, and that it is therefore useful to establish this gold standard.
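Formant measurements of the kind analyzed here are conventionally obtained by LPC root-finding. The sketch below shows that standard route for a single frame; it is a generic illustration under assumed parameters (window length, LPC order, the placeholder file speech.wav), not the authors' exact measurement pipeline.

```python
# Per-frame formant estimation via LPC root-finding: a generic sketch,
# not the authors' pipeline. Window length, LPC order, and the input
# file are illustrative assumptions. Requires numpy and librosa.
import numpy as np
import librosa

y, sr = librosa.load("speech.wav", sr=16000)      # placeholder input file
win = int(0.025 * sr)                             # one 25 ms analysis frame
frame = y[2000:2000 + win] * np.hamming(win)

a = librosa.lpc(frame, order=12)                  # LPC polynomial coefficients
roots = np.roots(a)
roots = roots[np.imag(roots) > 0]                 # one root per conjugate pair

freqs = np.angle(roots) * sr / (2 * np.pi)                 # pole angles -> Hz
bws = -0.5 * (sr / (2 * np.pi)) * np.log(np.abs(roots))    # pole radii -> bandwidths

# Heuristic: formants are low-bandwidth poles above ~90 Hz, sorted by frequency
formants = sorted(f for f, b in zip(freqs, bws) if f > 90 and b < 400)
print([round(f) for f in formants[:4]])           # F1..F4 estimates for this frame
```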
{"title":"Formant manipulations in voice disguise by mimicry","authors":"Rita Singh, D. Gençaga, B. Raj","doi":"10.1109/IWBF.2016.7449675","DOIUrl":"https://doi.org/10.1109/IWBF.2016.7449675","url":null,"abstract":"The human voice can be disguised in many ways. The purpose of disguise could either be to impersonate another person, or to conceal the identity of the original speaker, or both. On the other hand, the goal of any biometric analysis on disguised voices could also be twofold: either to find out if the originator of the disguised voice is a given speaker, or to know how a speaker's voice can be manipulated so that the extent and type of disguise that the speaker can perform can be guessed a-priori. Any analysis toward the former goal must rely on the knowledge of what characteristics of a person's voice are least affected or unaffected by attempted disguise. Analysis towards the latter goal must use the knowledge of what sounds are typically most amenable to voluntary variation by the speaker, so that the extent to which given speakers can successfully disguise their voice can be estimated. Our paper attempts to establish a simple methodology for analysis of voice for both goals. We study the voice impersonations performed by an expert mimic, focusing specifically on formants and formant-related measurements, to find out the extent and type of formant manipulations that are performed by the expert at the level of individual phonemes. Expert mimicry is an extreme form of attempted disguise. Our study is presented with the expectation that non-expert attempts at voice disguise by mimicry will fall within the gold standard of manipulation patterns set by an expert mimic, and that it is therefore useful to establish this gold standard.","PeriodicalId":282164,"journal":{"name":"2016 4th International Conference on Biometrics and Forensics (IWBF)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130751586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Signature recognition: establishing human baseline performance via crowdsourcing
D. Morocho, A. Morales, Julian Fierrez, Rubén Tolosana
Pub Date: 2016-03-03 | DOI: 10.1109/IWBF.2016.7449680
This work explores crowdsourcing for establishing human baseline performance on signature recognition. We present five experiments across three scenarios in which laymen (people without Forensic Document Examiner experience) have to decide on the authenticity of a given signature. The scenarios include single comparisons between one genuine sample and one unlabeled sample, based on image, video, or time sequences, as well as comparisons with multiple training and test sets. The human performance obtained varies from 7% to 80% depending on the scenario. The results suggest the large potential of these collaborative platforms and encourage further research in this area.
{"title":"Signature recognition: establishing human baseline performance via crowdsourcing","authors":"D. Morocho, A. Morales, Julian Fierrez, Rubén Tolosana","doi":"10.1109/IWBF.2016.7449680","DOIUrl":"https://doi.org/10.1109/IWBF.2016.7449680","url":null,"abstract":"This work explores crowdsourcing for the establishment of human baseline performance on signature recognition. We present five experiments according to three different scenarios in which laymen, people without Forensic Document Examiner experience, have to decide about the authenticity of a given signature. The scenarios include single comparisons between one genuine sample and one unlabeled sample based on image, video or time sequences and comparisons with multiple training and test sets. The human performance obtained varies from 7% to 80% depending of the scenario and the results suggest the large potential of these collaborative platforms and encourage to further research on this area.","PeriodicalId":282164,"journal":{"name":"2016 4th International Conference on Biometrics and Forensics (IWBF)","volume":"68 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120896248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the analysis of factors influencing the performance of facial age progression
A. Lanitis, N. Tsapatsoulis
Pub Date: 2016-03-03 | DOI: 10.1109/IWBF.2016.7449697
Facial age progression is the process of synthesizing a face image at an older age based on images showing the person at a younger age. The ability to generate accurate age-progressed face images is important for a number of forensic investigation tasks. In this paper, we analyze the performance of a number of publicly available age progression applications with respect to different parameters encountered in age progression, including the imaging conditions of the input images, the presence of occluding structures, the ages of the input and target faces, and the age progression range. By analyzing and quantifying age progression accuracy under these different conditions, we extract a number of conclusions in the form of a set of guidelines on the factors that forensic artists and age progression researchers should focus their attention on in order to produce improved age progression methodologies.
{"title":"On the analysis of factors influencing the performance of facial age progression","authors":"A. Lanitis, N. Tsapatsoulis","doi":"10.1109/IWBF.2016.7449697","DOIUrl":"https://doi.org/10.1109/IWBF.2016.7449697","url":null,"abstract":"Facial age progression is the process of synthesizing a face image at an older age based on images showing a person at a younger age. The ability to generate accurate age progressed face images is important for a number of forensic investigation tasks. In this paper we analyze the performance of a number of publicly available age progression applications, with respect to different parameters encountered in age progression including imaging conditions of input images, presence of occluding structures, age of input/target faces, and age progression range. Through the analysis and quantification of age progression accuracy in the presence of different conditions, we extract a number of conclusions that take the form of a set of guidelines related to factors that forensic artists and age progression researchers should focus their attention in order to produce improved age progression methodologies.","PeriodicalId":282164,"journal":{"name":"2016 4th International Conference on Biometrics and Forensics (IWBF)","volume":"251 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116476535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel method for sclera recognition with images captured on-the-move and at-a-distance
Sinan H. Alkassar, W. L. Woo, S. Dlay, J. Chambers
Pub Date: 2016-03-03 | DOI: 10.1109/IWBF.2016.7449677
Sclera blood vessels have recently been investigated as an efficient biometric trait. Capturing this part of the eye with a normal camera, using visible-wavelength rather than near-infrared images, has attracted research interest. However, processing noisy sclera images captured at a distance and on the move has not been extensively investigated. In this paper, we therefore propose a new method for minimizing the effect of distance on sclera recognition. The method combines sclera template rotation alignment with a distance scaling method to minimize error rates when noisy eye images are captured at a distance and on the move. Experimental results on the on-the-move, at-a-distance UBIRIS.v2 database show a significant improvement in terms of accuracy and error rates.
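As a rough illustration of the rotation-alignment idea, the sketch below brute-forces a small range of candidate angles and keeps the rotation that minimizes the distance between a probe vessel mask and the enrolled template. The paper's actual alignment and distance-scaling procedures may differ; everything here (angle range, distance measure, toy masks) is an assumption.

```python
# Hedged sketch of rotation alignment by exhaustive search over candidate
# angles. All parameters and the distance measure are assumptions for
# illustration, not the paper's method. Requires numpy and scipy.
import numpy as np
from scipy.ndimage import rotate

def align_rotation(probe_mask, template_mask, angle_range=10, step=0.5):
    """Return (best_angle, best_score) for a pair of binary vessel masks."""
    best_angle, best_score = 0.0, np.inf
    for angle in np.arange(-angle_range, angle_range + step, step):
        rotated = rotate(probe_mask.astype(float), angle,
                         reshape=False, order=1) > 0.5
        score = np.mean(rotated != template_mask)  # normalized Hamming distance
        if score < best_score:
            best_angle, best_score = angle, score
    return best_angle, best_score

template = np.zeros((64, 64), bool)
template[20:44, 30:34] = True                              # toy vessel pattern
probe = rotate(template.astype(float), 4.0, reshape=False, order=1) > 0.5
print(align_rotation(probe, template))                     # recovers roughly -4 degrees
```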
{"title":"A novel method for sclera recognition with images captured on-the-move and at-a-distance","authors":"Sinan H. Alkassar, W. L. Woo, S. Dlay, J. Chambers","doi":"10.1109/IWBF.2016.7449677","DOIUrl":"https://doi.org/10.1109/IWBF.2016.7449677","url":null,"abstract":"Sclera blood vessels have been investigated recently as an efficient biometric trait. Capturing this part of the eye with a normal camera using visible-wavelength images rather than near-infrared images has provoked research interest. However, processing noisy sclera images captured at-a-distance and on-the-move has not been extensively investigated. Therefore in this paper, we propose a new method for minimizing the effect of distance on sclera recognition. This method involves sclera template rotation alignment and a distance scaling method to minimize the error rates when noisy eye images are captured at-a-distance and on-the-move. The experimental results using the on-the-move and at-a-distance UBIRIS.v2 database show a significant improvement in term of accuracy and error rates.","PeriodicalId":282164,"journal":{"name":"2016 4th International Conference on Biometrics and Forensics (IWBF)","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121943140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Walking direction identification using perceptual hashing
T. Verlekar, P. Correia
Pub Date: 2016-03-03 | DOI: 10.1109/IWBF.2016.7449687
Gait has become a popular trait for biometric recognition, especially in surveillance environments, due to the advantage that it can be captured without active user participation. However, the gait description obtained in such scenarios depends on the user's observed walking direction. Hence, if the walking direction is unknown, comparison against a previously prepared database can be rendered impossible. This paper discusses the problem of identifying the walking direction in an unconstrained environment and proposes a novel approach to it: the walking direction is identified by computing a perceptual hash (PHash) over the user's leg region and comparing it against the PHash values obtained for training sequences. The proposed method is computationally inexpensive and performs better than state-of-the-art methods. It is also robust against appearance changes, such as those caused by the user wearing a coat or carrying a bag.
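A minimal DCT-based perceptual hash of the sort computed here might look as follows; the authors' exact hash size and preprocessing are not specified, so all parameters below are assumptions.

```python
# A minimal DCT-based perceptual hash (pHash) over a grayscale region,
# plus the Hamming comparison used to pick the nearest training sequence.
# Hash size and resize resolution are assumed, not the paper's values.
import numpy as np
from scipy.fftpack import dct

def phash(image, hash_size=8, resize_to=32):
    """64-bit perceptual hash of a 2-D grayscale array (e.g. a leg region)."""
    # Crude nearest-neighbour resize to resize_to x resize_to
    rows = np.linspace(0, image.shape[0] - 1, resize_to).astype(int)
    cols = np.linspace(0, image.shape[1] - 1, resize_to).astype(int)
    small = image[np.ix_(rows, cols)].astype(float)
    # 2-D DCT; keep only the low-frequency top-left block
    coeffs = dct(dct(small, axis=0, norm="ortho"), axis=1, norm="ortho")
    low = coeffs[:hash_size, :hash_size]
    return (low > np.median(low)).flatten()  # binarize against the median

def hamming(h1, h2):
    return np.count_nonzero(h1 != h2)

# Walking direction = that of the training sequence whose leg-region hash
# lies nearest in Hamming distance to the probe's hash.
```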
{"title":"Walking direction identification using perceptual hashing","authors":"T. Verlekar, P. Correia","doi":"10.1109/IWBF.2016.7449687","DOIUrl":"https://doi.org/10.1109/IWBF.2016.7449687","url":null,"abstract":"Gait has become a popular trait for biometric recognition especially in surveillance environments due to its advantage of being captured without active user participation. However the gait description obtained in such scenarios depends on the observed walking direction of the user. Hence, if the user's walking direction is unknown, comparison against a previously prepared database can be rendered impossible. This paper discusses the problem of identifying the walking direction in an unconstrained environment and proposes a novel approach to identify the walking direction. The walking direction is identified by computing a perceptual hash (PHash) over the leg region of the user and comparing it against the PHash values obtained for training sequences. The proposed method is computationally inexpensive and performs better than the state-of-the-art methods. It is also robust against appearance changes that may be caused for instance by the user wearing a coat or carrying a bag.","PeriodicalId":282164,"journal":{"name":"2016 4th International Conference on Biometrics and Forensics (IWBF)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116836172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Short-term analysis for estimating physical parameters of speakers
Rita Singh, B. Raj, J. Baker
Pub Date: 2016-03-03 | DOI: 10.1109/IWBF.2016.7449696
Conventional approaches to estimating speakers' physiometric parameters, such as height, age, and weight, from their voice analyze the speech signal at relatively coarse time resolutions, typically with analysis windows of 25 ms or longer. At these resolutions the analysis effectively captures the structure of the supra-glottal vocal tract. In this paper we hypothesize that by analyzing the signal at a finer temporal resolution, shorter than a pitch period, it may be possible to analyze segments of the speech signal obtained entirely while the glottis is open, and thereby capture some of the sub-glottal structure that may be represented in the voice. To explore this hypothesis, we propose an analysis approach that combines signal analysis techniques suited to fine temporal resolution with well-known regression models. We test it on predicting speakers' heights and ages from a standard speech database. Our findings show that the higher-resolution analysis provides benefits over conventional analysis for estimating speaker height, although it is less useful for predicting age.
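The contrast between conventional and sub-pitch-period analysis can be made concrete with a simple framing routine. The window lengths below are illustrative assumptions (the paper's exact resolution is not reproduced): a 25 ms window spans several pitch periods, while a 2 ms window is short enough that some frames fall entirely within the open-glottis phase.

```python
# Illustrative framing at two temporal resolutions. The 2 ms window length
# is an assumption chosen to be shorter than a typical ~8-10 ms pitch
# period; the paper's actual window size is not reproduced here.
import numpy as np

def frame_signal(y, sr, win_ms, hop_ms):
    """Slice a 1-D signal into overlapping Hamming-windowed frames."""
    win = int(sr * win_ms / 1e3)
    hop = int(sr * hop_ms / 1e3)
    n = 1 + (len(y) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n)[:, None]
    return y[idx] * np.hamming(win)  # one window per row

sr = 16000
y = np.random.randn(sr)  # stand-in for 1 s of speech

coarse = frame_signal(y, sr, 25.0, 10.0)  # conventional: spans 2-3 pitch periods
fine = frame_signal(y, sr, 2.0, 0.5)      # sub-pitch-period: some frames land
                                          # wholly inside the open-glottis phase
print(coarse.shape, fine.shape)
```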
{"title":"Short-term analysis for estimating physical parameters of speakers","authors":"Rita Singh, B. Raj, J. Baker","doi":"10.1109/IWBF.2016.7449696","DOIUrl":"https://doi.org/10.1109/IWBF.2016.7449696","url":null,"abstract":"Conventional approaches to estimating speakers' physiometric parameters such as height, age, weight etc. from their voice analyze the speech signal at relatively coarse time resolutions, typically with analysis windows of 25ms or longer. At these resolutions the analysis effectively captures the structure of the supra-glottal vocal tract. In this paper we hypothesize that by analyzing the signal at a finer temporal resolution that is lower than a pitch period, it may be possible to analyze segments of the speech signal that are obtained entirely when the glottis is open, and thereby capture some of the sub-glottal structure that may be represented in the voice. To explore this hypothesis we propose an analysis approach that combines signal analysis techniques suited to fine-temporal-resolution analysis and well-known regression models. We test it on the prediction of heights and ages of speakers from a standard speech database. Our findings show that the higher-resolution analysis does provide benefits over conventional analysis for estimating speaker height, although it is less useful in predicting age.","PeriodicalId":282164,"journal":{"name":"2016 4th International Conference on Biometrics and Forensics (IWBF)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124064394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Study of fusion strategies and exploiting the combination of MFCC and PNCC features for robust biometric speaker identification
Musab T. S. Al-Kaltakchi, W. L. Woo, S. Dlay, J. Chambers
Pub Date: 2016-03-03 | DOI: 10.1109/IWBF.2016.7449685
In this paper, a new combination of features and normalization methods is investigated for robust biometric speaker identification. Mel-Frequency Cepstral Coefficients (MFCC) are efficient for speaker identification in clean speech, while Power Normalized Cepstral Coefficients (PNCC) are robust in noisy environments; combining both features is therefore better than using either one individually. In addition, Cepstral Mean and Variance Normalization (CMVN) and Feature Warping (FW) are used to mitigate possible channel effects and handset mismatch in voice measurements. Speaker modelling is based on a Gaussian Mixture Model (GMM) with a Universal Background Model (UBM), and coupled parameter learning between the speaker models and the UBM is used to improve performance. Finally, maximum, mean, and weighted-sum fusion of model scores are used to enhance the Speaker Identification Accuracy (SIA). Evaluations conducted on the TIMIT database, with and without noise, confirm the performance improvement.
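Two of the named components are standard enough to sketch directly: per-utterance CMVN over a feature matrix, and the maximum/mean/weighted-sum fusion of per-speaker scores. The fusion weight and toy scores below are placeholders, not the paper's tuned values.

```python
# Minimal sketches of CMVN and of max/mean/weighted-sum score fusion.
# Standard formulations; the fusion weight and scores are placeholders.
import numpy as np

def cmvn(features):
    """Cepstral mean and variance normalization; features shaped (frames, dims)."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0) + 1e-8  # guard against zero variance
    return (features - mu) / sigma

def fuse(scores_mfcc, scores_pncc, w=0.5):
    """Fuse per-speaker log-likelihood scores from the two GMM-UBM systems."""
    return {
        "max": np.maximum(scores_mfcc, scores_pncc),
        "mean": (scores_mfcc + scores_pncc) / 2.0,
        "weighted": w * scores_mfcc + (1 - w) * scores_pncc,
    }

scores_a = np.array([-12.3, -9.8, -15.1])  # toy scores for 3 enrolled speakers
scores_b = np.array([-11.0, -10.2, -14.4])
fused = fuse(scores_a, scores_b, w=0.6)    # w=0.6 is an assumed weight
print(int(np.argmax(fused["weighted"])))   # identified speaker index
```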
{"title":"Study of fusion strategies and exploiting the combination of MFCC and PNCC features for robust biometric speaker identification","authors":"Musab T. S. Al-Kaltakchi, W. L. Woo, S. Dlay, J. Chambers","doi":"10.1109/IWBF.2016.7449685","DOIUrl":"https://doi.org/10.1109/IWBF.2016.7449685","url":null,"abstract":"In this paper, a new combination of features and normalization methods is investigated for robust biometric speaker identification. Mel Frequency Cepstral Coefficients (MFCC) are efficient for speaker identification in clean speech while Power Normalized Cepstral Coefficients (PNCC) features are robust for noisy environments. Therefore, combining both features together is better than taking each one individually. In addition, Cepstral Mean and Variance Normalization (CMVN) and Feature Warping (FW) are used to mitigate possible channel effects and the handset mismatch in voice measurements. Speaker modelling is based on a Gaussian Mixture Model (GMM) with a universal background model (UBM). Coupled parameter learning between the speaker models and UBM is utilized to improve performance. Finally, maximum, mean and weighted sum fusions of model scores are used to enhance the Speaker Identification Accuracy (SIA). Verifications conducted on the TIMIT database with and without noise confirm performance improvement.","PeriodicalId":282164,"journal":{"name":"2016 4th International Conference on Biometrics and Forensics (IWBF)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124780325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep pair-wise similarity learning for face recognition
Klemen Grm, S. Dobrišek, V. Štruc
Pub Date: 2016-03-03 | DOI: 10.1109/IWBF.2016.7449690
Recent advances in deep learning have made it possible to build deep hierarchical models capable of delivering state-of-the-art performance in various vision tasks, such as object recognition, detection, or tracking. For recognition tasks, the most common approach with deep models is to learn object representations (or features) directly from raw image input and then feed the learned features to a suitable classifier. Deep models used in this pipeline are typically heavily parameterized and require enormous amounts of training data to deliver competitive recognition performance. Despite the use of data augmentation techniques, many application domains, predefined experimental protocols, or specifics of the recognition problem limit the amount of available training data and make training an effective deep hierarchical model difficult. In this paper, we present a novel deep pair-wise similarity learning (DPSL) strategy for deep models, developed specifically to overcome the problem of insufficient training data, and demonstrate its use on the task of face recognition. Unlike existing (deep) learning strategies, DPSL operates on image pairs and tries to learn pair-wise image similarities that can be used directly for recognition, instead of feature representations that need to be fed to appropriate classification techniques, as in traditional deep learning pipelines. Since our DPSL strategy takes an image pair as the input to the learning procedure, the amount of training data available to train deep models is quadratic in the number of available training images, which is of paramount importance for models with a large number of parameters. We demonstrate the efficacy of the proposed learning strategy by developing a deep model for pose-invariant face recognition, called the Pose-Invariant Similarity Index (PISI), and presenting comparative experimental results on the FERET and IJB-A datasets.
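The quadratic growth in training data comes from the pair-construction step, sketched below: every pair of training images becomes one example labelled same-identity or different-identity. Details such as pair sampling or balancing are assumptions, not the paper's protocol.

```python
# Sketch of pair construction: N images yield N*(N-1)/2 labelled training
# pairs, which is what makes the effective training set quadratic in N.
# Pair balancing and sampling details are assumptions for illustration.
from itertools import combinations

def make_pairs(images, labels):
    """images: list of image arrays; labels: identity label per image."""
    pairs = []
    for i, j in combinations(range(len(images)), 2):
        same = int(labels[i] == labels[j])     # 1 = genuine pair, 0 = impostor pair
        pairs.append((images[i], images[j], same))
    return pairs

labels = ["a", "a", "b", "b", "c"]             # 5 images ...
pairs = make_pairs([None] * 5, labels)         # ... yield 10 = 5*4/2 training pairs
print(len(pairs), sum(p[2] for p in pairs))    # 10 pairs, 2 of them genuine
```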
{"title":"Deep pair-wise similarity learning for face recognition","authors":"Klemen Grm, S. Dobrišek, V. Štruc","doi":"10.1109/IWBF.2016.7449690","DOIUrl":"https://doi.org/10.1109/IWBF.2016.7449690","url":null,"abstract":"Recent advances in deep learning made it possible to build deep hierarchical models capable of delivering state-of-the-art performance in various vision tasks, such as object recognition, detection or tracking. For recognition tasks the most common approach when using deep models is to learn object representations (or features) directly from raw image-input and then feed the learned features to a suitable classifier. Deep models used in this pipeline are typically heavily parameterized and require enormous amounts of training data to deliver competitive recognition performance. Despite the use of data augmentation techniques, many application domains, predefined experimental protocols or specifics of the recognition problem limit the amount of available training data and make training an effective deep hierarchical model a difficult task. In this paper, we present a novel, deep pair-wise similarity learning (DPSL) strategy for deep models, developed specifically to overcome the problem of insufficient training data, and demonstrate its usage on the task of face recognition. Unlike existing (deep) learning strategies, DPSL operates on image-pairs and tries to learn pair-wise image similarities that can be used for recognition purposes directly instead of feature representations that need to be fed to appropriate classification techniques, as with traditional deep learning pipelines. Since our DPSL strategy assumes an image pair as the input to the learning procedure, the amount of training data available to train deep models is quadratic in the number of available training images, which is of paramount importance for models with a large number of parameters. We demonstrate the efficacy of the proposed learning strategy by developing a deep model for pose-invariant face recognition, called Pose-Invariant Similarity Index (PISI), and presenting comparative experimental results on the FERET an IJB-A datasets.","PeriodicalId":282164,"journal":{"name":"2016 4th International Conference on Biometrics and Forensics (IWBF)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122208709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}