
2016 4th International Conference on Biometrics and Forensics (IWBF): Latest Publications

Assessment of automatic speaker verification on lossy transcoded speech
Pub Date: 2016-03-03 | DOI: 10.1109/IWBF.2016.7449679
Jozef Polacky, R. Jarina, M. Chmulik
In this paper, we investigate the effect of lossy speech compression on the text-independent speaker verification task. We evaluated voice biometrics performance over several state-of-the-art speech codecs, including the recently released Enhanced Voice Services (EVS) codec. The tests were performed in both codec-matched and codec-mismatched scenarios. The results show that EVS outperforms the other speech codecs in our tests and can be used to generate speaker models that are quite robust to varying compression levels. They also show that if a higher-quality speech codec (EVS, G.711) is included in the training data (mismatched and partially mismatched scenarios), automatic speaker verification (ASV) gives better results than in the matched scenario.
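The matched/mismatched comparison above reduces to scoring verification trials under each codec condition and comparing error rates. Below is a minimal sketch of how an equal error rate (EER) could be computed from genuine and impostor score arrays for two such conditions; the scores are synthetic stand-ins, not the paper's data, and `compute_eer` is a hypothetical helper.

```python
import numpy as np

def compute_eer(genuine, impostor):
    """Equal error rate: operating point where false-acceptance and
    false-rejection rates cross (higher score = more similar)."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best = min(
        (abs(np.mean(impostor >= t) - np.mean(genuine < t)),
         (np.mean(impostor >= t) + np.mean(genuine < t)) / 2)
        for t in thresholds
    )
    return best[1]

# Synthetic scores standing in for a codec-matched and a codec-mismatched
# trial list; mismatch typically shifts genuine scores toward impostors.
rng = np.random.default_rng(0)
matched_eer = compute_eer(rng.normal(2.0, 1.0, 500), rng.normal(0.0, 1.0, 500))
mismatched_eer = compute_eer(rng.normal(1.2, 1.0, 500), rng.normal(0.0, 1.0, 500))
print(f"matched EER:    {matched_eer:.3f}")
print(f"mismatched EER: {mismatched_eer:.3f}")
```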
Citations: 9
Variable-length template protection based on homomorphic encryption with application to signature biometrics
Pub Date: 2016-03-03 | DOI: 10.1109/IWBF.2016.7449672
M. Gomez-Barrero, Julian Fierrez, Javier Galbally
Given its sensitive nature, any privacy leakage of biometric data poses severe security risks. Biometric templates should therefore be protected by storing irreversibly transformed or encrypted biometric signals, while preserving the unprotected system's performance. Building on recent work by Zhu et al. on privacy-preserving similarity evaluation of time series data, we present a new biometric template protection scheme based on homomorphic probabilistic encryption, where only encrypted data is stored or exchanged. We then apply the proposed scheme to signature verification and show, using a publicly available database and a free implementation of the Paillier cryptosystem, that all requirements described in the ISO/IEC 24745 standard are met with no performance degradation. Moreover, the proposed approach is robust to hill-climbing attacks.
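Paillier encryption is additively homomorphic: ciphertexts can be added together and multiplied by plaintext scalars, which is enough to evaluate a squared Euclidean distance between an encrypted template and a plaintext probe without ever decrypting the template. A minimal sketch using the freely available python-paillier package (`phe`); the toy feature vectors and the trick of storing the encrypted sum of squares at enrollment are illustrative assumptions, not the paper's exact protocol.

```python
from phe import paillier  # pip install phe (python-paillier)

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# Enrollment: Paillier cannot square ciphertexts, so alongside the
# encrypted template we store the encrypted sum of its squared entries.
template = [3.0, 1.5, -2.0]                      # toy biometric features
enc_template = [public_key.encrypt(x) for x in template]
enc_sq_sum = public_key.encrypt(sum(x * x for x in template))

# Matching (sees only ciphertexts): for a plaintext probe p,
# ||t - p||^2 = sum(t_i^2) - 2 * sum(t_i * p_i) + sum(p_i^2).
probe = [2.5, 1.0, -1.0]
enc_dist = enc_sq_sum + sum(p * p for p in probe)
for enc_t, p in zip(enc_template, probe):
    enc_dist += enc_t * (-2 * p)

# Only the private-key holder can open the comparison result.
print("squared distance:", private_key.decrypt(enc_dist))  # -> 1.5
```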
Citations: 17
Formant manipulations in voice disguise by mimicry
Pub Date: 2016-03-03 | DOI: 10.1109/IWBF.2016.7449675
Rita Singh, D. Gençaga, B. Raj
The human voice can be disguised in many ways. The purpose of disguise may be to impersonate another person, to conceal the identity of the original speaker, or both. Correspondingly, the goal of biometric analysis of disguised voices can be twofold: to find out whether the originator of the disguised voice is a given speaker, or to learn how a speaker's voice can be manipulated, so that the extent and type of disguise the speaker can perform can be guessed a priori. Analysis toward the former goal must rely on knowledge of which characteristics of a person's voice are least affected, or unaffected, by attempted disguise. Analysis toward the latter goal must use knowledge of which sounds are typically most amenable to voluntary variation by the speaker, so that the extent to which a given speaker can successfully disguise their voice can be estimated. This paper attempts to establish a simple methodology for voice analysis toward both goals. We study voice impersonations performed by an expert mimic, focusing specifically on formants and formant-related measurements, to find out the extent and type of formant manipulations performed by the expert at the level of individual phonemes. Expert mimicry is an extreme form of attempted disguise. Our study is presented with the expectation that non-expert attempts at voice disguise by mimicry will fall within the gold standard of manipulation patterns set by an expert mimic, and that it is therefore useful to establish this gold standard.
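Formant measurements of the kind analyzed here can be obtained with standard phonetics tooling. A minimal sketch using the praat-parselmouth package to sample F1-F3 at the midpoint of a phoneme segment; the audio file name and segment boundaries are hypothetical placeholders (in practice they might come from a forced alignment), and this is not the paper's exact measurement pipeline.

```python
import parselmouth  # pip install praat-parselmouth

snd = parselmouth.Sound("utterance.wav")   # hypothetical recording
formants = snd.to_formant_burg(max_number_of_formants=5)

# Hypothetical phoneme boundaries in seconds (e.g., from forced alignment).
seg_start, seg_end = 0.42, 0.55
midpoint = (seg_start + seg_end) / 2

# Sample F1-F3 at the segment midpoint; comparing such values between a
# target speaker and a mimic reveals per-phoneme formant manipulations.
for n in (1, 2, 3):
    hz = formants.get_value_at_time(n, midpoint)
    print(f"F{n} at {midpoint:.2f} s: {hz:.0f} Hz")
```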
Citations: 11
Signature recognition: establishing human baseline performance via crowdsourcing
Pub Date: 2016-03-03 | DOI: 10.1109/IWBF.2016.7449680
D. Morocho, A. Morales, Julian Fierrez, Rubén Tolosana
This work explores crowdsourcing for establishing human baseline performance on signature recognition. We present five experiments across three different scenarios in which laymen (people without Forensic Document Examiner experience) have to decide on the authenticity of a given signature. The scenarios include single comparisons between one genuine sample and one unlabeled sample, based on images, videos, or time sequences, as well as comparisons with multiple training and test sets. The human performance obtained varies from 7% to 80% depending on the scenario; the results suggest the large potential of these collaborative platforms and encourage further research in this area.
Citations: 9
On the analysis of factors influencing the performance of facial age progression
Pub Date: 2016-03-03 | DOI: 10.1109/IWBF.2016.7449697
A. Lanitis, N. Tsapatsoulis
Facial age progression is the process of synthesizing a face image at an older age from images showing the person at a younger age. The ability to generate accurate age-progressed face images is important for a number of forensic investigation tasks. In this paper we analyze the performance of a number of publicly available age progression applications with respect to different parameters encountered in age progression, including the imaging conditions of the input images, the presence of occluding structures, the ages of the input/target faces, and the age progression range. By analyzing and quantifying age progression accuracy under different conditions, we extract a number of conclusions in the form of a set of guidelines on the factors that forensic artists and age progression researchers should focus on in order to produce improved age progression methodologies.
Citations: 4
A novel method for sclera recognition with images captured on-the-move and at-a-distance
Pub Date: 2016-03-03 | DOI: 10.1109/IWBF.2016.7449677
Sinan H. Alkassar, W. L. Woo, S. Dlay, J. Chambers
Sclera blood vessels have recently been investigated as an efficient biometric trait. Capturing this part of the eye with a normal camera using visible-wavelength images, rather than near-infrared images, has attracted research interest. However, processing noisy sclera images captured at a distance and on the move has not been extensively investigated. In this paper, we therefore propose a new method for minimizing the effect of distance on sclera recognition. The method involves sclera template rotation alignment and a distance scaling step that minimize error rates when noisy eye images are captured at a distance and on the move. Experimental results on the on-the-move and at-a-distance UBIRIS.v2 database show a significant improvement in terms of accuracy and error rates.
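One way to realize the rotation-alignment idea is to sweep the probe over a range of in-plane rotations and keep the angle that minimizes the distance to the enrolled template. The sketch below does this on binary vessel masks with a Hamming-style distance; it is an illustrative stand-in under those assumptions, not the paper's exact alignment or scaling procedure.

```python
import numpy as np
from scipy.ndimage import rotate  # pip install scipy

def dissimilarity(a, b):
    """Normalized Hamming distance between two binary vessel masks."""
    return float(np.mean(a.astype(bool) != b.astype(bool)))

def align_by_rotation(template, probe, angles=range(-15, 16)):
    """Sweep in-plane rotations of the probe and keep the angle that
    minimizes the distance to the enrolled template."""
    return min(
        (dissimilarity(template, rotate(probe, a, reshape=False, order=0)), a)
        for a in angles
    )

# Toy example: the probe is the template rotated by 8 degrees, so the
# sweep should recover roughly -8 degrees as the best counter-rotation.
rng = np.random.default_rng(1)
template = (rng.random((64, 64)) > 0.9).astype(np.uint8)
probe = rotate(template, 8, reshape=False, order=0)
dist, angle = align_by_rotation(template, probe)
print(f"best angle: {angle} deg, residual distance: {dist:.3f}")
```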
Citations: 18
Walking direction identification using perceptual hashing
Pub Date: 2016-03-03 | DOI: 10.1109/IWBF.2016.7449687
T. Verlekar, P. Correia
Gait has become a popular trait for biometric recognition, especially in surveillance environments, because it can be captured without active user participation. However, the gait description obtained in such scenarios depends on the observed walking direction of the user. Hence, if the user's walking direction is unknown, comparison against a previously prepared database can be rendered impossible. This paper discusses the problem of identifying the walking direction in an unconstrained environment and proposes a novel approach to this task. The walking direction is identified by computing a perceptual hash (PHash) over the leg region of the user and comparing it against the PHash values obtained for training sequences. The proposed method is computationally inexpensive and performs better than state-of-the-art methods. It is also robust to appearance changes caused, for instance, by the user wearing a coat or carrying a bag.
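The matching step described above can be prototyped with an off-the-shelf perceptual hash. A minimal sketch using the ImageHash package, where one training hash per direction and the image file names are hypothetical placeholders; the paper's actual hash computation over segmented leg regions may differ.

```python
import imagehash  # pip install ImageHash
from PIL import Image

# Hypothetical training data: one representative leg-region crop per
# walking direction; file names are placeholders.
training = {
    "left-to-right": imagehash.phash(Image.open("train_l2r_legs.png")),
    "right-to-left": imagehash.phash(Image.open("train_r2l_legs.png")),
    "toward-camera": imagehash.phash(Image.open("train_front_legs.png")),
}

# Identify the walking direction of a probe crop: the '-' operator on
# ImageHash objects returns the Hamming distance between the hashes.
probe = imagehash.phash(Image.open("probe_legs.png"))
direction = min(training, key=lambda d: probe - training[d])
print("predicted walking direction:", direction)
```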
Citations: 8
Short-term analysis for estimating physical parameters of speakers
Pub Date: 2016-03-03 | DOI: 10.1109/IWBF.2016.7449696
Rita Singh, B. Raj, J. Baker
Conventional approaches to estimating speakers' physiometric parameters, such as height, age, and weight, from their voice analyze the speech signal at relatively coarse time resolutions, typically with analysis windows of 25 ms or longer. At these resolutions the analysis effectively captures the structure of the supra-glottal vocal tract. In this paper we hypothesize that by analyzing the signal at a finer temporal resolution, shorter than a pitch period, it may be possible to analyze segments of the speech signal obtained entirely while the glottis is open, and thereby capture some of the sub-glottal structure that may be represented in the voice. To explore this hypothesis we propose an analysis approach that combines signal analysis techniques suited to fine-temporal-resolution analysis with well-known regression models. We test it on the prediction of the heights and ages of speakers from a standard speech database. Our findings show that the higher-resolution analysis does provide benefits over conventional analysis for estimating speaker height, although it is less useful for predicting age.
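The contrast between conventional and sub-pitch-period analysis comes down to the framing parameters. A minimal numpy sketch comparing 25 ms windows with 2 ms windows (shorter than the roughly 10 ms period of a 100 Hz pitch); the window and hop sizes are illustrative, not the paper's exact settings.

```python
import numpy as np

def frame_signal(x, frame_len, hop_len):
    """Slice a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop_len
    idx = hop_len * np.arange(n_frames)[:, None] + np.arange(frame_len)
    return x[idx]

sr = 16000                                          # sampling rate (Hz)
x = np.random.default_rng(0).standard_normal(sr)    # 1 s stand-in signal

# Conventional analysis: 25 ms windows span several pitch periods.
coarse = frame_signal(x, int(0.025 * sr), int(0.010 * sr))

# Fine analysis: 2 ms windows are shorter than one pitch period (a 100 Hz
# pitch has a 10 ms period), so some frames fall entirely within the
# open-glottis phase of the cycle.
fine = frame_signal(x, int(0.002 * sr), int(0.001 * sr))

print("coarse frames:", coarse.shape)   # (98, 400)
print("fine frames:  ", fine.shape)     # (999, 32)
```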
Citations: 17
Study of fusion strategies and exploiting the combination of MFCC and PNCC features for robust biometric speaker identification
Pub Date: 2016-03-03 | DOI: 10.1109/IWBF.2016.7449685
Musab T. S. Al-Kaltakchi, W. L. Woo, S. Dlay, J. Chambers
In this paper, a new combination of features and normalization methods is investigated for robust biometric speaker identification. Mel Frequency Cepstral Coefficients (MFCC) are efficient for speaker identification in clean speech, while Power Normalized Cepstral Coefficients (PNCC) are robust in noisy environments; combining both features is therefore better than using either one individually. In addition, Cepstral Mean and Variance Normalization (CMVN) and Feature Warping (FW) are used to mitigate possible channel effects and handset mismatch in voice measurements. Speaker modelling is based on a Gaussian Mixture Model (GMM) with a Universal Background Model (UBM), and coupled parameter learning between the speaker models and the UBM is used to improve performance. Finally, maximum, mean, and weighted-sum fusion of model scores are used to enhance the Speaker Identification Accuracy (SIA). Verifications conducted on the TIMIT database, with and without noise, confirm the performance improvement.
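Two of the building blocks named above, CMVN and score-level fusion, are straightforward to express. A minimal numpy sketch under assumed inputs: a feature matrix of shape (frames, coefficients) and per-speaker log-likelihood scores from two hypothetical GMM-UBM systems; the score values are made up for illustration.

```python
import numpy as np

def cmvn(features):
    """Cepstral mean and variance normalization: zero mean and unit
    variance per coefficient over the utterance; features is a
    (n_frames, n_coefficients) array."""
    return (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)

def fuse(mfcc_scores, pncc_scores, w=0.5):
    """Maximum, mean, and weighted-sum fusion of per-speaker scores."""
    stacked = np.vstack([mfcc_scores, pncc_scores])
    return {
        "max": stacked.max(axis=0),
        "mean": stacked.mean(axis=0),
        "weighted": w * mfcc_scores + (1 - w) * pncc_scores,
    }

# Hypothetical log-likelihood scores of one test utterance against three
# speaker models, one set from an MFCC-based and one from a PNCC-based
# GMM-UBM system.
mfcc_scores = np.array([-41.2, -39.8, -44.0])
pncc_scores = np.array([-40.5, -38.9, -43.1])

fused = fuse(mfcc_scores, pncc_scores, w=0.4)
print("identified speaker:", int(np.argmax(fused["weighted"])))
```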
Citations: 31
Deep pair-wise similarity learning for face recognition
Pub Date: 2016-03-03 | DOI: 10.1109/IWBF.2016.7449690
Klemen Grm, S. Dobrišek, V. Štruc
Recent advances in deep learning have made it possible to build deep hierarchical models capable of delivering state-of-the-art performance in various vision tasks, such as object recognition, detection, or tracking. For recognition tasks, the most common approach when using deep models is to learn object representations (features) directly from raw image input and then feed the learned features to a suitable classifier. Deep models used in this pipeline are typically heavily parameterized and require enormous amounts of training data to deliver competitive recognition performance. Despite the use of data augmentation techniques, many application domains, predefined experimental protocols, or specifics of the recognition problem limit the amount of available training data and make training an effective deep hierarchical model a difficult task. In this paper, we present a novel deep pair-wise similarity learning (DPSL) strategy for deep models, developed specifically to overcome the problem of insufficient training data, and demonstrate its use for face recognition. Unlike existing (deep) learning strategies, DPSL operates on image pairs and tries to learn pair-wise image similarities that can be used directly for recognition, instead of feature representations that need to be fed to appropriate classification techniques, as in traditional deep learning pipelines. Since our DPSL strategy takes an image pair as the input to the learning procedure, the amount of training data available to train deep models is quadratic in the number of available training images, which is of paramount importance for models with a large number of parameters. We demonstrate the efficacy of the proposed learning strategy by developing a deep model for pose-invariant face recognition, called the Pose-Invariant Similarity Index (PISI), and presenting comparative experimental results on the FERET and IJB-A datasets.
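The quadratic growth in training data that DPSL exploits follows directly from pair construction: n images yield n(n-1)/2 unordered pairs, each labeled genuine (same subject) or impostor. A minimal sketch of such pair generation over a hypothetical labeled gallery; the image and subject identifiers are placeholders.

```python
from itertools import combinations

# Hypothetical labeled gallery: (image_id, subject_id) tuples.
gallery = [("img0", "A"), ("img1", "A"), ("img2", "B"),
           ("img3", "B"), ("img4", "C")]

# Every unordered image pair becomes one training example for the
# pair-wise similarity model: label 1 if same subject, else 0.
pairs = [((i1, i2), int(s1 == s2))
         for (i1, s1), (i2, s2) in combinations(gallery, 2)]

n = len(gallery)
print(f"{n} images -> {len(pairs)} pairs (n*(n-1)/2 = {n * (n - 1) // 2})")
for (a, b), label in pairs[:4]:
    print(a, b, "genuine" if label else "impostor")
```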
Citations: 6