Title: The use of robust cepstral features obtained from pole-zero transfer functions for speaker identification
Authors: M. Zilovic, R. Ramachandran, R. Mammone
Published in: Proceedings 1995 Canadian Conference on Electrical and Computer Engineering, 1995-09-05
DOI: 10.1109/CCECE.1995.526612 (https://doi.org/10.1109/CCECE.1995.526612)
Citation count: 3
Abstract
A common problem in speaker identification systems is that a mismatch between training and testing conditions substantially degrades performance. The authors attempt to alleviate this problem by proposing new features that show less variation when speech is corrupted by convolutional noise (channel) and/or additive noise. The conventional feature is the linear predictive (LP) cepstrum, which is derived from an all-pole transfer function that closely approximates the spectral envelope of the speech. Previously, a cepstral feature based on a pole-zero function, the adaptive component weighted (ACW) cepstrum, was introduced. The present authors propose two additional cepstral features based on pole-zero transfer functions. One is an alternative way of doing adaptive component weighting and is called the ACW2 cepstrum. The other, known as the PFL1 cepstrum, is based on a pole-zero postfilter used in speech enhancement. The features are compared in experiments on a closed-set, text-independent, vector-quantizer-based speaker identification system using the King database. The ACW and PFL1 features generally perform best. The corresponding spectra show a clear emphasis of the formants and no spectral tilt. To enhance robustness, it is important to emphasize the formants; an accurate description of the spectral envelope is not required.
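As a rough illustration of the quantities the abstract names, the sketch below computes an LP cepstrum from an all-pole model and the cepstrum of a generic minimum-phase pole-zero function. It is not the authors' implementation: the abstract does not define the ACW, ACW2, or PFL1 transfer functions, so the pole-zero example simply places zeros at scaled copies of the poles. That placement, the scale factor 0.9, and all function names are assumptions made for illustration only.

```python
import numpy as np


def lp_coefficients(frame, order):
    """Autocorrelation-method LP analysis (Levinson-Durbin recursion).

    Returns a[1..order] for the all-pole model
    H(z) = 1 / (1 - sum_k a[k] z^-k).
    """
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = a_new
        err *= 1.0 - k * k
    return a[1:]


def lp_cepstrum(a, n_ceps):
    """LP cepstrum c[1..n_ceps] via the standard recursion (gain term omitted)."""
    p = len(a)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def pole_zero_cepstrum(poles, zeros, n_ceps):
    """Cepstrum of a minimum-phase pole-zero function from its roots:
    c[n] = (sum_k poles[k]**n - sum_k zeros[k]**n) / n.
    """
    poles = np.asarray(poles, dtype=complex)
    zeros = np.asarray(zeros, dtype=complex)
    n = np.arange(1, n_ceps + 1)
    pole_sum = np.sum(poles[None, :] ** n[:, None], axis=1)
    zero_sum = np.sum(zeros[None, :] ** n[:, None], axis=1)
    return np.real(pole_sum - zero_sum) / n


# Hypothetical second-order all-pole model (stable: both poles lie inside |z| = 1).
a = np.array([1.2, -0.5])
poles = np.roots(np.concatenate(([1.0], -a)))

c_rec = lp_cepstrum(a, 8)                     # recursion on the LP coefficients
c_roots = pole_zero_cepstrum(poles, [], 8)    # same cepstrum, computed from the poles

# Illustrative pole-zero function (assumed form): zeros at 0.9 * poles.
c_pz = pole_zero_cepstrum(poles, 0.9 * poles, 8)
```

In this assumed pole-zero form, each LP cepstral coefficient c[n] is scaled by (1 - 0.9**n), which down-weights the low-order coefficients. Whether that matches any feature evaluated in the paper cannot be confirmed from the abstract alone.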