Musab T. S. Al-Kaltakchi, W. L. Woo, S. Dlay, J. Chambers
{"title":"I-vector和GMM-UBM方法在挑战性环境下与TIMIT和NIST 2008数据库的说话人识别比较","authors":"Musab T. S. Al-Kaltakchi, W. L. Woo, S. Dlay, J. Chambers","doi":"10.23919/eusipco.2017.8081264","DOIUrl":null,"url":null,"abstract":"In this paper, two models, the I-vector and the Gaussian Mixture Model-Universal Background Model (GMM-UBM), are compared for the speaker identification task. Four feature combinations of I-vectors with seven fusion techniques are considered: maximum, mean, weighted sum, cumulative, interleaving and concatenated for both two and four features. In addition, an Extreme Learning Machine (ELM) is exploited to identify speakers, and then Speaker Identification Accuracy (SIA) is calculated. Both systems are evaluated for 120 speakers from the TIMIT and NIST 2008 databases for clean speech. Furthermore, a comprehensive evaluation is made under Additive White Gaussian Noise (AWGN) conditions and with three types of Non Stationary Noise (NSN), both with and without handset effects for the TIMIT database. The results show that the I-vector approach is better than the GMM-UBM for both clean and AWGN conditions without a handset. However, the GMM-UBM had better accuracy for NSN types.","PeriodicalId":346811,"journal":{"name":"2017 25th European Signal Processing Conference (EUSIPCO)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Comparison of I-vector and GMM-UBM approaches to speaker identification with TIMIT and NIST 2008 databases in challenging environments\",\"authors\":\"Musab T. S. Al-Kaltakchi, W. L. Woo, S. Dlay, J. Chambers\",\"doi\":\"10.23919/eusipco.2017.8081264\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, two models, the I-vector and the Gaussian Mixture Model-Universal Background Model (GMM-UBM), are compared for the speaker identification task. Four feature combinations of I-vectors with seven fusion techniques are considered: maximum, mean, weighted sum, cumulative, interleaving and concatenated for both two and four features. In addition, an Extreme Learning Machine (ELM) is exploited to identify speakers, and then Speaker Identification Accuracy (SIA) is calculated. Both systems are evaluated for 120 speakers from the TIMIT and NIST 2008 databases for clean speech. Furthermore, a comprehensive evaluation is made under Additive White Gaussian Noise (AWGN) conditions and with three types of Non Stationary Noise (NSN), both with and without handset effects for the TIMIT database. The results show that the I-vector approach is better than the GMM-UBM for both clean and AWGN conditions without a handset. However, the GMM-UBM had better accuracy for NSN types.\",\"PeriodicalId\":346811,\"journal\":{\"name\":\"2017 25th European Signal Processing Conference (EUSIPCO)\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 25th European Signal Processing Conference (EUSIPCO)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23919/eusipco.2017.8081264\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 25th European Signal Processing Conference (EUSIPCO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/eusipco.2017.8081264","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Comparison of I-vector and GMM-UBM approaches to speaker identification with TIMIT and NIST 2008 databases in challenging environments
In this paper, two models, the I-vector and the Gaussian Mixture Model-Universal Background Model (GMM-UBM), are compared for the speaker identification task. Four feature combinations of I-vectors with seven fusion techniques are considered: maximum, mean, weighted sum, cumulative, interleaving and concatenated for both two and four features. In addition, an Extreme Learning Machine (ELM) is exploited to identify speakers, and then Speaker Identification Accuracy (SIA) is calculated. Both systems are evaluated for 120 speakers from the TIMIT and NIST 2008 databases for clean speech. Furthermore, a comprehensive evaluation is made under Additive White Gaussian Noise (AWGN) conditions and with three types of Non Stationary Noise (NSN), both with and without handset effects for the TIMIT database. The results show that the I-vector approach is better than the GMM-UBM for both clean and AWGN conditions without a handset. However, the GMM-UBM had better accuracy for NSN types.