Comparison of I-vector and GMM-UBM approaches to speaker identification with TIMIT and NIST 2008 databases in challenging environments

2017 25th European Signal Processing Conference (EUSIPCO). Pub Date: 2017-08-01. DOI: 10.23919/eusipco.2017.8081264

Musab T. S. Al-Kaltakchi, W. L. Woo, S. Dlay, J. Chambers


Citations: 13

Abstract

In this paper, two models, the I-vector and the Gaussian Mixture Model-Universal Background Model (GMM-UBM), are compared for the speaker identification task. Four feature combinations of I-vectors with seven fusion techniques are considered: maximum, mean, weighted sum, cumulative, interleaving and concatenated for both two and four features. In addition, an Extreme Learning Machine (ELM) is exploited to identify speakers, and then Speaker Identification Accuracy (SIA) is calculated. Both systems are evaluated for 120 speakers from the TIMIT and NIST 2008 databases for clean speech. Furthermore, a comprehensive evaluation is made under Additive White Gaussian Noise (AWGN) conditions and with three types of Non Stationary Noise (NSN), both with and without handset effects for the TIMIT database. The results show that the I-vector approach is better than the GMM-UBM for both clean and AWGN conditions without a handset. However, the GMM-UBM had better accuracy for NSN types.
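The abstract names the fusion techniques and the noise and accuracy measures but does not define them. The following is a minimal Python sketch of one plausible reading, not the authors' implementation. All function names are hypothetical; the element-wise interpretation of maximum, mean, and weighted sum, the weight `w`, and the SNR-based noise scaling are assumptions. "Cumulative" fusion is left out because the abstract gives no definition for it, and the ELM classifier is not modelled here.

```python
import math
import random


def fuse_max(a, b):
    """Element-wise maximum of two equal-length feature vectors (assumed reading)."""
    return [max(x, y) for x, y in zip(a, b)]


def fuse_mean(a, b):
    """Element-wise mean of two equal-length feature vectors (assumed reading)."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def fuse_weighted_sum(a, b, w=0.5):
    """Element-wise weighted sum; the weight w is a hypothetical parameter."""
    return [w * x + (1 - w) * y for x, y in zip(a, b)]


def fuse_interleave(a, b):
    """Alternate elements of a and b: a0, b0, a1, b1, ..."""
    return [v for pair in zip(a, b) for v in pair]


def fuse_concat(a, b):
    """Append b after a, doubling the feature dimension."""
    return list(a) + list(b)


def add_awgn(signal, snr_db, rng=None):
    """Add white Gaussian noise whose variance gives an expected SNR of snr_db."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def speaker_identification_accuracy(predicted, actual):
    """SIA as the percentage of test utterances assigned to the correct speaker."""
    correct = sum(p == t for p, t in zip(predicted, actual))
    return 100.0 * correct / len(actual)
```

Under these assumptions, maximum, mean, and weighted-sum fusion keep the feature dimension unchanged, while interleaving and concatenation double it for two inputs, which is one reason the choice of fusion rule can matter for the classifier that follows.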
