基于GMM超向量和支持向量机的电话应用年龄和性别识别

2008 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2008-05-12 DOI:10.1109/ICASSP.2008.4517932

T. Bocklet, A. Maier, Josef G. Bauer, F. Burkhardt, E. Nöth

{"title":"基于GMM超向量和支持向量机的电话应用年龄和性别识别","authors":"T. Bocklet, A. Maier, Josef G. Bauer, F. Burkhardt, E. Nöth","doi":"10.1109/ICASSP.2008.4517932","DOIUrl":null,"url":null,"abstract":"This paper compares two approaches of automatic age and gender classification with 7 classes. The first approach are Gaussian mixture models (GMMs) with universal background models (UBMs), which is well known for the task of speaker identification/verification. The training is performed by the EM algorithm or MAP adaptation respectively. For the second approach for each speaker of the test and training set a GMM model is trained. The means of each model are extracted and concatenated, which results in a GMM supervector for each speaker. These supervectors are then used in a support vector machine (SVM). Three different kernels were employed for the SVM approach: a polynomial kernel (with different polynomials), an RBF kernel and a linear GMM distance kernel, based on the KL divergence. With the SVM approach we improved the recognition rate to 74% (p < 0.001) and are in the same range as humans.","PeriodicalId":333742,"journal":{"name":"2008 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2008-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"119","resultStr":"{\"title\":\"Age and gender recognition for telephone applications based on GMM supervectors and support vector machines\",\"authors\":\"T. Bocklet, A. Maier, Josef G. Bauer, F. Burkhardt, E. Nöth\",\"doi\":\"10.1109/ICASSP.2008.4517932\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper compares two approaches of automatic age and gender classification with 7 classes. The first approach are Gaussian mixture models (GMMs) with universal background models (UBMs), which is well known for the task of speaker identification/verification. The training is performed by the EM algorithm or MAP adaptation respectively. For the second approach for each speaker of the test and training set a GMM model is trained. The means of each model are extracted and concatenated, which results in a GMM supervector for each speaker. These supervectors are then used in a support vector machine (SVM). Three different kernels were employed for the SVM approach: a polynomial kernel (with different polynomials), an RBF kernel and a linear GMM distance kernel, based on the KL divergence. With the SVM approach we improved the recognition rate to 74% (p < 0.001) and are in the same range as humans.\",\"PeriodicalId\":333742,\"journal\":{\"name\":\"2008 IEEE International Conference on Acoustics, Speech and Signal Processing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2008-05-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"119\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2008 IEEE International Conference on Acoustics, Speech and Signal Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICASSP.2008.4517932\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 IEEE International Conference on Acoustics, Speech and Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP.2008.4517932","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 119

摘要

本文比较了两种7类年龄性别自动分类方法。第一种方法是高斯混合模型(GMMs)和通用背景模型(ubm)，它以说话人识别/验证任务而闻名。分别采用EM算法和MAP自适应算法进行训练。对于第二种方法，对测试和训练集的每个说话者训练一个GMM模型。对每个模型的均值进行提取和连接，得到每个说话人的GMM超向量。然后将这些超向量用于支持向量机(SVM)。支持向量机方法采用了三种不同的核:多项式核(具有不同的多项式)，RBF核和基于KL散度的线性GMM距离核。使用SVM方法，我们将识别率提高到74% (p < 0.001)，并且与人类处于相同的范围内。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Age and gender recognition for telephone applications based on GMM supervectors and support vector machines

This paper compares two approaches of automatic age and gender classification with 7 classes. The first approach are Gaussian mixture models (GMMs) with universal background models (UBMs), which is well known for the task of speaker identification/verification. The training is performed by the EM algorithm or MAP adaptation respectively. For the second approach for each speaker of the test and training set a GMM model is trained. The means of each model are extracted and concatenated, which results in a GMM supervector for each speaker. These supervectors are then used in a support vector machine (SVM). Three different kernels were employed for the SVM approach: a polynomial kernel (with different polynomials), an RBF kernel and a linear GMM distance kernel, based on the KL divergence. With the SVM approach we improved the recognition rate to 74% (p < 0.001) and are in the same range as humans.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2008 IEEE International Conference on Acoustics, Speech and Signal Processing

自引率

0.00%

发文量

期刊最新文献

Rate-optimal MIMO transmission with mean and covariance feedback at low SNR Complexity adaptive H.264 encoding using multiple reference frames A low complexity selective mapping to reduce intercarrier interference in OFDM systems Learning to satisfy A message passing algorithm for active contours