Comparison of text-independent speaker recognition methods on telephone speech with acoustic mismatch

Proceedings: ICSLP. International Conference on Spoken Language Processing. Pub Date: 1996-10-03. DOI: 10.21437/ICSLP.1996-454

S. Vuuren

Citations: 46

Abstract

We compare the speaker recognition performance of vector quantization (VQ), Gaussian mixture modeling (GMM), and the Arithmetic Harmonic Sphericity measure (AHS) in adverse telephone speech conditions. The aim is to address the question: how do multimodal VQ and GMM typically compare to the simpler unimodal AHS for matched and mismatched training and testing environments? We study identification (closed-set) and verification errors on a new multi-environment database. We consider LPC and PLP features as well as their RASTA derivatives. We conclude that RASTA processing can remove redundancies from the features. We affirm that even when we use channel and noise compensation schemes, speaker recognition errors remain high when there is acoustic mismatch.
