Modelling speaker and channel variability using deep neural networks for robust speaker verification

2016 IEEE Spoken Language Technology Workshop (SLT), Pub Date: 2016-12-01, DOI: 10.1109/SLT.2016.7846264

Gautam Bhattacharya, Md. Jahangir Alam, P. Kenny, Vishwa Gupta

Citations: 24

Abstract

We propose to improve the performance of i-vector based speaker verification by processing the i-vectors with a deep neural network before they are fed to a cosine distance or probabilistic linear discriminant analysis (PLDA) classifier. To this end, we build on an existing model that we refer to as Non-linear Within Class Normalization (NWCN) and introduce a novel Speaker Classifier Network (SCN). Both models deliver impressive speaker verification performance, showing 56% and 68% relative improvements over standard i-vectors when combined with a cosine distance backend. The NWCN model also reduces the equal error rate (EER) for PLDA from 1.78% to 1.63%. We also test these models under the constraints of domain mismatch, i.e. when no in-domain training data is available. Under these conditions, SCN features in combination with cosine distance perform better than the PLDA baseline, achieving an EER of 2.92% as compared to 3.37%.
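The abstract reports its results as equal error rates and relative improvements for a cosine distance backend. The following is a minimal sketch of how those quantities are commonly computed, not the authors' implementation. The function names (`cosine_score`, `equal_error_rate`, `relative_improvement`) are hypothetical, the EER is approximated by a simple threshold sweep, and the neural network that transforms the i-vectors is not modelled here; the inputs are assumed to be fixed-length embedding vectors that have already been processed.

```python
import numpy as np


def cosine_score(enroll, test):
    """Cosine similarity between an enrollment and a test embedding."""
    enroll = np.asarray(enroll, dtype=float)
    test = np.asarray(test, dtype=float)
    return float(enroll @ test / (np.linalg.norm(enroll) * np.linalg.norm(test)))


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where the miss rate and false-alarm rate are closest.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # Miss rate: fraction of same-speaker trials rejected at each threshold.
    miss = np.array([(target < t).mean() for t in thresholds])
    # False-alarm rate: fraction of different-speaker trials accepted.
    false_alarm = np.array([(nontarget >= t).mean() for t in thresholds])
    idx = int(np.argmin(np.abs(miss - false_alarm)))
    return float((miss[idx] + false_alarm[idx]) / 2)


def relative_improvement(baseline_eer, new_eer):
    """Relative reduction in EER with respect to a baseline."""
    return (baseline_eer - new_eer) / baseline_eer
```

Applied to the figures quoted in the abstract, `relative_improvement(1.78, 1.63)` is about 0.084 (an 8.4% relative reduction for PLDA with NWCN), and `relative_improvement(3.37, 2.92)` is about 0.134 (13.4% for SCN with cosine distance over the PLDA baseline under domain mismatch).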
