Speech enhancement using Long Short-Term Memory based recurrent Neural Networks for noise robust Speaker Verification

2016 IEEE Spoken Language Technology Workshop (SLT). Publication date: 2016-09-16. DOI: 10.1109/SLT.2016.7846281

Morten Kolbæk, Z. Tan, J. Jensen

Citations: 53

Abstract

In this paper, we propose to use a state-of-the-art Deep Recurrent Neural Network (DRNN) based Speech Enhancement (SE) algorithm for noise robust Speaker Verification (SV). Specifically, we study the performance of an i-vector based SV system when it is tested in noisy conditions with a DRNN-based SE front-end that uses a Long Short-Term Memory (LSTM) architecture. We compare this system with systems using a Non-negative Matrix Factorization (NMF) based front-end and a Short-Time Spectral Amplitude Minimum Mean Square Error (STSA-MMSE) based front-end. We show in simulation experiments on the RSR2015 speech corpus that a male-speaker and text-independent DRNN-based SE front-end, without specific a priori knowledge about the noise type, outperforms both a text-, noise-type- and speaker-dependent NMF-based front-end and an STSA-MMSE-based front-end in terms of Equal Error Rate (EER) for a large range of noise types and signal-to-noise ratios.
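The comparison above is reported in terms of the Equal Error Rate (EER): the operating point at which the false acceptance rate on impostor trials equals the false rejection rate on target trials. The sketch below illustrates how an EER can be read off a set of verification scores. It is a minimal sketch under stated assumptions: the function name `equal_error_rate`, the "accept if score >= threshold" convention, and the simple threshold sweep are illustrative choices, not the scoring procedure used in the paper. Scoring toolkits often interpolate along the ROC/DET curve instead, which can give slightly different values.

```python
from typing import Sequence, Tuple


def equal_error_rate(target_scores: Sequence[float],
                     impostor_scores: Sequence[float]) -> Tuple[float, float]:
    """Approximate the EER of a verification system (hypothetical helper).

    Assumption: a trial is accepted when its score is >= threshold.
    For every candidate threshold (each observed score) we compute
      FRR = fraction of target trials rejected   (score <  threshold)
      FAR = fraction of impostor trials accepted (score >= threshold)
    and keep the threshold where the two rates are closest, returning
    their average as the EER. Returns (eer, threshold).
    """
    if not target_scores or not impostor_scores:
        raise ValueError("need at least one target and one impostor score")

    best_gap = float("inf")
    best = (1.0, 0.0)
    # Sweep thresholds in increasing order over all distinct observed scores.
    for t in sorted(set(target_scores) | set(impostor_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:  # strict '<' keeps the lowest such threshold
            best_gap = gap
            best = ((far + frr) / 2, t)
    return best
```

For example, with target scores `[0.9, 0.6, 0.4]` and impostor scores `[0.5, 0.3, 0.2]`, the rates cross at threshold 0.5 (one of three targets rejected, one of three impostors accepted), so the function returns an EER of 1/3. A lower EER after enhancement means the front-end has made target and impostor scores easier to separate under noise.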
