{"title":"基于长短期记忆的递归神经网络语音增强噪声鲁棒说话人验证","authors":"Morten Kolbæk, Z. Tan, J. Jensen","doi":"10.1109/SLT.2016.7846281","DOIUrl":null,"url":null,"abstract":"In this paper we propose to use a state-of-the-art Deep Recurrent Neural Network (DRNN) based Speech Enhancement (SE) algorithm for noise robust Speaker Verification (SV). Specifically, we study the performance of an i-vector based SV system, when tested in noisy conditions using a DRNN based SE front-end utilizing a Long Short-Term Memory (LSTM) architecture. We make comparisons to systems using a Non-negative Matrix Factorization (NMF) based front-end, and a Short-Time Spectral Amplitude Minimum Mean Square Error (STSA-MMSE) based front-end, respectively. We show in simulation experiments that a male-speaker and text-independent DRNN based SE front-end, without specific a priori knowledge about the noise type outperforms a text, noise type and speaker dependent NMF based front-end as well as a STSA-MMSE based front-end in terms of Equal Error Rates for a large range of noise types and signal to noise ratios on the RSR2015 speech corpus.","PeriodicalId":281635,"journal":{"name":"2016 IEEE Spoken Language Technology Workshop (SLT)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"53","resultStr":"{\"title\":\"Speech enhancement using Long Short-Term Memory based recurrent Neural Networks for noise robust Speaker Verification\",\"authors\":\"Morten Kolbæk, Z. Tan, J. Jensen\",\"doi\":\"10.1109/SLT.2016.7846281\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper we propose to use a state-of-the-art Deep Recurrent Neural Network (DRNN) based Speech Enhancement (SE) algorithm for noise robust Speaker Verification (SV). Specifically, we study the performance of an i-vector based SV system, when tested in noisy conditions using a DRNN based SE front-end utilizing a Long Short-Term Memory (LSTM) architecture. We make comparisons to systems using a Non-negative Matrix Factorization (NMF) based front-end, and a Short-Time Spectral Amplitude Minimum Mean Square Error (STSA-MMSE) based front-end, respectively. 
We show in simulation experiments that a male-speaker and text-independent DRNN based SE front-end, without specific a priori knowledge about the noise type outperforms a text, noise type and speaker dependent NMF based front-end as well as a STSA-MMSE based front-end in terms of Equal Error Rates for a large range of noise types and signal to noise ratios on the RSR2015 speech corpus.\",\"PeriodicalId\":281635,\"journal\":{\"name\":\"2016 IEEE Spoken Language Technology Workshop (SLT)\",\"volume\":\"28 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-09-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"53\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE Spoken Language Technology Workshop (SLT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SLT.2016.7846281\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT.2016.7846281","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Speech enhancement using Long Short-Term Memory based recurrent Neural Networks for noise robust Speaker Verification
In this paper, we propose to use a state-of-the-art Deep Recurrent Neural Network (DRNN) based Speech Enhancement (SE) algorithm for noise robust Speaker Verification (SV). Specifically, we study the performance of an i-vector based SV system when tested in noisy conditions using a DRNN based SE front-end built on a Long Short-Term Memory (LSTM) architecture. We compare against systems using a Non-negative Matrix Factorization (NMF) based front-end and a Short-Time Spectral Amplitude Minimum Mean Square Error (STSA-MMSE) based front-end, respectively. We show in simulation experiments that a male-speaker, text-independent DRNN based SE front-end, without specific a priori knowledge about the noise type, outperforms a text-, noise-type-, and speaker-dependent NMF based front-end as well as an STSA-MMSE based front-end in terms of Equal Error Rate for a wide range of noise types and signal-to-noise ratios on the RSR2015 speech corpus.
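The abstract does not give implementation details, so the following is only a minimal sketch of how an LSTM based SE front-end of this general kind can be set up: the network takes noisy magnitude-spectrogram frames and predicts a per-frequency mask that is applied to the noisy magnitudes before features are extracted for the i-vector SV back-end. The use of PyTorch, the layer sizes, and the ideal-ratio-mask style training target are assumptions for illustration, not taken from the paper.

# Hypothetical LSTM-based speech enhancement front-end (not the authors' code).
import torch
import torch.nn as nn


class LSTMEnhancer(nn.Module):
    def __init__(self, n_bins: int = 257, hidden: int = 512, layers: int = 2):
        super().__init__()
        # Stacked LSTM over the time frames of the noisy magnitude spectrogram.
        self.lstm = nn.LSTM(input_size=n_bins, hidden_size=hidden,
                            num_layers=layers, batch_first=True)
        # Per-frequency mask in [0, 1], used as an ideal-ratio-mask estimate (assumed target).
        self.mask = nn.Sequential(nn.Linear(hidden, n_bins), nn.Sigmoid())

    def forward(self, noisy_mag: torch.Tensor) -> torch.Tensor:
        # noisy_mag: (batch, frames, n_bins) noisy magnitude spectrogram
        h, _ = self.lstm(noisy_mag)
        return self.mask(h) * noisy_mag  # enhanced magnitude estimate


if __name__ == "__main__":
    model = LSTMEnhancer()
    noisy = torch.rand(4, 100, 257)       # 4 utterances, 100 frames, 257 bins (toy data)
    clean_ref = torch.rand(4, 100, 257)   # clean magnitudes as the training target
    loss = nn.functional.mse_loss(model(noisy), clean_ref)
    loss.backward()
    print(loss.item())

In a full pipeline, the enhanced magnitudes would be recombined with the noisy phase, resynthesized or converted to MFCCs, and passed to the i-vector extractor; those stages are omitted here.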