Speech enhancement using Long Short-Term Memory based recurrent Neural Networks for noise robust Speaker Verification
Authors: Morten Kolbæk, Z. Tan, J. Jensen
Venue: 2016 IEEE Spoken Language Technology Workshop (SLT)
Publication date: 2016-09-16
DOI: 10.1109/SLT.2016.7846281 (https://doi.org/10.1109/SLT.2016.7846281)
Citation count: 53
Abstract
In this paper we propose to use a state-of-the-art Speech Enhancement (SE) algorithm based on Deep Recurrent Neural Networks (DRNNs) for noise-robust Speaker Verification (SV). Specifically, we study the performance of an i-vector-based SV system tested in noisy conditions, using a DRNN-based SE front-end that employs a Long Short-Term Memory (LSTM) architecture. We compare this system with systems using a front-end based on Non-negative Matrix Factorization (NMF) and a front-end based on Short-Time Spectral Amplitude Minimum Mean Square Error (STSA-MMSE) estimation. We show in simulation experiments on the RSR2015 speech corpus that a male-speaker- and text-independent DRNN-based SE front-end, without specific a priori knowledge of the noise type, outperforms both a text-, noise-type- and speaker-dependent NMF-based front-end and an STSA-MMSE-based front-end in terms of Equal Error Rate (EER), across a wide range of noise types and signal-to-noise ratios (SNRs).
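The abstract does not state how the noisy test conditions were generated or exactly how the EER was computed. As an illustration only, here is a minimal Python sketch, assuming NumPy, of two steps commonly used in this kind of evaluation: mixing a clean signal with noise at a target global SNR, and estimating the EER from target (same-speaker) and impostor trial scores. The function names `mix_at_snr` and `equal_error_rate` are hypothetical, and this is not the authors' evaluation code.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Hypothetical helper: scale `noise` so the mixture has the requested global SNR (dB).

    Assumes `noise` is at least as long as `clean`; it is truncated to match.
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose gain g so that 10*log10(p_clean / (g**2 * p_noise)) == snr_db.
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise


def equal_error_rate(target_scores, impostor_scores):
    """Hypothetical helper: approximate the EER by sweeping thresholds over the observed scores.

    A trial is accepted when its score >= threshold. Returns the mean of the
    false-acceptance and false-rejection rates at the threshold where they are closest.
    """
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, impostor]))
    frr = np.array([np.mean(target < t) for t in thresholds])     # targets wrongly rejected
    far = np.array([np.mean(impostor >= t) for t in thresholds])  # impostors wrongly accepted
    i = int(np.argmin(np.abs(far - frr)))
    return float((far[i] + frr[i]) / 2.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)  # stand-in for a clean utterance (assumption)
    noise = rng.standard_normal(16000)  # stand-in for a noise recording (assumption)
    noisy = mix_at_snr(clean, noise, snr_db=5.0)
    achieved = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
    print(f"achieved SNR: {achieved:.2f} dB")
    print("EER:", equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1]))  # prints 0.25
```

Reporting the averaged FAR and FRR at the closest crossing is a simple design choice for a sketch; speaker verification toolkits often interpolate along the ROC or DET curve instead, which can give slightly different values on small trial lists.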