Segmented Time-Frequency Masking Algorithm for Speech Separation Based on Deep Neural Networks
Xinyu Guo, S. Ou, Meng Gao, Ying Gao
2020 13th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)
Published: 2020-10-17
DOI: 10.1109/CISP-BMEI51763.2020.9263673
Citations: 0
Abstract
To address the problem of residual background noise in supervised single-channel speech separation under non-stationary noise conditions, a piecewise time-frequency masking target based on the Wiener filtering principle is proposed and used as the training target of a neural network; it can both track changes in SNR and reduce damage to speech quality. By combining four features, namely relative spectral transform and perceptual linear prediction (RASTA-PLP), amplitude modulation spectrogram (AMS), Mel-frequency cepstral coefficients (MFCC), and Gammatone frequency cepstral coefficients (GFCC), the extracted multi-level speech information is used as the training features of the neural network, and a deep neural network (DNN) based speech separation system is constructed to separate the noisy speech. Experimental results show that, compared with traditional time-frequency masking methods, the segmented time-frequency masking algorithm improves speech quality and intelligibility, suppresses noise, and achieves better separation performance at low SNR.
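The abstract does not give the exact definition of the piecewise masking target, but the general idea of an SNR-dependent mask built on the Wiener gain can be sketched as follows. This is a minimal illustration under assumed thresholds (the `snr_low_db`/`snr_high_db` breakpoints and the linear blend are hypothetical choices, not the paper's formula): at very low local SNR the mask is zeroed to suppress noise aggressively, at high local SNR it falls back to the standard Wiener ratio mask to preserve speech quality, and in between it interpolates.

```python
import numpy as np

def piecewise_tf_mask(speech_power, noise_power, snr_low_db=-5.0, snr_high_db=5.0):
    """Illustrative piecewise time-frequency mask (not the paper's exact target).

    speech_power, noise_power: non-negative arrays of per-bin power estimates
    (time frames x frequency bins). Below snr_low_db the mask is 0 (hard
    suppression); above snr_high_db it equals the Wiener gain; in between it
    linearly blends toward the Wiener gain. Thresholds are assumed values.
    """
    eps = 1e-10  # avoid division by zero / log of zero
    local_snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    wiener = speech_power / (speech_power + noise_power + eps)  # classic Wiener gain
    blend = (local_snr_db - snr_low_db) / (snr_high_db - snr_low_db)
    mask = np.where(local_snr_db < snr_low_db, 0.0,
                    np.where(local_snr_db > snr_high_db, wiener, wiener * blend))
    return mask
```

In a DNN-based separation system of the kind described, a mask like this would be computed from clean speech and noise during training and serve as the regression target; at inference the network predicts the mask from the noisy features and it is applied to the noisy spectrogram before resynthesis.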