Segmented Time-Frequency Masking Algorithm for Speech Separation Based on Deep Neural Networks
Xinyu Guo, S. Ou, Meng Gao, Ying Gao
2020 13th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)
Published: 2020-10-17
DOI: 10.1109/CISP-BMEI51763.2020.9263673
Citations: 0
Abstract
To address the problem of residual background noise in supervised single-channel speech separation under non-stationary noise conditions, a piecewise time-frequency masking target based on the Wiener filtering principle is proposed and used as the training target of a neural network; it can both track changes in SNR and reduce damage to speech quality. By combining four features, namely relative spectral transform and perceptual linear prediction (RASTA-PLP), amplitude modulation spectrogram (AMS), Mel-frequency cepstral coefficients (MFCC), and Gammatone frequency cepstral coefficients (GFCC), the extracted multi-level speech information is used as the training features of the neural network, and a deep neural network (DNN) based speech separation system is constructed to separate the noisy speech. Experimental results show that, compared with traditional time-frequency masking methods, the segmented time-frequency masking algorithm improves speech quality and intelligibility, suppresses noise, and achieves better separation performance at low SNR.
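The abstract does not give the exact definition of the piecewise masking target, but the general idea of an SNR-dependent mask built on the Wiener gain can be sketched as follows. This is a minimal illustration under assumed thresholds (the `snr_low_db`/`snr_high_db` breakpoints and the linear blend are hypothetical choices, not the paper's formula): at very low local SNR the mask is zeroed to suppress noise aggressively, at high local SNR it falls back to the standard Wiener ratio mask to preserve speech quality, and in between it interpolates.

```python
import numpy as np

def piecewise_tf_mask(speech_power, noise_power, snr_low_db=-5.0, snr_high_db=5.0):
    """Illustrative piecewise time-frequency mask (not the paper's exact target).

    speech_power, noise_power: non-negative arrays of per-bin power estimates
    (time frames x frequency bins). Below snr_low_db the mask is 0 (hard
    suppression); above snr_high_db it equals the Wiener gain; in between it
    linearly blends toward the Wiener gain. Thresholds are assumed values.
    """
    eps = 1e-10  # avoid division by zero / log of zero
    local_snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    wiener = speech_power / (speech_power + noise_power + eps)  # classic Wiener gain
    blend = (local_snr_db - snr_low_db) / (snr_high_db - snr_low_db)
    mask = np.where(local_snr_db < snr_low_db, 0.0,
                    np.where(local_snr_db > snr_high_db, wiener, wiener * blend))
    return mask
```

In a DNN-based separation system of the kind described, a mask like this would be computed from clean speech and noise during training and serve as the regression target; at inference the network predicts the mask from the noisy features and it is applied to the noisy spectrogram before resynthesis.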