Segmented Time-Frequency Masking Algorithm for Speech Separation Based on Deep Neural Networks

Xinyu Guo, S. Ou, Meng Gao, Ying Gao
Published in: 2020 13th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)
Publication date: 2020-10-17
DOI: 10.1109/CISP-BMEI51763.2020.9263673 (https://doi.org/10.1109/CISP-BMEI51763.2020.9263673)

Abstract

To address the residual background noise left by supervised, model-based single-channel speech separation algorithms in non-stationary noise environments, a piecewise time-frequency masking target based on the Wiener filtering principle is proposed and used as the training target of the neural network. This target both tracks changes in SNR and reduces damage to speech quality. Four features are combined: relative spectral transform and perceptual linear prediction (RASTA-PLP), amplitude modulation spectrogram (AMS), Mel-frequency cepstral coefficients (MFCC), and Gammatone frequency cepstral coefficients (GFCC). The resulting multi-level speech information serves as the training features of the neural network, and a deep neural network (DNN) based speech separation system is then constructed to separate the noisy speech. Experimental results show that, compared with traditional time-frequency masking methods, the segmented time-frequency masking algorithm improves speech quality and clarity, suppresses noise, and achieves better speech separation performance at low SNR.
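To make the idea of a masking target concrete, the sketch below computes two traditional time-frequency masks (a Wiener-type ratio mask and an ideal binary mask) from speech and noise power spectrograms, and then a piecewise mask that switches behavior by local SNR. The abstract does not give the paper's actual segment boundaries or per-segment formulas, so `piecewise_mask`, its thresholds `low_db` and `high_db`, and the squared and square-root gains are assumptions chosen purely for illustration, not the authors' definition.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log of zero


def local_snr(speech_power, noise_power):
    """Linear-scale SNR of each time-frequency unit from power spectrograms."""
    return speech_power / (noise_power + EPS)


def wiener_mask(snr):
    """Wiener-type gain SNR / (1 + SNR); values lie in [0, 1)."""
    return snr / (1.0 + snr)


def binary_mask(snr, threshold_db=0.0):
    """Ideal binary mask: 1 where local SNR exceeds the threshold, else 0."""
    snr_db = 10.0 * np.log10(snr + EPS)
    return (snr_db > threshold_db).astype(float)


def piecewise_mask(snr, low_db=-5.0, high_db=5.0):
    """HYPOTHETICAL piecewise target (not the paper's definition).

    Low-SNR units get a squared Wiener gain (stronger suppression),
    high-SNR units get a square-root gain (gentler, protects speech),
    and units in between keep the plain Wiener gain.
    """
    snr_db = 10.0 * np.log10(snr + EPS)
    gain = wiener_mask(snr)
    return np.where(
        snr_db < low_db,
        gain ** 2,
        np.where(snr_db > high_db, np.sqrt(gain), gain),
    )


if __name__ == "__main__":
    # Toy power spectrograms: 1 frame x 3 frequency bins.
    speech = np.array([[0.01, 1.0, 4.0]])
    noise = np.ones_like(speech)
    snr = local_snr(speech, noise)
    print("Wiener mask:   ", wiener_mask(snr))
    print("Binary mask:   ", binary_mask(snr))
    print("Piecewise mask:", piecewise_mask(snr))
```

In a DNN-based system of the kind the abstract describes, a mask like this would be computed from the known clean speech and noise during training and used as the regression target; at test time the network's estimated mask is applied to the noisy spectrogram.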