基于基函数的混合噪声信号的生成及基于dnn的语音增强模型的学习

2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP) Pub Date : 2017-09-01 DOI:10.1109/MLSP.2017.8168192

Shi-Xue Wen, Jun Du, Chin-Hui Lee

{"title":"基于基函数的混合噪声信号的生成及基于dnn的语音增强模型的学习","authors":"Shi-Xue Wen, Jun Du, Chin-Hui Lee","doi":"10.1109/MLSP.2017.8168192","DOIUrl":null,"url":null,"abstract":"We first examine the generalization issue with the noise samples used in training nonlinear mapping functions between noisy and clean speech features for deep neural network (DNN) based speech enhancement. Then an empirical proof is established to explain why the DNN-based approach has a good noise generalization capability provided that a large collection of noise types are included in generating diverse noisy speech samples for training. It is shown that an arbitrary noise signal segment can be well represented by a linear combination of microstructure noise bases. Accordingly, we propose to generate these mixing noise signals by designing a set of compact and analytic noise bases without using any realistic noise types. The experiments demonstrate that this noise generation scheme can yield comparable performance to that using 50 real noise types. Furthermore, by supplementing the collected noise types with the synthesized noise bases, we observe remarkable performance improvements implying that not only a large collection of real-world noise signals can be alleviated, but also a good noise generalization capability can be achieved.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"110 1","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"On generating mixing noise signals with basis functions for simulating noisy speech and learning dnn-based speech enhancement models\",\"authors\":\"Shi-Xue Wen, Jun Du, Chin-Hui Lee\",\"doi\":\"10.1109/MLSP.2017.8168192\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We first examine the generalization issue with the noise samples used in training nonlinear mapping functions between noisy and clean speech features for deep neural network (DNN) based speech enhancement. Then an empirical proof is established to explain why the DNN-based approach has a good noise generalization capability provided that a large collection of noise types are included in generating diverse noisy speech samples for training. It is shown that an arbitrary noise signal segment can be well represented by a linear combination of microstructure noise bases. Accordingly, we propose to generate these mixing noise signals by designing a set of compact and analytic noise bases without using any realistic noise types. The experiments demonstrate that this noise generation scheme can yield comparable performance to that using 50 real noise types. Furthermore, by supplementing the collected noise types with the synthesized noise bases, we observe remarkable performance improvements implying that not only a large collection of real-world noise signals can be alleviated, but also a good noise generalization capability can be achieved.\",\"PeriodicalId\":6542,\"journal\":{\"name\":\"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)\",\"volume\":\"110 1\",\"pages\":\"1-6\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/MLSP.2017.8168192\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MLSP.2017.8168192","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

摘要

我们首先研究了基于深度神经网络(DNN)的语音增强中用于训练噪声和干净语音特征之间非线性映射函数的噪声样本的泛化问题。然后建立了一个经验证明来解释为什么基于dnn的方法具有良好的噪声泛化能力，前提是在生成用于训练的各种噪声语音样本时包含大量噪声类型。结果表明，任意噪声信号段都可以用微结构噪声基的线性组合来表示。因此，我们建议在不使用任何实际噪声类型的情况下，通过设计一套紧凑的解析噪声基来产生这些混合噪声信号。实验表明，该噪声生成方案与使用50种真实噪声类型的噪声生成方案具有相当的性能。此外，通过将收集到的噪声类型与合成的噪声基相补充，我们观察到显著的性能改进，这意味着不仅可以减轻大量真实噪声信号的收集，而且可以实现良好的噪声泛化能力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

On generating mixing noise signals with basis functions for simulating noisy speech and learning dnn-based speech enhancement models

We first examine the generalization issue with the noise samples used in training nonlinear mapping functions between noisy and clean speech features for deep neural network (DNN) based speech enhancement. Then an empirical proof is established to explain why the DNN-based approach has a good noise generalization capability provided that a large collection of noise types are included in generating diverse noisy speech samples for training. It is shown that an arbitrary noise signal segment can be well represented by a linear combination of microstructure noise bases. Accordingly, we propose to generate these mixing noise signals by designing a set of compact and analytic noise bases without using any realistic noise types. The experiments demonstrate that this noise generation scheme can yield comparable performance to that using 50 real noise types. Furthermore, by supplementing the collected noise types with the synthesized noise bases, we observe remarkable performance improvements implying that not only a large collection of real-world noise signals can be alleviated, but also a good noise generalization capability can be achieved.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)

自引率

0.00%

发文量