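Exploiting the Non-Uniform Frequency-Resolution Spectrograms to Improve the Deep Denoising Auto-Encoder for Speech Enhancement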

2021 7th International Conference on Applied System Innovation (ICASI). Pub Date: 2021-09-24. DOI: 10.1109/ICASI52993.2021.9568478

J. Hung, Shu-Ting Tsai, Yan-Tong Chen



Abstract



This study aims to improve the deep denoising autoencoder (DDAE) for speech enhancement by reducing the size of its input feature. The DDAE is a well-known deep learning structure that learns the mapping from a noisy signal to its clean, noise-free counterpart. One of the most commonly used representations of the input signal for training the DDAE is the spectrogram, the ordered series of short-time Fourier transforms (STFT) of the frames of the input signal. In this study, we examine a variant of the spectrogram as the input to a DDAE; this variant has a non-uniform acoustic frequency resolution and is therefore smaller than the original spectrogram. More specifically, we decompose the original full-resolution spectrogram into four sub-bands and then down-sample the spectral points of each sub-band in turn. The higher the frequencies of a sub-band, the greater its decimation factor. The overall spectral drop rate is around 50%. Preliminary experiments on utterances corrupted by various noise types (babble, baby cry, car, engine, and white) show that halving the number of input spectral points with this non-uniform sampling helps the learned DDAE deliver higher speech quality and intelligibility for the test signals. Therefore, the new method can improve the denoising performance of the DDAE while also reducing its computational complexity.
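The sub-band decimation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the band edges or the decimation factors, so the four equal-width sub-bands and the factors 1, 2, 4, and 8 used here are assumptions chosen only to land near the stated drop rate of about 50%. The function name `nonuniform_downsample` is likewise hypothetical and not taken from the paper.

```python
from typing import List, Sequence

# Hypothetical settings: the abstract does not state the actual band edges
# or decimation factors. Lowest-frequency band first.
DECIMATION_FACTORS = (1, 2, 4, 8)


def nonuniform_downsample(
    spectrogram: Sequence[Sequence[float]],
    factors: Sequence[int] = DECIMATION_FACTORS,
) -> List[Sequence[float]]:
    """Keep every k-th frequency row within each sub-band.

    `spectrogram` is a sequence of frequency rows (row 0 = lowest frequency),
    each row holding one value per frame. The rows are split into
    len(factors) equal-width sub-bands (an assumption); any leftover rows
    are assigned to the last band.
    """
    n_bins = len(spectrogram)
    n_bands = len(factors)
    band_width = n_bins // n_bands
    reduced: List[Sequence[float]] = []
    for band, factor in enumerate(factors):
        start = band * band_width
        stop = n_bins if band == n_bands - 1 else start + band_width
        # A larger factor keeps fewer rows, i.e. coarser frequency resolution.
        reduced.extend(spectrogram[start:stop:factor])
    return reduced


if __name__ == "__main__":
    n_bins, n_frames = 256, 3
    # Toy spectrogram in which every row stores its own frequency-bin index.
    spec = [[float(k)] * n_frames for k in range(n_bins)]
    reduced = nonuniform_downsample(spec)
    drop_rate = 1 - len(reduced) / n_bins
    print(f"kept {len(reduced)} of {n_bins} rows, drop rate {drop_rate:.3f}")
```

Under these assumed settings, a 256-row spectrogram keeps 64 + 32 + 16 + 8 = 120 rows, a drop rate of roughly 53%. The reduced rows would then serve as the (smaller) input feature of the DDAE in place of the full-resolution spectrogram.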
