Exploiting the Non-Uniform Frequency-Resolution Spectrograms to Improve the Deep Denoising Auto-Encoder for Speech Enhancement
J. Hung, Shu-Ting Tsai, Yan-Tong Chen
2021 7th International Conference on Applied System Innovation (ICASI), published 2021-09-24
DOI: 10.1109/ICASI52993.2021.9568478 (https://doi.org/10.1109/ICASI52993.2021.9568478)
This study aims to improve the deep denoising autoencoder (DDAE) for speech enhancement by reducing the size of its input feature. The DDAE is a well-known deep learning structure that learns the mapping from a noisy signal to its clean, noise-free counterpart. One of the most commonly used representations of the input signal for training the DDAE is the spectrogram, the ordered series of short-time Fourier transforms (STFT) of the frames of the input signal. In this study, we examine a variant of the spectrogram as the DDAE input that has a non-uniform acoustic frequency resolution and is therefore smaller than the original spectrogram. In more detail, we decompose the original full-resolution spectrogram into four sub-bands and then down-sample the spectral points of each sub-band in turn. The higher the frequencies a sub-band covers, the larger its decimation factor. The overall spectral drop rate is around 50%. Preliminary experiments on utterances corrupted by various noise types (babble, baby cry, car, engine, and white) show that halving the number of input spectral points with this non-uniform sampling helps the learned DDAE deliver higher speech quality and intelligibility on the test signals. The new method can therefore improve the denoising performance of the DDAE while also reducing its computational complexity.
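The sub-band decimation described above can be illustrated with a minimal NumPy sketch. The abstract states only that there are four sub-bands, that higher bands receive larger decimation factors, and that roughly half of the spectral points are dropped. Everything else here is an assumption made for illustration: the equal-width band split, the factors (1, 2, 4, 8), the 257-bin spectrogram size, decimation by keeping every f-th bin without any smoothing, and the hypothetical function names `nonuniform_bin_indices` and `downscale_spectrogram`. It should not be read as the authors' exact configuration.

```python
import numpy as np


def nonuniform_bin_indices(n_bins, factors=(1, 2, 4, 8)):
    """Return the frequency-bin indices kept after sub-band decimation.

    Assumption: bins are split into len(factors) nearly equal-width
    sub-bands (low to high frequency), and band k keeps every
    factors[k]-th bin. The factors are illustrative, not from the paper.
    """
    bands = np.array_split(np.arange(n_bins), len(factors))
    return np.concatenate([band[::f] for band, f in zip(bands, factors)])


def downscale_spectrogram(spec, factors=(1, 2, 4, 8)):
    """Reduce a (n_bins, n_frames) magnitude spectrogram along frequency."""
    keep = nonuniform_bin_indices(spec.shape[0], factors)
    return spec[keep, :], keep


# Stand-in data: 257 bins (as from a 512-point STFT) by 100 frames.
rng = np.random.default_rng(0)
spec = np.abs(rng.standard_normal((257, 100)))

reduced, keep = downscale_spectrogram(spec)
drop_rate = 1.0 - reduced.shape[0] / spec.shape[0]
print(reduced.shape, round(drop_rate, 3))
```

With these assumed settings, 121 of the 257 bins are kept, a drop rate of about 53%, which is in line with the "around 50%" figure in the abstract. The lowest band keeps its full resolution, while the highest band keeps one bin in eight.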