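Exploiting the Non-Uniform Frequency-Resolution Spectrograms to Improve the Deep Denoising Auto-Encoder for Speech Enhancement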

J. Hung, Shu-Ting Tsai, Yan-Tong Chen
2021 7th International Conference on Applied System Innovation (ICASI), published 2021-09-24
DOI: 10.1109/ICASI52993.2021.9568478 (https://doi.org/10.1109/ICASI52993.2021.9568478)

Abstract

This study aims to improve the deep denoising autoencoder (DDAE) for speech enhancement by reducing the size of its input feature. The DDAE is a well-known deep learning structure that learns the mapping from a noisy signal to its clean, noise-free counterpart. One of the most commonly used representations of the input signal for training the DDAE is the spectrogram, which is the ordered series of the short-time Fourier transforms (STFT) of the frames of the input signal. In this study, we examine as the DDAE input a variant of the spectrogram that has a non-uniform acoustic frequency resolution and is therefore smaller than the original spectrogram. More specifically, we decompose the original full-resolution spectrogram into four sub-bands and then down-sample the spectral points of each sub-band in turn. The higher the frequencies a sub-band covers, the greater the decimation factor it receives. Overall, around 50% of the spectral points are dropped. Preliminary experiments on utterances corrupted by various noise types (babble, baby cry, car, engine and white) show that halving the number of input spectral points with this non-uniform sampling helps the learned DDAE deliver higher speech quality and intelligibility on the test signals. The new method can therefore improve the denoising performance of the DDAE while also reducing its computational complexity.
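The sub-band down-sampling described in the abstract can be illustrated with a minimal sketch. The abstract does not give the band edges or the decimation factors, so the equal-width sub-bands, the factors (1, 2, 4, 8), the 256 spectral points per frame, and the function names `nonuniform_bin_indices` and `downscale_frame` are all assumptions made for illustration, not the authors' configuration.

```python
from typing import List, Sequence


def nonuniform_bin_indices(
    n_bins: int, factors: Sequence[int] = (1, 2, 4, 8)
) -> List[int]:
    """Return the frequency-bin indices kept after sub-band decimation.

    The bins are split into len(factors) contiguous sub-bands of (nearly)
    equal width, ordered from low to high frequency, and sub-band k keeps
    every factors[k]-th bin. Equal-width bands and the default factors are
    illustrative assumptions, not values taken from the paper.
    """
    n_bands = len(factors)
    kept: List[int] = []
    for k, factor in enumerate(factors):
        start = k * n_bins // n_bands        # first bin of sub-band k
        stop = (k + 1) * n_bins // n_bands   # one past its last bin
        kept.extend(range(start, stop, factor))
    return kept


def downscale_frame(frame: Sequence[float], kept: Sequence[int]) -> List[float]:
    """Select the kept bins from one spectral frame (e.g. STFT magnitudes)."""
    return [frame[i] for i in kept]


if __name__ == "__main__":
    n_bins = 256  # assumed number of spectral points per frame
    kept = nonuniform_bin_indices(n_bins)
    drop_rate = 1 - len(kept) / n_bins
    print(f"kept {len(kept)} of {n_bins} bins, drop rate {drop_rate:.1%}")
```

Under these assumed settings the four sub-bands keep 64, 32, 16 and 8 bins, so 120 of 256 spectral points remain and roughly 53% are dropped, which is in the neighbourhood of the 50% figure quoted in the abstract. The same index list could be applied to every frame of a spectrogram before it is fed to the network.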