DCTCN: Deep Complex Temporal Convolutional Network for Real Time Speech Enhancement
Huanbin Zou, Jie Zhu
2021 11th International Conference on Intelligent Control and Information Processing (ICICIP), 2021-12-03
DOI: 10.1109/ICICIP53388.2021.9642159
Citations: 3
Abstract
Deep learning-based speech enhancement has been studied extensively in recent years. Most methods reconstruct the magnitude spectrum of the clean target speech from the magnitude spectrum of the noisy speech, and then reuse the noisy speech’s phase spectrum to synthesize the waveform. In this paper, we propose a complex-valued network, the Deep Complex Temporal Convolutional Network (DCTCN), which estimates the complex-valued short-time Fourier transform (STFT) of the target speech from the noisy speech. We design a temporal convolutional network (TCN) block based on complex dilated causal convolution. The proposed DCTCN achieves strong denoising performance with a low complexity of 1.33M parameters. Experiments on the DNS Challenge dataset show that both the complex operations and the TCN blocks significantly improve noise suppression.
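To make the term "complex dilated causal convolution" concrete, here is a minimal pure-Python sketch of one common way to build such an operation: a complex kernel applied to a complex sequence is expanded into four real-valued causal dilated convolutions. The abstract does not give the layer's kernel size, dilation schedule, or implementation details, so the function names, the example values, and the four-real-convolution formulation are illustrative assumptions and not the authors' code.

```python
from typing import List, Tuple


def causal_dilated_conv(x: List[float], w: List[float], dilation: int) -> List[float]:
    """Real-valued causal dilated 1-D convolution (hypothetical helper).

    y[t] = sum_k w[k] * x[t - k * dilation], with x[n] treated as 0 for n < 0,
    so y[t] never depends on samples later than t.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            idx = t - k * dilation
            if idx >= 0:  # implicit left zero-padding keeps the layer causal
                acc += wk * x[idx]
        y.append(acc)
    return y


def complex_causal_dilated_conv(
    xr: List[float], xi: List[float],
    wr: List[float], wi: List[float],
    dilation: int,
) -> Tuple[List[float], List[float]]:
    """Complex convolution built from four real convolutions (assumed formulation).

    (wr + j*wi) * (xr + j*xi) = (wr*xr - wi*xi) + j*(wr*xi + wi*xr)
    """
    rr = causal_dilated_conv(xr, wr, dilation)
    ii = causal_dilated_conv(xi, wi, dilation)
    ri = causal_dilated_conv(xi, wr, dilation)
    ir = causal_dilated_conv(xr, wi, dilation)
    yr = [a - b for a, b in zip(rr, ii)]
    yi = [a + b for a, b in zip(ri, ir)]
    return yr, yi


# Illustrative values only: one frequency bin of an STFT over four frames.
xr, xi = [1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, -1.0]
wr, wi = [0.5, 0.25], [1.0, -0.5]
yr, yi = complex_causal_dilated_conv(xr, xi, wr, wi, dilation=2)
```

In this sketch the real and imaginary parts are carried as separate real sequences, which is how a complex layer can be assembled from ordinary real-valued convolution layers. With kernel size 2 and dilation 2, each output frame sees only the current frame and the frame two steps earlier, which is the property that makes frame-by-frame (real-time) processing possible.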