Linhui Sun, Xiaolong Zhou, Aifei Gong, Lei Ye, Pingan Li, Eng Siong Chng
Digital Signal Processing, Volume 157, Article 104891. Published 2024-11-26. DOI: 10.1016/j.dsp.2024.104891
Impact factor: 2.9. JCR: Q2 (Engineering, Electrical & Electronic). Source: https://www.sciencedirect.com/science/article/pii/S1051200424005153
Noise-aware network with shared channel-attention encoder and joint constraint for noisy speech separation
Recently, significant progress has been made in end-to-end single-channel speech separation in clean environments. For noisy speech separation, existing research mainly uses deep neural networks to process the noise in speech signals implicitly, which does not fully exploit the effect of noise reconstruction errors on network training. We propose a lightweight noise-aware network with a shared channel-attention encoder and a joint constraint, named NSCJnet, which aims to improve speech separation performance in noisy environments. First, to reduce the number of network parameters, the model uses a parameter-sharing channel-attention encoder to convert noisy speech signals into a feature space. In addition, the channel attention layer (CAlayer) in the encoder enhances the network's representational capacity and separation performance in noisy environments by computing different weights for the filters in the convolution. Second, to make the network converge quickly, we regard noise as an estimation target of equal significance to speech, which compels the network to separate residual noise from the estimated speech, effectively suppressing lingering noise within the speech signal. Furthermore, by integrating a multi-resolution frequency constraint into the time-domain loss, we introduce a weighted time-frequency joint loss constraint, enabling the network to acquire information across both dimensions, which is conducive to separating mixed speech with noise. It automatically strengthens important features for separation and suppresses unimportant ones during the learning process. The results on the noisy WHAM! dataset and the noisy Libri2Mix dataset show that our method has lower computational complexity and outperforms some advanced methods on various speech quality and intelligibility metrics.
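The weighted time-frequency joint loss described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss formulas, so the sketch assumes negative SI-SNR as the time-domain term, a mean absolute difference of STFT magnitudes at two resolutions as the frequency term, and a hypothetical weight `alpha`. All function names and the frame sizes are illustrative, and permutation-invariant assignment of speakers is omitted. Noise is simply passed as one more target alongside the speech signals, which mirrors the idea of treating it as an estimation target of equal significance.

```python
import cmath
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB (assumed time-domain term, not confirmed by the source)."""
    n = len(target)
    mean_e, mean_t = sum(estimate) / n, sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    dot = sum(a * b for a, b in zip(e, t))
    t_energy = sum(b * b for b in t) + eps
    proj = [dot / t_energy * b for b in t]       # component of e aligned with t
    resid = [a - p for a, p in zip(e, proj)]     # everything else counts as error
    p_energy = sum(p * p for p in proj)
    r_energy = sum(r * r for r in resid)
    return 10.0 * math.log10((p_energy + eps) / (r_energy + eps))


def stft_mag(signal, frame_len, hop):
    """One-sided magnitude spectrogram from a Hann-windowed naive DFT."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append([
            abs(sum(seg[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                    for i in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ])
    return frames


def multi_res_freq_loss(estimate, target, resolutions=((32, 8), (64, 16))):
    """Mean absolute magnitude difference, averaged over the STFT resolutions."""
    total = 0.0
    for frame_len, hop in resolutions:       # (frame length, hop) pairs are illustrative
        est_mag = stft_mag(estimate, frame_len, hop)
        tgt_mag = stft_mag(target, frame_len, hop)
        diffs = [abs(a - b)
                 for fe, ft in zip(est_mag, tgt_mag)
                 for a, b in zip(fe, ft)]
        total += sum(diffs) / len(diffs)
    return total / len(resolutions)


def joint_loss(estimates, targets, alpha=0.5):
    """Average of (time term + alpha * frequency term) over speech and noise targets."""
    per_target = [
        -si_snr(est, tgt) + alpha * multi_res_freq_loss(est, tgt)
        for est, tgt in zip(estimates, targets)
    ]
    return sum(per_target) / len(per_target)


if __name__ == "__main__":
    speech = [math.sin(0.3 * i) for i in range(128)]
    noise = [math.cos(1.1 * i) for i in range(128)]
    # Correct estimates should score lower than swapped ones.
    print(joint_loss([speech, noise], [speech, noise]))
    print(joint_loss([noise, speech], [speech, noise]))
```

In practice such a loss would be written with a tensor library so that gradients flow through the STFT; the pure-Python form here only makes the arithmetic explicit.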
About the journal:
Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top-quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal.
The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology
• financial time series analysis
• autonomous vehicles
• quantum computing
• neuromorphic engineering
• human-computer interaction and intelligent user interfaces
• environmental signal processing
• geophysical signal processing including seismic signal processing
• chemioinformatics and bioinformatics
• audio, visual and performance arts
• disaster management and prevention
• renewable energy