Noise-aware network with shared channel-attention encoder and joint constraint for noisy speech separation

IF 2.9 | Zone 3, Engineering & Technology | JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Digital Signal Processing | Pub Date: 2024-11-26 | DOI: 10.1016/j.dsp.2024.104891
Linhui Sun, Xiaolong Zhou, Aifei Gong, Lei Ye, Pingan Li, Eng Siong Chng
Digital Signal Processing, vol. 157, Article 104891 (2024). https://www.sciencedirect.com/science/article/pii/S1051200424005153
Citations: 0

Abstract

Significant progress has recently been made in end-to-end single-channel speech separation in clean environments. For noisy speech separation, existing research mainly uses deep neural networks to process the noise in speech signals implicitly, which does not fully exploit the effect of noise reconstruction errors on network training. We propose a lightweight noise-aware network with a shared channel-attention encoder and a joint constraint, named NSCJnet, which aims to improve speech separation performance in noisy environments. First, to reduce the number of network parameters, the model uses a parameter-sharing channel-attention encoder to convert noisy speech signals into a feature space. In addition, the channel attention layer (CAlayer) in the encoder enhances the network's representational capacity and separation performance in noisy environments by computing different weights for the filters in the convolution. Second, to make the network converge quickly, we regard noise as an estimation target of equal significance to speech, which compels the network to separate residual noise from the estimated speech, effectively suppressing lingering noise within the speech signal. Furthermore, by integrating a multi-resolution frequency constraint into the time-domain loss, we introduce a weighted time-frequency joint loss constraint, enabling the network to acquire information across both the time and frequency dimensions, which is conducive to separating mixed speech with noise. It automatically strengthens important features for separation and suppresses unimportant ones during the learning process. Results on the noisy WHAM! dataset and the noisy Libri2Mix dataset show that our method has lower computational complexity and outperforms some advanced methods on various speech quality and intelligibility metrics.
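The weighted time-frequency joint loss described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything specific here is an assumption: negative SI-SNR is used as the time-domain term, an L1 distance between STFT magnitudes at several window sizes stands in for the multi-resolution frequency constraint, and the weight `alpha`, the `resolutions` tuple, and all function names are hypothetical. Noise is simply appended to the target list so that it is estimated with the same weight as each speaker. Permutation-invariant assignment of speakers is omitted for brevity.

```python
import numpy as np


def si_snr(est, ref, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB (higher is better)."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference to get the target component.
    proj = np.dot(est, ref) / (np.dot(ref, ref) + eps) * ref
    residual = est - proj
    return 10.0 * np.log10((np.dot(proj, proj) + eps) / (np.dot(residual, residual) + eps))


def stft_magnitude(x, n_fft, hop):
    """Magnitude spectrogram with a Hann window, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def multi_res_freq_loss(est, ref, resolutions=((256, 64), (512, 128), (1024, 256))):
    """Mean L1 spectral-magnitude distance, averaged over several (n_fft, hop) pairs.

    The resolutions are illustrative values, not taken from the paper.
    """
    per_resolution = [
        np.mean(np.abs(stft_magnitude(est, n_fft, hop) - stft_magnitude(ref, n_fft, hop)))
        for n_fft, hop in resolutions
    ]
    return float(np.mean(per_resolution))


def joint_loss(estimates, targets, alpha=0.5):
    """Weighted time-frequency loss averaged over all targets.

    `targets` holds each speaker AND the noise signal, so noise is an
    estimation target of equal weight. `alpha` is a hypothetical weight
    on the frequency term.
    """
    total = 0.0
    for est, ref in zip(estimates, targets):
        total += -si_snr(est, ref) + alpha * multi_res_freq_loss(est, ref)
    return float(total / len(targets))


# Toy usage: two synthetic "speakers" plus noise, with slightly perturbed estimates.
rng = np.random.default_rng(0)
t = np.arange(4000) / 8000.0
speaker_1 = np.sin(2 * np.pi * 220 * t)
speaker_2 = np.sin(2 * np.pi * 330 * t)
noise = 0.1 * rng.standard_normal(t.shape)
targets = [speaker_1, speaker_2, noise]
estimates = [s + 0.05 * rng.standard_normal(t.shape) for s in targets]
print("joint loss:", joint_loss(estimates, targets))
```

In a training setup the same structure would normally be written with a differentiable framework, so that gradients flow through both the time-domain and spectral terms. The NumPy version above only shows how the terms combine.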
Source journal
Digital Signal Processing (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 5.30
Self-citation rate: 17.20%
Articles published: 435
Review time: 66 days
Journal description: Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal. The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology
• financial time series analysis
• autonomous vehicles
• quantum computing
• neuromorphic engineering
• human-computer interaction and intelligent user interfaces
• environmental signal processing
• geophysical signal processing including seismic signal processing
• chemioinformatics and bioinformatics
• audio, visual and performance arts
• disaster management and prevention
• renewable energy
Latest articles from this journal
• HDA-DGCN: Hierarchical data-driven aggregation network assisted dynamic graph convolutional framework for meteorological prediction
• A machine learning-based feature extraction method for image classification using ResNet architecture
• Real-time multi-IRS partitioning for sum-rate optimization in a UAV-IRS-aided vehicular communication system
• BE-SGGAN: Content-aware bit-depth enhancement by semantic guided GAN
• Average error rate analysis of the fading channel model with second-order scattering and fluctuating line-of-sight