MFFN: Multi-level Feature Fusion Network for monaural speech separation

Speech Communication · IF 3.0 · CAS Tier 3 (Computer Science) · JCR Q2 (Acoustics) · Pub Date: 2025-04-01 · DOI: 10.1016/j.specom.2025.103229
Jianjun Lei, Yun He, Ying Wang
Journal: Speech Communication, Volume 171, Article 103229. Published 2025-04-01.
Cited by: 0

Abstract

Dual-path networks have recently been widely developed for monaural speech separation owing to their outstanding ability to process long feature sequences. However, these methods often rely on a fixed receptive field during feature learning, which makes it difficult to capture feature information at different scales and thus restricts model performance. This paper proposes a novel Multi-level Feature Fusion Network (MFFN) that strengthens dual-path networks for monaural speech separation by capturing multi-scale information. The MFFN integrates information at different scales from long sequences via a multi-scale sampling strategy and employs Squeeze-and-Excitation blocks in parallel to extract features along the channel and temporal dimensions. Moreover, we introduce a collaborative attention mechanism to fuse feature information across levels, further improving the model's representation capability. Finally, we conduct extensive experiments on the noise-free datasets WSJ0-2mix and Libri2mix and the noisy datasets WHAM! and WHAMR!. The results demonstrate that MFFN outperforms several current methods without using data augmentation techniques.
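The abstract mentions Squeeze-and-Excitation (SE) blocks applied in parallel along the channel and temporal dimensions. The paper's exact layer sizes and placement are not given here, so the following is only an illustrative NumPy sketch of the general SE idea (squeeze by averaging, excite through a small bottleneck, gate with a sigmoid) applied once per dimension; all weight shapes and the bottleneck ratio are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, w2):
    """Channel-wise Squeeze-and-Excitation on a (channels, time) feature map.
    w1: (channels//r, channels) and w2: (channels, channels//r) bottleneck weights."""
    s = x.mean(axis=1)              # squeeze: global average over time -> (C,)
    z = np.maximum(w1 @ s, 0.0)     # excitation: bottleneck projection + ReLU
    g = sigmoid(w2 @ z)             # per-channel gates in (0, 1)
    return x * g[:, None]           # rescale each channel

def temporal_se(x, w1, w2):
    """Temporal-wise variant: the same gating, but squeezed over channels."""
    s = x.mean(axis=0)              # squeeze over channels -> (T,)
    z = np.maximum(w1 @ s, 0.0)
    g = sigmoid(w2 @ z)             # per-frame gates in (0, 1)
    return x * g[None, :]           # rescale each time step
```

Running both gates on the same feature map and combining their outputs would be one plausible reading of "in parallel"; how MFFN actually fuses the two branches is specified in the paper, not here.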
Source Journal
Speech Communication (Engineering & Technology — Computer Science: Interdisciplinary Applications)
CiteScore: 6.80
Self-citation rate: 6.20%
Articles per year: 94
Review time: 19.2 weeks
About the Journal: Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results. The journal's objectives are:
• to present a forum for the advancement of human and human-machine speech communication science;
• to stimulate cross-fertilization between different fields of this domain;
• to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.
Latest Articles in This Journal
Editorial Board
MS-VBRVQ: Multi-scale variable bitrate speech residual vector quantization
Hand gesture realisation of contrastive focus in real-time whisper-to-speech synthesis: Investigating the transfer from implicit to explicit control of intonation
Lateral channel dynamics and F3 modulation: Quantifying para-sagittal articulation in Australian English /l/
A review on speech emotion recognition for low-resource and Indigenous languages