Deep informed spatio-spectral filtering for multi-channel speech extraction against steering vector uncertainties

Applied Acoustics (IF 3.4; Zone 2, Physics and Astrophysics; JCR Q1, Acoustics) Pub Date: 2024-09-05 DOI: 10.1016/j.apacoust.2024.110259
Xiaoxue Luo, Yuxuan Ke, Xiaodong Li, Chengshi Zheng
Source: https://www.sciencedirect.com/science/article/pii/S0003682X24004109
Citations: 0

Abstract

Deep informed spatio-spectral filtering for multi-channel speech extraction against steering vector uncertainties

Adaptive beamforming combined with post-filtering is one of the most widely used techniques for suppressing directional interference, ambient noise, and reverberation. However, many adaptive beamforming methods are sensitive to steering vector mismatch, and their performance degrades substantially in practical applications, despite considerable earlier efforts to improve robustness. To achieve better performance in challenging scenarios, this paper proposes a two-stage deep informed spatio-spectral filtering method for multi-channel speech extraction that removes interference, noise, and reverberation simultaneously in the presence of steering vector errors. In the first stage, a direction-informed dual-path beamforming network is introduced to extract the speech from the target direction together with only its early reflections. To improve robustness, an information rectification block is designed to compensate for the signal model mismatch, and the steering vector uncertainty is taken into account in the training phase. In addition, a dual-path beamforming module is adopted to reduce magnitude distortion and improve phase recovery simultaneously. In the second stage, a magnitude-phase fusion network serves as a post-processing module that further fuses the magnitude and phase estimated by the first stage. Experimental results confirm that the proposed method is more robust to signal model mismatch and outperforms the baseline methods in terms of speech quality and intelligibility.
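
The sensitivity to steering vector mismatch mentioned in the abstract can be illustrated with a classical, non-neural beamformer. The sketch below is not the paper's method: it evaluates a textbook minimum power distortionless response (MPDR) beamformer on a hypothetical 8-microphone uniform linear array, with an assumed 4 cm spacing, a single 2 kHz frequency bin, a 20 dB input SNR, and white sensor noise. All function names and parameter values are illustrative assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_vector(angle_deg, num_mics=8, spacing=0.04, freq=2000.0):
    """Far-field steering vector of a uniform linear array (assumed geometry)."""
    delay_per_mic = spacing * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    return [cmath.exp(-2j * math.pi * freq * m * delay_per_mic)
            for m in range(num_mics)]


def inner(x, y):
    """Hermitian inner product x^H y."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))


def mpdr_target_gain(true_deg, assumed_deg, snr_db=20.0):
    """Return |w^H a|, the gain applied to the target by an MPDR beamformer.

    a is the true steering vector, d the assumed (possibly wrong) one.
    Covariance model: R = ps * a a^H + I (unit-power white noise).
    R^{-1} = I - beta * a a^H by the Sherman-Morrison identity.
    """
    a = steering_vector(true_deg)
    d = steering_vector(assumed_deg)
    ps = 10.0 ** (snr_db / 10.0)
    beta = ps / (1.0 + ps * len(a))
    a_h_d = inner(a, d)
    r_inv_d = [di - beta * ai * a_h_d for ai, di in zip(a, d)]  # R^{-1} d
    norm = inner(d, r_inv_d)                                    # d^H R^{-1} d
    w = [x / norm for x in r_inv_d]                             # MPDR weights
    return abs(inner(w, a))


if __name__ == "__main__":
    for error_deg in (0.0, 1.0, 2.0, 5.0):
        gain = mpdr_target_gain(true_deg=0.0, assumed_deg=error_deg)
        print(f"look-direction error {error_deg:3.1f} deg: "
              f"target gain {20 * math.log10(gain):6.1f} dB")
```

With a matched steering vector the target passes with unit gain. Once the assumed direction is off, the beamformer treats the target as interference and attenuates it, which is the failure mode that robust beamforming methods aim to avoid.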

Source journal: Applied Acoustics (Physics: Acoustics)
CiteScore: 7.40
Self-citation rate: 11.80%
Articles published: 618
Review time: 7.5 months
Journal description: Since its launch in 1968, Applied Acoustics has been publishing high quality research papers providing state-of-the-art coverage of research findings for engineers and scientists involved in applications of acoustics in the widest sense. Applied Acoustics looks not only at recent developments in the understanding of acoustics but also at ways of exploiting that understanding. The Journal aims to encourage the exchange of practical experience through publication and in so doing creates a fund of technological information that can be used for solving related problems. The presentation of information in graphical or tabular form is especially encouraged. If a report of a mathematical development is a necessary part of a paper it is important to ensure that it is there only as an integral part of a practical solution to a problem and is supported by data. Applied Acoustics encourages the exchange of practical experience in the following ways: • Complete Papers • Short Technical Notes • Review Articles; and thereby provides a wealth of technological information that can be used to solve related problems. Manuscripts that address all fields of applications of acoustics ranging from medicine and NDT to the environment and buildings are welcome.
Latest articles from the journal
Fibonacci array-based temporal-spatial localization with neural networks
Semi-analytical prediction of energy-based acoustical parameters in proscenium theatres
Preparation and performance analysis of porous materials for road noise abatement using waste rubber tires
Acoustic characteristics of whispered vowels: A dynamic feature exploration
A high DOF and azimuth resolution beamforming via enhanced virtual aperture extension of joint linear prediction and inverse beamforming