Ambisonics neural speech extraction with directional feature and rotary steering

IF 3.4 | Region 2 (Physics and Astrophysics) | JCR Q1 (Acoustics) | Applied Acoustics | Pub Date: 2024-11-06 | DOI: 10.1016/j.apacoust.2024.110384
Shiqi Wang, Hongbing Qiu, Xiyu Song, Mei Wang, Fangzhi Yao
Publisher page: https://www.sciencedirect.com/science/article/pii/S0003682X24005358
Citations: 0

Abstract

In scenes with noise and overlapping speakers, directionally extracting audio tracks corresponding to individual speakers is crucial for immersive and interactive spatial audio systems. Although neural networks have been successful in this task, existing steering approaches for adjusting the direction of neural speech extraction mainly target spatial audio directly collected by microphone arrays, while directional speech extraction with Ambisonics spatial audio is less well studied. Therefore, to encode the target directional information as input for the neural network, this paper proposes two Ambisonics directional features based on the spatial feature difference and beamforming principle: the relative harmonic difference and the directional signal enhancement ratio. Using the special property of Ambisonics' rotation transform, a rotary steering pre-processing is also proposed to align the target speaker's direction with a fixed reference by inversely rotating the sound field, thereby simplifying multi-directional extraction to fixed-directional extraction. Finally, we integrate these proposed approaches with the existing temporal-spectral-spatial filtering neural networks to establish a generalized framework for steerable speech extraction and conduct experiments on a simulated Ambisonics dataset containing multiple speakers and noise sources. The experiments show that the proposed approaches outperform existing conditional steering and can be applied to various existing neural network architectures.
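The rotary steering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes first-order Ambisonics only, ACN channel order (W, Y, Z, X) with SN3D normalization, and an ideal plane-wave source. The function names `encode_foa` and `rotary_steer` are hypothetical. At first order the X, Y, Z channels transform like a Cartesian vector, so an ordinary 3x3 rotation matrix suffices; higher orders would need full spherical-harmonic rotation matrices, which are not shown here.

```python
import numpy as np

def encode_foa(signal, azimuth, elevation):
    """Encode a mono signal as a first-order plane wave.

    Assumed convention: ACN order (W, Y, Z, X), SN3D gains, angles in radians.
    """
    gains = np.array([
        1.0,                                   # W: omnidirectional
        np.sin(azimuth) * np.cos(elevation),   # Y: left-right
        np.sin(elevation),                     # Z: up-down
        np.cos(azimuth) * np.cos(elevation),   # X: front-back
    ])
    return gains[:, None] * signal[None, :]

def rotary_steer(foa, azimuth, elevation):
    """Inversely rotate the sound field so that the direction
    (azimuth, elevation) lands on the fixed reference direction +x."""
    ca, sa = np.cos(-azimuth), np.sin(-azimuth)
    ce, se = np.cos(elevation), np.sin(elevation)
    # Yaw by -azimuth about z, then pitch about y to cancel the elevation.
    rot_z = np.array([[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]])
    rot_y = np.array([[ce, 0.0, se], [0.0, 1.0, 0.0], [-se, 0.0, ce]])
    rotation = rot_y @ rot_z
    w, y, z, x = foa
    xyz = rotation @ np.stack([x, y, z])
    # W is rotation invariant; restore ACN order (W, Y, Z, X).
    return np.stack([w, xyz[1], xyz[2], xyz[0]])

# Usage: a target at azimuth 70 degrees, elevation 20 degrees is moved to the front.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16)
target_az, target_el = np.deg2rad(70.0), np.deg2rad(20.0)
mixture = encode_foa(speech, target_az, target_el)
steered = rotary_steer(mixture, target_az, target_el)
front = encode_foa(speech, 0.0, 0.0)
```

After steering, `steered` matches `front` up to floating-point error, so a downstream network would only ever need to extract speech from the fixed frontal direction, whatever the original target direction was.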
Source journal
Applied Acoustics (Physics: Acoustics)
CiteScore: 7.40
Self-citation rate: 11.80%
Articles published: 618
Review time: 7.5 months
About the journal: Since its launch in 1968, Applied Acoustics has been publishing high-quality research papers providing state-of-the-art coverage of research findings for engineers and scientists involved in applications of acoustics in the widest sense. Applied Acoustics looks not only at recent developments in the understanding of acoustics but also at ways of exploiting that understanding. The journal aims to encourage the exchange of practical experience through publication, and in so doing creates a fund of technological information that can be used for solving related problems. The presentation of information in graphical or tabular form is especially encouraged. If a report of a mathematical development is a necessary part of a paper, it is important to ensure that it is there only as an integral part of a practical solution to a problem and is supported by data. Applied Acoustics accepts complete papers, short technical notes, and review articles. Manuscripts that address all fields of applications of acoustics, ranging from medicine and NDT to the environment and buildings, are welcome.
Latest articles in this journal
Development of a code-switched Hindi-Marathi dataset and transformer-based architecture for enhanced speech recognition using dynamic switching algorithms
Eco-design of airborne sound insulation in Recycled Lightweight Concrete walls for Brazilian social housing: A reliability-based approach
Does loudspeaker directivity really influence the reconstructed indoor temperature quality using Acoustic travel-time TOMography?
A new deep learning forward BSS (D-FBSS) algorithm for acoustic noise reduction and speech enhancement
Source depth classification in shallow sea negative thermocline waveguide with small aperture vertical arrays