Binaural Beamforming Taking Into Account Spatial Release From Masking

IEEE/ACM Transactions on Audio, Speech, and Language Processing | IF 4.1 | Zone 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-08-29 | DOI: 10.1109/TASLP.2024.3451988
Johannes W. de Vries;Steven van de Par;Geert Leus;Richard Heusdens;Richard C. Hendriks
Volume 32, pages 4002-4012. Full text: https://ieeexplore.ieee.org/document/10659165/
Citations: 0

Abstract

Hearing impairment is a prevalent problem with daily challenges like impaired speech intelligibility and sound localisation. One of the shortcomings of spatial filtering in hearing aids is that speech intelligibility is often not optimised directly, meaning that different auditory processes contributing to intelligibility are often not considered. One example is the perceptual phenomenon known as spatial release from masking (SRM). This paper develops a signal model that explicitly considers SRM in the beamforming design, achieved by transforming the binaural intelligibility prediction model (BSIM) into a signal processing framework. The resulting extended signal model is used to analyse the performance of reference beamformers and design a novel beamformer that more closely considers how the auditory system perceives binaural sound. It can be shown that the binaural minimum variance distortionless response (BMVDR) beamformer is also an optimal solution for the extended, perceived model, suggesting that SRM does not play a significant role in intelligibility enhancement after optimal beamforming. However, the optimal beamformer is no longer unique in the extended signal model. The additional secondary degrees of freedom can be used to preserve binaural cues of interfering sources while still achieving the same perceived performance of the BMVDR beamformer, though with a possible high sensitivity to intelligibility model mismatch errors.
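For readers unfamiliar with the BMVDR reference beamformer named in the abstract, the following is a minimal sketch of its standard textbook form for a single frequency bin. It is not the extended perceptual model or the novel beamformer developed in the paper. The function names (`solve`, `inner`, `bmvdr`), the four-microphone layout, the transfer-function values, and the noise model (one interferer plus white sensor noise) are all assumptions made for illustration.

```python
import cmath

def solve(A, b):
    """Solve A x = b for a small complex square matrix (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

def bmvdr(R, a, ref_left, ref_right):
    """Standard BMVDR filters: w = R^{-1} a conj(a_ref) / (a^H R^{-1} a), one per ear."""
    r_inv_a = solve(R, a)
    denom = inner(a, r_inv_a)  # a^H R^{-1} a, real and positive for Hermitian positive-definite R
    w_left = [x * a[ref_left].conjugate() / denom for x in r_inv_a]
    w_right = [x * a[ref_right].conjugate() / denom for x in r_inv_a]
    return w_left, w_right

# Hypothetical acoustic transfer functions: mics 0-1 on the left ear, 2-3 on the right.
a = [cmath.exp(-0.0j), cmath.exp(-0.3j), 0.8 * cmath.exp(-0.9j), 0.8 * cmath.exp(-1.2j)]  # target
b = [0.7 * cmath.exp(-1.1j), 0.7 * cmath.exp(-1.5j), cmath.exp(-0.2j), cmath.exp(-0.4j)]  # interferer

# Assumed noise covariance: interferer with power 2.0 plus white noise with power 0.1.
R = [[2.0 * b[i] * b[j].conjugate() + (0.1 if i == j else 0.0) for j in range(4)]
     for i in range(4)]

w_left, w_right = bmvdr(R, a, ref_left=0, ref_right=2)

# Distortionless constraint: each output reproduces the target as seen at its reference mic.
target_left, target_right = inner(w_left, a), inner(w_right, a)

# Interaural transfer function (left/right ratio) of the residual interferer after filtering.
itf_interferer_out = inner(w_left, b) / inner(w_right, b)
itf_target = a[0] / a[2]
print(abs(target_left - a[0]), abs(target_right - a[2]))
print(abs(itf_interferer_out - itf_target))
```

Because the left and right filters are scaled copies of the same vector, the residual interferer in this sketch ends up with the target's interaural transfer function rather than its own. This is the well-known loss of interferer binaural cues in the plain BMVDR, and it is the kind of cue the abstract says the additional degrees of freedom can be used to preserve.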
Source journal
IEEE/ACM Transactions on Audio, Speech, and Language Processing (categories: Acoustics; Engineering, Electrical & Electronic)
CiteScore: 11.30
Self-citation rate: 11.10%
Articles published: 217
Journal scope: The IEEE/ACM Transactions on Audio, Speech, and Language Processing covers audio, speech and language processing and the sciences that support them. In audio processing: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. In speech processing: areas such as speech analysis, synthesis, coding, speech and speaker recognition, speech production and perception, and speech enhancement. In language processing: speech and text analysis, understanding, generation, dialog management, translation, summarization, question answering and document indexing and retrieval, as well as general language modeling.