The Optimal Speech-to-Background Ratio for Balancing Speech Recognition With Environmental Sound Recognition.

Ear and Hearing; Impact Factor: 2.8; Zone 2 (Medicine); JCR Q1 (Audiology & Speech-Language Pathology); Pub Date: 2024-11-01; Epub Date: 2024-05-31; DOI: 10.1097/AUD.0000000000001532

Eric M Johnson, Eric W Healy

Ear and Hearing, pp. 1444-1460. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11493516/pdf/

Citations: 0

Abstract

Objectives: This study aimed to determine the speech-to-background ratios (SBRs) at which normal-hearing (NH) and hearing-impaired (HI) listeners can recognize both speech and environmental sounds when the two types of signals are mixed. Also examined were the effect of individual sounds on speech recognition and environmental sound recognition (ESR), and the impact of divided versus selective attention on these tasks.
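As a rough illustration of what mixing two signals at a given SBR involves, the sketch below scales a background signal so that the ratio of speech level to background level, expressed in dB, matches a target value. It assumes whole-signal RMS as the level measure, which the abstract does not specify; the function names (`rms`, `mix_at_sbr`, `measured_sbr_db`) and the toy sine-wave signals are hypothetical and are not taken from the study.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def measured_sbr_db(speech, background):
    """SBR in dB, assuming an RMS-based level measure for both signals."""
    return 20.0 * math.log10(rms(speech) / rms(background))

def mix_at_sbr(speech, background, sbr_db):
    """Scale the background to hit the target SBR, then add it to the speech.

    Returns (mixture, scaled_background). Both inputs are truncated to the
    shorter length. The background must not be silent (non-zero RMS).
    """
    n = min(len(speech), len(background))
    speech, background = speech[:n], background[:n]
    # Gain that makes rms(speech) / rms(gain * background) == 10 ** (sbr_db / 20).
    gain = rms(speech) / (rms(background) * 10.0 ** (sbr_db / 20.0))
    scaled = [gain * b for b in background]
    mixture = [s + b for s, b in zip(speech, scaled)]
    return mixture, scaled

# Toy stand-ins for real recordings: two sine waves sampled at 16 kHz.
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(1600)]
background = [0.3 * math.sin(2 * math.pi * 1000 * t / 16000) for t in range(1600)]

mixture, scaled = mix_at_sbr(speech, background, sbr_db=12.0)
print(round(measured_sbr_db(speech, scaled), 6))
```

The speech is left untouched and only the background is scaled, so a positive SBR means the background is attenuated relative to the speech. A real stimulus pipeline would likely also handle overall presentation level and clipping, which this sketch ignores.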

Design: In Experiment 1 (divided attention), 11 NH and 10 HI listeners heard sentences mixed with environmental sounds at various SBRs and performed speech recognition and ESR tasks concurrently in each trial. In Experiment 2 (selective attention), 20 NH listeners performed these tasks in separate trials. Psychometric functions were generated for each task, listener group, and environmental sound. The range over which speech recognition and ESR were both high was determined, as was the optimal SBR for balancing speech recognition with ESR, defined as the point of intersection between each pair of normalized psychometric functions.
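The two quantities described in the Design can be sketched numerically: the crossing point of a pair of psychometric functions, and the span of SBRs over which both scores meet a criterion. The example below is only an illustration under stated assumptions. It uses logistic functions that already run from 0 to 1 (standing in for the normalized functions), with made-up midpoints and slopes rather than fitted values from the study, and it locates the crossing by bisection. All function names are hypothetical.

```python
import math

def logistic(x, midpoint, slope):
    """Logistic function ranging from 0 to 1."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))

# Hypothetical psychometric functions (parameters are NOT from the study).
def speech_score(sbr_db):
    """Speech recognition improves as SBR increases."""
    return logistic(sbr_db, midpoint=-5.0, slope=0.5)

def esr_score(sbr_db):
    """Environmental sound recognition declines as SBR increases."""
    return 1.0 - logistic(sbr_db, midpoint=25.0, slope=0.5)

def find_crossing(f, g, lo, hi, tol=1e-6):
    """Bisection on f - g; requires f - g to change sign over [lo, hi]."""
    d_lo = f(lo) - g(lo)
    if d_lo * (f(hi) - g(hi)) > 0:
        raise ValueError("f - g does not change sign on the interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        d_mid = f(mid) - g(mid)
        if d_lo * d_mid <= 0:
            hi = mid
        else:
            lo, d_lo = mid, d_mid
    return (lo + hi) / 2.0

def joint_range(f, g, criterion, lo, hi, step=0.1):
    """Lowest and highest grid SBR where both scores meet the criterion."""
    n_points = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n_points)]
    ok = [x for x in grid if f(x) >= criterion and g(x) >= criterion]
    return (ok[0], ok[-1]) if ok else None

optimal_sbr = find_crossing(speech_score, esr_score, lo=-20.0, hi=40.0)
span = joint_range(speech_score, esr_score, criterion=0.95, lo=-20.0, hi=40.0)
print(optimal_sbr, span)
```

With these made-up parameters the two curves are mirror images, so the crossing falls at 10 dB by symmetry; that value says nothing about the study's reported optima. In practice the functions would be fitted to listener data, and a narrow or empty `joint_range` result would correspond to a listener group or sound with little or no SBR span meeting the criterion on both tasks.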

Results: The NH listeners achieved greater than 95% accuracy on concurrent speech recognition and ESR over an SBR range of approximately 20 dB or greater. The optimal SBR for maximizing both speech recognition and ESR for NH listeners was approximately +12 dB. For the HI listeners, the range over which 95% performance was observed on both tasks was far smaller (span of 1 dB), with an optimal value of +5 dB. Acoustic analyses indicated that the speech and environmental sound stimuli were similarly audible, regardless of the hearing status of the listener, but that the speech fluctuated more than the environmental sounds. Divided versus selective attention conditions produced differences in performance that were statistically significant yet only modest in magnitude. In all conditions and for both listener groups, recognition was higher for environmental sounds than for speech when presented at equal intensities (i.e., 0 dB SBR), indicating that the environmental sounds were more effective maskers of speech than the converse. Each of the 25 environmental sounds used in this study (with one exception) had a span of SBRs over which speech recognition and ESR were both higher than 95%. These ranges tended to overlap substantially.

Conclusions: A range of SBRs exists over which speech and environmental sounds can be simultaneously recognized with high accuracy by NH and HI listeners, but this range is larger for NH listeners. The single optimal SBR for jointly maximizing speech recognition and ESR also differs between NH and HI listeners. The greater masking effectiveness of the environmental sounds relative to the speech may be related to the lower degree of fluctuation present in the environmental sounds, and possibly also to task differences between speech recognition and ESR (open versus closed set). The observed differences between the NH and HI results may be related to the HI listeners' smaller fluctuating masker benefit. As noise-reduction systems become increasingly effective, the current results could guide the design of future systems that provide listeners with highly intelligible speech without depriving them of access to important environmental sounds.

Source journal

Ear and Hearing (Medicine: Otorhinolaryngology)

CiteScore: 5.90
Self-citation rate: 10.80%
Articles published: 207
Review time: 6-12 weeks

Journal description: From the basic science of hearing and balance disorders to auditory electrophysiology to amplification and the psychological factors of hearing loss, Ear and Hearing covers all aspects of auditory and vestibular disorders. This multidisciplinary journal consolidates the various factors that contribute to identification, remediation, and audiologic and vestibular rehabilitation. It is the one journal that serves the diverse interests of all members of this professional community: otologists, audiologists, educators, and those involved in the design, manufacture, and distribution of amplification systems. The original articles published in the journal focus on assessment, diagnosis, and management of auditory and vestibular disorders.