Voice cloning is used to generate synthetic speech that mimics vocal characteristics of human talkers. This experiment used voice cloning to compare human and synthetic speech for intelligibility, human-likeness, and perceptual similarity, all tested in young adults with normal hearing. Masked-sentence recognition was evaluated using speech produced by five human talkers and their synthetically generated voice clones presented in speech-shaped noise at -6 dB signal-to-noise ratio. There were two types of sentences: semantically meaningful and nonsense. Human and automatic speech recognition scoring was used to evaluate performance. Participants were asked to rate human-likeness and determine whether pairs of sentences were produced by the same versus different people. As expected, sentence-recognition scores were worse for nonsense sentences compared to meaningful sentences, but they were similar for speech produced by human talkers and voice clones. Human-likeness scores were also similar for speech produced by human talkers and their voice clones. Participants were very good at identifying differences between voices but were less accurate at distinguishing between human/clone pairs, often leaning towards thinking they were produced by the same person. Reliability scoring by automatic speech recognition agreed with human reliability scoring for 98% of keywords and was minimally dependent on the context of the target sentences. Results provide preliminary support for the use of voice clones when evaluating the recognition of human and synthetic speech. More generally, voice synthesis and automatic speech recognition are promising tools for evaluating speech recognition in human listeners.
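The agreement figure above (98% of keywords) is a simple per-keyword match rate between the two scoring methods. A minimal sketch in Python, assuming each scorer marks every keyword as correct or incorrect; the function name and data are illustrative, not the study's actual scoring pipeline:

```python
def keyword_agreement(human_scores, asr_scores):
    """Fraction of keywords on which human and ASR scoring agree.

    Each argument is a sequence of booleans: True means that method
    scored the keyword as correctly repeated. (Illustrative helper,
    not the study's pipeline.)
    """
    if len(human_scores) != len(asr_scores):
        raise ValueError("score lists must cover the same keywords")
    matches = sum(h == a for h, a in zip(human_scores, asr_scores))
    return matches / len(human_scores)

# Hypothetical scores for five keywords: the two methods disagree on one.
human = [True, True, False, True, False]
asr = [True, True, False, False, False]
print(keyword_agreement(human, asr))  # -> 0.8
```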
Title: Masked-speech Recognition Using Human and Synthetic Cloned Speech
Authors: Lauren Calandruccio, Mohsen Hariri, Emily Buss, Vipin Chaudhary
Trends in Hearing, vol. 29, 2025-01-01. DOI: 10.1177/23312165251403080. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12686364/pdf/
Pub Date: 2024-09-14. DOI: 10.1177/23312165241266322
David López-Ramos, Miriam I. Marrufo-Pérez, Almudena Eustaquio-Martín, Luis E. López-Bascuas, Enrique A. Lopez-Poveda
Noise adaptation is the improvement in auditory function as the signal of interest is delayed in the noise. Here, we investigated whether noise adaptation occurs in spectral, temporal, and spectrotemporal modulation detection as well as in speech recognition. Eighteen normal-hearing adults participated in the experiments. In the modulation detection tasks, the signal was a 200-ms spectrally and/or temporally modulated ripple noise. The spectral modulation rate was two cycles per octave, the temporal modulation rate was 10 Hz, and the spectrotemporal modulations combined these two modulations, which resulted in a downward-moving ripple. A control experiment was performed to determine whether the results generalized to upward-moving ripples. In the speech recognition task, the signal consisted of disyllabic words, either unprocessed or vocoded to maintain only envelope cues. Modulation detection thresholds at 0 dB signal-to-noise ratio and speech reception thresholds were measured in quiet and in white noise (at 60 dB SPL) for noise-signal onset delays of 50 ms (early condition) and 800 ms (late condition). Adaptation was calculated as the threshold difference between the early and late conditions. Adaptation in word recognition was statistically significant for vocoded words (2.1 dB) but not for natural words (0.6 dB). Adaptation was statistically significant in spectral (2.1 dB) and temporal (2.2 dB) modulation detection but not in spectrotemporal modulation detection (downward ripple: 0.0 dB, upward ripple: −0.4 dB). Findings suggest that noise adaptation in speech recognition is unrelated to improvements in the encoding of spectrotemporal modulation cues.
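The adaptation measure is simply the threshold difference, in dB, between the early (50-ms delay) and late (800-ms delay) conditions. A one-line sketch with hypothetical threshold values (the numbers below are not from the study):

```python
def adaptation_db(early_threshold_db, late_threshold_db):
    # Positive adaptation means thresholds improved (got lower) when the
    # signal was delayed in the noise. Illustrative helper only.
    return early_threshold_db - late_threshold_db

# Hypothetical SRTs: early -2.0 dB SNR, late -4.1 dB SNR -> 2.1 dB adaptation.
print(round(adaptation_db(-2.0, -4.1), 1))  # -> 2.1
```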
Title: Adaptation to Noise in Spectrotemporal Modulation Detection and Word Recognition
Pub Date: 2024-04-27. DOI: 10.1177/23312165241240572
Maartje M. E. Hendrikse, Gertjan Dingemanse, André Goedegebure
Realistic outcome measures that reflect everyday hearing challenges are needed to assess hearing aid and cochlear implant (CI) fitting. Literature suggests that listening effort measures may be more sensitive to differences between hearing-device settings than established speech intelligibility measures when speech intelligibility is near maximum. Which method provides the most effective measurement of listening effort for this purpose is currently unclear. This study aimed to investigate the feasibility of two tests for measuring changes in listening effort in CI users due to signal-to-noise ratio (SNR) differences, as would arise from different hearing-device settings. By comparing the effect size of SNR differences on listening effort measures with test–retest differences, the study evaluated the suitability of these tests for clinical use. Nineteen CI users underwent two listening effort tests at two SNRs (+4 and +8 dB relative to individuals’ 50% speech perception threshold). We employed dual-task paradigms—a sentence-final word identification and recall test (SWIRT) and a sentence verification test (SVT)—to assess listening effort at these two SNRs. Our results show a significant difference in listening effort between the SNRs for both test methods, although the effect size was comparable to the test–retest difference, and the sensitivity was not superior to speech intelligibility measures. Thus, the implementations of SVT and SWIRT used in this study are not suitable for clinical use to measure listening effort differences of this magnitude in individual CI users. However, they can be used in research involving CI users to analyze group data.
Title: On the Feasibility of Using Behavioral Listening Effort Test Methods to Evaluate Auditory Performance in Cochlear Implant Users
Pub Date: 2024-04-24. DOI: 10.1177/23312165241246616
Dina Lelic, Line Louise Aaberg Nielsen, Anja Kofoed Pedersen, Tobias Neher
Negativity bias is a cognitive bias that results in negative events being perceptually more salient than positive ones. For hearing care, this means that hearing aid benefits can potentially be overshadowed by adverse experiences. Research has shown that sustaining focus on positive experiences has the potential to mitigate negativity bias. The purpose of the current study was to investigate whether a positive focus (PF) intervention can improve speech-in-noise abilities for experienced hearing aid users. Thirty participants were randomly allocated to a control or PF group (N = 2 × 15). Prior to hearing aid fitting, all participants filled out the short form of the Speech, Spatial and Qualities of Hearing scale (SSQ12) based on their own hearing aids. At the first visit, they were fitted with study hearing aids, and speech-in-noise testing was performed. Both groups then wore the study hearing aids for two weeks and sent daily text messages reporting hours of hearing aid use to an experimenter. In addition, the PF group was instructed to focus on positive listening experiences and to also report them in the daily text messages. After the 2-week trial, all participants filled out the SSQ12 questionnaire based on the study hearing aids and completed the speech-in-noise testing again. Speech-in-noise performance and SSQ12 Qualities score were improved for the PF group but not for the control group. This finding indicates that the PF intervention can improve subjective and objective hearing aid benefits.
Title: Focusing on Positive Listening Experiences Improves Speech Intelligibility in Experienced Hearing Aid Users
Pub Date: 2024-04-17. DOI: 10.1177/23312165241246597
Florian Denk, Luca Wiederschein, Markus Kemper, Hendrik Husstedt
Hearing aids and other hearing devices should provide the user with a benefit, for example, compensating for the effects of hearing loss or canceling undesired sounds. However, wearing hearing devices can also have negative effects on perception, previously demonstrated mostly for spatial hearing, sound quality, and the perception of one's own voice. When hearing devices are set to transparency, that is, provide no gain and resemble open-ear listening as closely as possible, these side effects can be studied in isolation. In the present work, we conducted a series of experiments concerning the effect of transparent hearing devices on speech perception in a collocated speech-in-noise task. In such a situation, listening through a hearing device is not expected to have any negative effect, since both speech and noise undergo identical processing, such that the signal-to-noise ratio at the ear is not altered and spatial effects are irrelevant. However, we found a consistent hearing device disadvantage for speech intelligibility and similar trends for rated listening effort. Several hypotheses for the possible origin of this disadvantage were tested by including several different devices, gain settings, and stimulus levels. While effects of self-noise and nonlinear distortions were ruled out, the exact reason for a hearing device disadvantage on speech perception is still unclear. However, a significant relation to auditory model predictions demonstrates that the speech intelligibility disadvantage is related to sound quality, and is most probably caused by insufficient equalization, artifacts of frequency-dependent signal processing, and processing delays.
Title: (Why) Do Transparent Hearing Devices Impair Speech Perception in Collocated Noise?
Pub Date: 2024-04-13. DOI: 10.1177/23312165241245219
Jonas Althoff, Tom Gajecki, Waldo Nogueira
For people with profound hearing loss, a cochlear implant (CI) can provide access to sounds that support speech perception. With current technology, most CI users obtain very good speech understanding in quiet listening environments. However, many CI users still struggle when listening to music. Efforts have been made to preprocess music for CI users and improve their music enjoyment. This work investigates potential modifications of instrumental music to make it more accessible for CI users. For this purpose, we used two datasets of varying complexity, each containing individual tracks of instrumental music. The first dataset, containing trios, was newly created and synthesized for this study. The second dataset contained orchestral music with a large number of instruments. Bilateral CI users and normal-hearing listeners were asked to remix the multitracks grouped into melody, bass, accompaniment, and percussion. Remixes could be performed in the amplitude, spatial, and spectral domains. Results showed that CI users preferred tracks panned toward the right side, especially the percussion component. When CI users were grouped into frequent or occasional music listeners, significant differences in remixing preferences in all domains were observed.
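Spatial-domain remixing of the kind described, e.g., panning the percussion component toward the right, can be sketched with a constant-power pan law. A sketch assuming mono component tracks stored as NumPy arrays; this is not the study's actual remixing interface:

```python
import numpy as np

def pan_mono(track, pan):
    """Constant-power pan of a mono signal into stereo.

    pan: -1.0 (full left) through 0.0 (center) to +1.0 (full right).
    Returns an array of shape (len(track), 2) with columns (left, right).
    Illustrative only; not the study's remixing software.
    """
    theta = (pan + 1.0) * np.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    mono = np.asarray(track, dtype=float)
    left = np.cos(theta) * mono
    right = np.sin(theta) * mono
    return np.stack([left, right], axis=1)

percussion = np.ones(4)              # placeholder mono percussion track
stereo = pan_mono(percussion, 0.5)   # pan partway toward the right
# The right channel now carries more energy than the left, while
# left**2 + right**2 stays equal to the original signal power.
```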
Title: Remixing Preferences for Western Instrumental Classical Music of Bilateral Cochlear Implant Users
Pub Date: 2024-04-13. DOI: 10.1177/23312165241245240
M. A. Johns, R. C. Calloway, I. M. D. Karunathilake, L. P. Decruy, S. Anderson, J. Z. Simon, S. E. Kuchinsky
Listening to speech in noise can require substantial mental effort, even among younger normal-hearing adults. The task-evoked pupil response (TEPR) has been shown to track the increased effort exerted to recognize words or sentences in increasing noise. However, few studies have examined the trajectory of listening effort across longer, more natural, stretches of speech, or the extent to which expectations about upcoming listening difficulty modulate the TEPR. Seventeen younger normal-hearing adults listened to 60-s-long audiobook passages, repeated three times in a row, at two different signal-to-noise ratios (SNRs) while pupil size was recorded. There was a significant interaction between SNR, repetition, and baseline pupil size on sustained listening effort. At lower baseline pupil sizes, potentially reflecting lower attention mobilization, TEPRs were more sustained in the harder SNR condition, particularly when attention mobilization remained low by the third presentation. At intermediate baseline pupil sizes, differences between conditions were largely absent, suggesting these listeners had optimally mobilized their attention for both SNRs. Lastly, at higher baseline pupil sizes, potentially reflecting overmobilization of attention, the effect of SNR was initially reversed for the second and third presentations: participants initially appeared to disengage in the harder SNR condition, resulting in reduced TEPRs that recovered in the second half of the story. Together, these findings suggest that the unfolding of listening effort over time depends critically on the extent to which individuals have successfully mobilized their attention in anticipation of difficult listening conditions.
Title: Attention Mobilization as a Modulator of Listening Effort: Evidence From Pupillometry
Pub Date: 2024-01-01. DOI: 10.1177/23312165241242235
LaGuinn Sherlock, Gregory Ellis, Alyssa Davidson, Douglas Brungart
The objective of this project was to establish cutoff scores on the tinnitus subscale of the Tinnitus and Hearing Survey (THS) using a large sample of United States service members (SM) with the end goal of guiding clinical referrals for tinnitus evaluation. A total of 4,589 SM undergoing annual audiometric surveillance were prospectively recruited to complete the THS tinnitus subscale (THS-T). A subset of 1,304 participants also completed the Tinnitus Functional Index (TFI). The original 5-point response scale of the THS (THS-T16) was modified to an 11-point scale (THS-T40) for some participants, to align with the response scale of the TFI. Age, sex, hearing loss, and self-reported tinnitus bother were also recorded. The THS-T was relatively insensitive to hearing, but self-reported bothersome tinnitus was significantly associated with the THS-T40 score. Receiver operating characteristic analysis was used to determine cutoff scores on the THS-T that aligned with recommended cutoff values for clinical intervention on the TFI. A cutoff of 9 on the THS-T40 aligns with a TFI cutoff of 25, indicating a patient may need intervention for tinnitus. A cutoff of 15 aligns with a TFI cutoff of 50, indicating that more aggressive intervention for tinnitus is warranted. The THS-T is a viable tool to identify patients with tinnitus complaints warranting clinical evaluation for use by hearing conservation programs and primary care clinics. The THS-T40 cutoff scores of 9 and 15 provide clinical reference points to guide referrals to audiology.
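Cutoff selection of the sort described above is commonly implemented by scanning candidate cutoffs along the ROC curve and maximizing Youden's J (sensitivity + specificity - 1). A pure-Python sketch with made-up scores; the study's actual data and ROC procedure are not reproduced here:

```python
def best_cutoff(scores, needs_referral):
    """Return the cutoff maximizing Youden's J = sensitivity + specificity - 1.

    scores: questionnaire scores (e.g., hypothetical THS-T40 totals).
    needs_referral: booleans from the reference measure (e.g., TFI >= 25).
    A score >= cutoff flags the patient for referral. Illustrative only.
    """
    positives = [s for s, y in zip(scores, needs_referral) if y]
    negatives = [s for s, y in zip(scores, needs_referral) if not y]
    best_c, best_j = None, float("-inf")
    for cutoff in sorted(set(scores)):
        sensitivity = sum(s >= cutoff for s in positives) / len(positives)
        specificity = sum(s < cutoff for s in negatives) / len(negatives)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_c, best_j = cutoff, j
    return best_c

# Made-up example: three patients below and three at/above the referral line.
print(best_cutoff([2, 5, 8, 9, 12, 15],
                  [False, False, False, True, True, True]))  # -> 9
```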
{"title":"Rapid Assessment of Tinnitus Complaints with a Modified Version of the Tinnitus and Hearing Survey.","authors":"LaGuinn Sherlock, Gregory Ellis, Alyssa Davidson, Douglas Brungart","doi":"10.1177/23312165241242235","DOIUrl":"10.1177/23312165241242235","url":null,"abstract":"<p><p>The objective of this project was to establish cutoff scores on the tinnitus subscale of the Tinnitus and Hearing Survey (THS) using a large sample of United States service members (SM) with the end goal of guiding clinical referrals for tinnitus evaluation. A total of 4,589 SM undergoing annual audiometric surveillance were prospectively recruited to complete the THS tinnitus subscale (THS-T). A subset of 1,304 participants also completed the Tinnitus Functional Index (TFI). The original 5-point response scale of the THS (THS-T<sub>16</sub>) was modified to an 11-point scale (THS-T<sub>40</sub>) for some participants, to align with the response scale of the TFI. Age, sex, hearing loss, and self-reported tinnitus bother were also recorded. The THS-T was relatively insensitive to hearing, but self-reported bothersome tinnitus was significantly associated with the THS-T<sub>40</sub> score. Receiver operating characteristic analysis was used to determine cutoff scores on the THS-T that aligned with recommended cutoff values for clinical intervention on the TFI. A cutoff of 9 on the THS-T<sub>40</sub> aligns with a TFI cutoff of 25, indicating a patient may need intervention for tinnitus. A cutoff of 15 aligns with a TFI cutoff of 50, indicating that more aggressive intervention for tinnitus is warranted. The THS-T is a viable tool to identify patients with tinnitus complaints warranting clinical evaluation for use by hearing conservation programs and primary care clinics. 
The THS-T<sub>40</sub> cutoff scores of 9 and 15 provide clinical reference points to guide referrals to audiology.</p>","PeriodicalId":48678,"journal":{"name":"Trends in Hearing","volume":"28 ","pages":"23312165241242235"},"PeriodicalIF":3.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11092559/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140913240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
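The cutoff-selection step in the abstract above can be illustrated with a small sketch. This is not the study's analysis code: the data are simulated, and the cutoff is chosen by maximizing Youden's J over an ROC-style sweep, one common criterion for aligning a screening score (here a toy THS-T40) with a reference criterion (TFI >= 25).

```python
# Hypothetical sketch of ROC-based cutoff selection (Youden's J).
# Scores and labels are simulated, not the study's data.

def best_cutoff(scores, labels):
    """Return the cutoff maximizing sensitivity + specificity - 1."""
    pos = sum(labels)
    neg = len(labels) - pos
    best, best_j = None, -1.0
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= c and not y)
        j = tp / pos - fp / neg  # sensitivity - (1 - specificity)
        if j > best_j:
            best, best_j = c, j
    return best

# Toy participants: THS-T40 score and whether TFI >= 25 (1 = yes).
ths = [2, 4, 6, 8, 9, 10, 12, 15, 20, 30]
tfi_flag = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
print(best_cutoff(ths, tfi_flag))  # -> 9 for this toy data
```

With real data the optimal cutoff would be read off the full ROC curve, and a second, higher cutoff (as with 15 vs. 9 in the study) can be chosen against a stricter criterion such as TFI >= 50.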
Pub Date: 2024-01-01
DOI: 10.1177/23312165241260029
Andrew T Sabin, Dale McElhone, Daniel Gauger, Bill Rabinowitz
The extent to which active noise cancelation (ANC), when combined with hearing assistance, can improve speech intelligibility in noise is not well understood. One possible source of benefit is ANC's ability to reduce the sound level of the direct (i.e., vent-transmitted) path. This reduction lowers the "floor" imposed by the direct path, thereby allowing any increases to the signal-to-noise ratio (SNR) created in the amplified path to be "realized" at the eardrum. Here we used a modeling approach to estimate this benefit. We compared pairs of simulated hearing aids that differed only in their ability to provide ANC and computed intelligibility metrics on their outputs. The difference in metric scores between simulated devices is termed the "ANC Benefit." These simulations show that ANC Benefit increases as (1) the environmental sound level increases, (2) the ability of the hearing aid to improve SNR increases, (3) the strength of the ANC increases, and (4) the hearing loss severity decreases. The predicted size of the ANC Benefit can be substantial. For a moderate hearing loss, the model predicts improvement in intelligibility metrics of >30% when environments are moderately loud (>70 dB SPL) and devices are moderately capable of increasing SNR (by >4 dB). It appears that ANC can be a critical ingredient in hearing devices that attempt to improve SNR in loud environments. ANC will become increasingly important as advanced SNR-improving algorithms (e.g., artificial intelligence speech enhancement) are included in hearing devices.
{"title":"Modeling the Intelligibility Benefit of Active Noise Cancelation in Hearing Devices That Improve Signal-to-Noise Ratio.","authors":"Andrew T Sabin, Dale McElhone, Daniel Gauger, Bill Rabinowitz","doi":"10.1177/23312165241260029","DOIUrl":"10.1177/23312165241260029","url":null,"abstract":"<p><p>The extent to which active noise cancelation (ANC), when combined with hearing assistance, can improve speech intelligibility in noise is not well understood. One possible source of benefit is ANC's ability to reduce the sound level of the direct (i.e., vent-transmitted) path. This reduction lowers the \"floor\" imposed by the direct path, thereby allowing any increases to the signal-to-noise ratio (SNR) created in the amplified path to be \"realized\" at the eardrum. Here we used a modeling approach to estimate this benefit. We compared pairs of simulated hearing aids that differ only in terms of their ability to provide ANC and computed intelligibility metrics on their outputs. The difference in metric scores between simulated devices is termed the \"ANC Benefit.\" These simulations show that ANC Benefit increases as (1) the environmental sound level increases, (2) the ability of the hearing aid to improve SNR increases, (3) the strength of the ANC increases, and (4) the hearing loss severity decreases. The predicted size of the ANC Benefit can be substantial. For a moderate hearing loss, the model predicts improvement in intelligibility metrics of >30% when environments are moderately loud (>70 dB SPL) and devices are moderately capable of increasing SNR (by >4 dB). It appears that ANC can be a critical ingredient in hearing devices that attempt to improve SNR in loud environments. 
ANC will become more and more important as advanced SNR-improving algorithms (e.g., artificial intelligence speech enhancement) are included in hearing devices.</p>","PeriodicalId":48678,"journal":{"name":"Trends in Hearing","volume":"28 ","pages":"23312165241260029"},"PeriodicalIF":3.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11149449/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141238476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
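The two-path "floor" argument in the abstract above can be made concrete with a toy calculation. This is not the authors' model: it simply power-sums an amplified path (which improves SNR) with a vent-transmitted direct path (which leaks unprocessed sound, attenuated by ANC), and reports the SNR at the eardrum. All levels and gains are illustrative.

```python
import math

def db_sum(levels_db):
    """Power-sum of sound levels expressed in dB."""
    return 10 * math.log10(sum(10 ** (l / 10) for l in levels_db))

def eardrum_snr(env_db, aid_snr_gain_db, anc_atten_db):
    """Toy two-path model. Signal and noise both arrive at env_db
    (0 dB SNR in the environment); the amplified path improves SNR
    by aid_snr_gain_db; the direct path is attenuated by ANC."""
    sig_amp = env_db                        # amplified path, signal
    noise_amp = env_db - aid_snr_gain_db    # amplified path, noise reduced
    sig_dir = env_db - anc_atten_db         # direct path, signal
    noise_dir = env_db - anc_atten_db       # direct path, noise
    sig = db_sum([sig_amp, sig_dir])
    noise = db_sum([noise_amp, noise_dir])
    return sig - noise

# Without ANC the direct path caps the realized SNR; with 20 dB of
# ANC attenuation, more of the amplified path's SNR gain reaches the ear.
snr_no_anc = eardrum_snr(70, 8, 0)
snr_anc = eardrum_snr(70, 8, 20)
```

Under these toy numbers the "ANC Benefit" is simply `snr_anc - snr_no_anc`, and it grows with ANC strength and with the device's SNR gain, mirroring trends (1)-(3) described in the abstract.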
Pub Date: 2024-01-01
DOI: 10.1177/23312165241266316
Alexis Deighton MacIntyre, Robert P Carlyon, Tobias Goehring
During continuous speech perception, endogenous neural activity becomes time-locked to acoustic stimulus features, such as the speech amplitude envelope. This speech-brain coupling can be decoded using non-invasive brain imaging techniques, including electroencephalography (EEG). Neural decoding may have clinical utility as an objective measure of stimulus encoding by the brain, for example during cochlear implant listening, wherein the speech signal is severely spectrally degraded. Yet, interplay between acoustic and linguistic factors may lead to top-down modulation of perception, thereby complicating audiological applications. To address this ambiguity, we assess neural decoding of the speech envelope under spectral degradation with EEG in acoustically hearing listeners (n = 38; 18-35 years old) using vocoded speech. We dissociate sensory encoding from higher-order processing by employing intelligible (English) and non-intelligible (Dutch) stimuli, with auditory attention sustained using a repeated-phrase detection task. Subject-specific and group decoders were trained to reconstruct the speech envelope from held-out EEG data, with decoder significance determined via random permutation testing. Whereas speech envelope reconstruction did not vary by spectral resolution, intelligible speech was associated with better decoding accuracy in general. Results were similar across subject-specific and group analyses, with less consistent effects of spectral degradation in group decoding. Permutation tests revealed possible differences in decoder statistical significance by experimental condition. In general, while robust neural decoding was observed at the individual and group level, variability within participants would most likely prevent the clinical use of such a measure to differentiate levels of spectral degradation and intelligibility on an individual basis.
{"title":"Neural Decoding of the Speech Envelope: Effects of Intelligibility and Spectral Degradation.","authors":"Alexis Deighton MacIntyre, Robert P Carlyon, Tobias Goehring","doi":"10.1177/23312165241266316","DOIUrl":"10.1177/23312165241266316","url":null,"abstract":"<p><p>During continuous speech perception, endogenous neural activity becomes time-locked to acoustic stimulus features, such as the speech amplitude envelope. This speech-brain coupling can be decoded using non-invasive brain imaging techniques, including electroencephalography (EEG). Neural decoding may provide clinical use as an objective measure of stimulus encoding by the brain-for example during cochlear implant listening, wherein the speech signal is severely spectrally degraded. Yet, interplay between acoustic and linguistic factors may lead to top-down modulation of perception, thereby complicating audiological applications. To address this ambiguity, we assess neural decoding of the speech envelope under spectral degradation with EEG in acoustically hearing listeners (<i>n</i> = 38; 18-35 years old) using vocoded speech. We dissociate sensory encoding from higher-order processing by employing intelligible (English) and non-intelligible (Dutch) stimuli, with auditory attention sustained using a repeated-phrase detection task. Subject-specific and group decoders were trained to reconstruct the speech envelope from held-out EEG data, with decoder significance determined via random permutation testing. Whereas speech envelope reconstruction did not vary by spectral resolution, intelligible speech was associated with better decoding accuracy in general. Results were similar across subject-specific and group analyses, with less consistent effects of spectral degradation in group decoding. Permutation tests revealed possible differences in decoder statistical significance by experimental condition. 
In general, while robust neural decoding was observed at the individual and group level, variability within participants would most likely prevent the clinical use of such a measure to differentiate levels of spectral degradation and intelligibility on an individual basis.</p>","PeriodicalId":48678,"journal":{"name":"Trends in Hearing","volume":"28 ","pages":"23312165241266316"},"PeriodicalIF":3.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11345737/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142057002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
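The envelope-reconstruction decoders described above belong to the family of linear backward models (as in mTRF-style analyses): a regularized regression maps multichannel neural data to the stimulus envelope, and decoding accuracy is the correlation between reconstructed and actual envelopes. The sketch below illustrates the idea only; it uses two noisy toy "channels" in place of EEG, and the regularization value is arbitrary.

```python
# Illustrative backward-model sketch (not the study's pipeline):
# ridge regression reconstructs a toy envelope from noisy channels.
import random

def ridge_fit(X, y, lam=0.1):
    """Solve (X'X + lam*I) w = X'y by Gaussian elimination."""
    n = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(n)]
    for i in range(n):                      # forward elimination
        for k in range(i + 1, n):
            f = A[k][i] / A[i][i]
            for j in range(i, n):
                A[k][j] -= f * A[i][j]
            b[k] -= f * b[i]
    w = [0.0] * n                           # back substitution
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w

def predict(X, w):
    return [sum(x * wi for x, wi in zip(row, w)) for row in X]

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) *
           sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

random.seed(0)
env = [abs(random.gauss(0, 1)) for _ in range(200)]        # toy envelope
X = [[e + random.gauss(0, 0.3), e + random.gauss(0, 0.3)]  # noisy channels
     for e in env]
w = ridge_fit(X, env)
r = corr(predict(X, w), env)  # "decoding accuracy" for this toy example
```

In practice the model would include multiple time lags per channel, be trained and evaluated on separate data, and be compared against a permutation-based null distribution, as the study describes.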