Pub Date: 2025-01-01 | Epub Date: 2025-11-13 | DOI: 10.1177/23312165251389585
Association of Tinnitus With Speech Recognition and Executive Functions in Older Adults
Nick Sommerhalder, Zbyněk Bureš, Oliver Profant, Tobias Kleinjung, Patrick Neff, Martin Meyer
Adults with chronic subjective tinnitus often struggle with speech recognition in challenging listening environments. While most research demonstrates deficits in speech recognition among individuals with tinnitus, studies focusing on older adults remain scarce. Besides speech recognition deficits, tinnitus has been linked to diminished cognitive performance, particularly in executive functions, yet its associations with specific cognitive domains in ageing populations are not fully understood. Our previous study of younger adults found that individuals with tinnitus exhibit deficits in speech recognition and interference control. Building on this, we hypothesized that these deficits are also present for older adults. We conducted a cross-sectional study of older adults (aged 60-79), 32 with tinnitus and 31 controls matched for age, gender, education, and approximately matched for hearing loss. Participants underwent audiometric, speech recognition, and cognitive tasks. The tinnitus participants performed more poorly in speech-in-noise and gated speech tasks, whereas no group differences were observed in the other suprathreshold auditory tasks. With regard to cognition, individuals with tinnitus showed reduced interference control, emotional interference, cognitive flexibility, and verbal working memory, correlating with tinnitus distress and loudness. It is concluded that tinnitus-related deficits persist and even worsen with age. Our results suggest that altered central mechanisms contribute to speech recognition difficulties in older adults with tinnitus.
{"title":"Association of Tinnitus With Speech Recognition and Executive Functions in Older Adults.","authors":"Nick Sommerhalder, Zbyněk Bureš, Oliver Profant, Tobias Kleinjung, Patrick Neff, Martin Meyer","doi":"10.1177/23312165251389585","DOIUrl":"10.1177/23312165251389585","url":null,"abstract":"<p><p>Adults with chronic subjective tinnitus often struggle with speech recognition in challenging listening environments. While most research demonstrates deficits in speech recognition among individuals with tinnitus, studies focusing on older adults remain scarce. Besides speech recognition deficits, tinnitus has been linked to diminished cognitive performance, particularly in executive functions, yet its associations with specific cognitive domains in ageing populations are not fully understood. Our previous study of younger adults found that individuals with tinnitus exhibit deficits in speech recognition and interference control. Building on this, we hypothesized that these deficits are also present for older adults. We conducted a cross-sectional study of older adults (aged 60-79), 32 with tinnitus and 31 controls matched for age, gender, education, and approximately matched for hearing loss. Participants underwent audiometric, speech recognition, and cognitive tasks. The tinnitus participants performed more poorly in speech-in-noise and gated speech tasks, whereas no group differences were observed in the other suprathreshold auditory tasks. With regard to cognition, individuals with tinnitus showed reduced interference control, emotional interference, cognitive flexibility, and verbal working memory, correlating with tinnitus distress and loudness. It is concluded that tinnitus-related deficits persist and even worsen with age. Our results suggest that altered central mechanisms contribute to speech recognition difficulties in older adults with tinnitus.</p>","PeriodicalId":48678,"journal":{"name":"Trends in Hearing","volume":"29 ","pages":"23312165251389585"},"PeriodicalIF":3.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12615926/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145514780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | Epub Date: 2025-11-24 | DOI: 10.1177/23312165251397373
At-Home Auditory Assessment Using Portable Automated Rapid Testing (PART) to Understand Self-Reported Hearing Difficulties
E Sebastian Lelo de Larrea-Mancera, Tess K Koerner, William J Bologna, Sara Momtaz, Katherine N Menon, Audrey Carrillo, Eric C Hoover, G Christopher Stecker, Frederick J Gallun, Aaron R Seitz
Previous research has demonstrated that remote testing of suprathreshold auditory function using distributed technologies can produce results that closely match those obtained in laboratory settings with specialized, calibrated equipment. This work has facilitated the validation of various behavioral measures in remote settings that provide valuable insights into auditory function. In the current study, we sought to determine whether a broad battery of auditory assessments could explain variance in self-reported hearing handicap. To address this, we used a portable psychophysics assessment tool along with an online recruitment tool (Prolific) to collect auditory task data from participants with (n = 84) and without (n = 108) self-reported hearing difficulty. Results indicate that several measures of auditory processing differentiate participants with and without self-reported hearing difficulty. In addition, we report the factor structure of the test battery to clarify the underlying constructs and the extent to which they individually or jointly inform hearing function. Relationships between measures of auditory processing were found to be largely consistent with a hypothesized construct model that guided task selection. Overall, this study advances our understanding of the relationship between auditory and cognitive processing in those with and without subjective hearing difficulty. More broadly, these results indicate promise that these measures can be used in larger-scale research studies in remote settings and have the potential to contribute to telehealth approaches to better address people's hearing needs.
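The factor analysis of the test battery described above can be illustrated with a minimal sketch: collect per-participant scores for a handful of auditory tasks, standardize them, and extract a small number of latent factors. The task names, the number of factors, and the use of scikit-learn's FactorAnalysis with varimax rotation are assumptions for illustration, not the study's actual pipeline, and the scores below are simulated placeholders.

```python
# Minimal sketch: exploring the factor structure of a multi-task auditory battery.
# Task names and scores are simulated placeholders; the study's actual measures,
# preprocessing, and factor-analysis settings may differ.
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 192  # e.g., 84 + 108 participants, as in the abstract

battery = pd.DataFrame({
    "temporal_gap": rng.normal(size=n),
    "spectral_ripple": rng.normal(size=n),
    "spatial_release": rng.normal(size=n),
    "speech_in_babble": rng.normal(size=n),
    "tone_in_noise": rng.normal(size=n),
})

# Standardize the scores, then extract a small number of latent factors.
z = StandardScaler().fit_transform(battery)
fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
fa.fit(z)

loadings = pd.DataFrame(fa.components_.T,
                        index=battery.columns,
                        columns=["factor_1", "factor_2"])
print(loadings.round(2))
```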
{"title":"At-Home Auditory Assessment Using Portable Automated Rapid Testing (PART) to Understand Self-Reported Hearing Difficulties.","authors":"E Sebastian Lelo de Larrea-Mancera, Tess K Koerner, William J Bologna, Sara Momtaz, Katherine N Menon, Audrey Carrillo, Eric C Hoover, G Christopher Stecker, Frederick J Gallun, Aaron R Seitz","doi":"10.1177/23312165251397373","DOIUrl":"10.1177/23312165251397373","url":null,"abstract":"<p><p>Previous research has demonstrated that remote testing of suprathreshold auditory function using distributed technologies can produce results that closely match those obtained in laboratory settings with specialized, calibrated equipment. This work has facilitated the validation of various behavioral measures in remote settings that provide valuable insights into auditory function. In the current study, we sought to address whether a broad battery of auditory assessments could explain variance in self-report of hearing handicap. To address this, we used a portable psychophysics assessment tool along with an online recruitment tool (Prolific) to collect auditory task data from participants with (<i>n</i> <i>=</i> 84) and without (<i>n</i> <i>=</i> 108) self-reported hearing difficulty. Results indicate several measures of auditory processing differentiate participants with and without self-reported hearing difficulty. In addition, we report the factor structure of the test battery to clarify the underlying constructs and the extent to which they individually or jointly inform hearing function. Relationships between measures of auditory processing were found to be largely consistent with a hypothesized construct model that guided task selection. Overall, this study advances our understanding of the relationship between auditory and cognitive processing in those with and without subjective hearing difficulty. More broadly, these results indicate promise that these measures can be used in larger scale research studies in remote settings and have potential to contribute to telehealth approaches to better address people's hearing needs.</p>","PeriodicalId":48678,"journal":{"name":"Trends in Hearing","volume":"29 ","pages":"23312165251397373"},"PeriodicalIF":3.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12644446/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145597487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | Epub Date: 2025-11-25 | DOI: 10.1177/23312165251396644
Objective Evaluation of a Deep Learning-Based Noise Reduction Algorithm for Hearing Aids Under Diverse Fitting and Listening Conditions
Vahid Ashkanichenarlogh, Paula Folkeard, Susan Scollie, Volker Kühnel, Vijay Parsa
This study evaluated a deep-neural-network (DNN) denoising system using a model-based design, comparing it with adaptive filtering and beamforming across various noise types, signal-to-noise ratios (SNRs), and hearing-aid fittings. A KEMAR manikin fitted with five audiograms was recorded in reverberant and non-reverberant rooms, yielding 1,152 recordings. Speech intelligibility was estimated from these recordings using the Hearing Aid Speech Perception Index (HASPI). Effects of processing strategy and acoustic factors were tested with a model-based within-device design that accounts for repeated recordings per device/program and fitting. Linear mixed model results showed that the DNN with beamforming outperformed conventional processing, with the strongest gains at 0 and +5 dB SNR, moderate benefits at -5 dB in low reverberation, and none in medium reverberation. Across SNRs and noise types, the DNN combined with beamforming yielded the highest predicted intelligibility, with benefits attenuated under moderate reverberation. Azimuth effects varied because estimates were derived from a better-ear metric on manikin recordings. Additionally, this paper reports sound quality comparisons using an intrusive metric (the Hearing Aid Speech Quality Index, HASQI) and a non-intrusive metric (pMOS). Results indicated that model type interacted with processing and acoustic factors. HASQI and pMOS scores increased with SNR and were moderately correlated (r² ≈ 0.479), supporting the use of non-intrusive metrics for large-scale assessment. However, pMOS showed greater variability across hearing aid programs and environments, suggesting that non-intrusive models capture processing effects differently than intrusive metrics. These findings highlight the promise and limits of non-intrusive evaluation while emphasizing the benefit of combining deep learning with beamforming to improve intelligibility and quality.
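A minimal sketch of the kind of analysis described above: a linear mixed model relating HASPI scores to processing strategy and acoustic factors with a random intercept per audiogram fitting, plus a correlation between the intrusive (HASQI) and non-intrusive (pMOS) quality metrics. The column names and all values below are simulated placeholders; in practice the metric columns would come from HASPI/HASQI/pMOS implementations applied to the recordings. This is not the authors' code.

```python
# Minimal sketch of the statistical analysis described above, not the authors' code.
# All values are simulated placeholders; in practice the haspi/hasqi/pmos columns would
# hold metric scores computed from the 1,152 KEMAR recordings.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(0)
n = 1152
df = pd.DataFrame({
    "processing": rng.choice(["dnn_beam", "adaptive_filter", "beamformer"], size=n),
    "snr": rng.choice([-5, 0, 5], size=n),
    "reverb": rng.choice(["low", "medium"], size=n),
    "fitting": rng.choice([f"audiogram_{i}" for i in range(1, 6)], size=n),
})
df["haspi"] = rng.uniform(0.3, 1.0, size=n)
df["hasqi"] = rng.uniform(0.2, 0.9, size=n)
df["pmos"] = 2 + 2.5 * df["hasqi"] + rng.normal(scale=0.3, size=n)

# Linear mixed model: fixed effects of processing strategy and acoustic factors,
# random intercept per audiogram fitting to account for repeated recordings.
model = smf.mixedlm("haspi ~ processing * snr * reverb", data=df, groups=df["fitting"])
result = model.fit()
print(result.summary())

# Agreement between the intrusive (HASQI) and non-intrusive (pMOS) quality metrics.
r, p = stats.pearsonr(df["hasqi"], df["pmos"])
print(f"HASQI vs. pMOS: r^2 = {r**2:.3f} (p = {p:.3g})")  # the abstract reports r^2 ≈ 0.479
```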
{"title":"Objective Evaluation of a Deep Learning-Based Noise Reduction Algorithm for Hearing Aids Under Diverse Fitting and Listening Conditions.","authors":"Vahid Ashkanichenarlogh, Paula Folkeard, Susan Scollie, Volker Kühnel, Vijay Parsa","doi":"10.1177/23312165251396644","DOIUrl":"10.1177/23312165251396644","url":null,"abstract":"<p><p>This study evaluated a deep-neural-network denoising system using model-based design, comparing it with adaptive filtering and beamforming across various noise types, SNRs, and hearing-aid fittings. A KEMAR manikin fitted with five audiograms was recorded in reverberant and non-reverberant rooms, yielding 1,152 recordings. Speech intelligibility was estimated using the HASPI from 1,152 KEMAR manikin recordings. Effects of processing strategy and acoustic factors were tested with model-based within-device design that account for repeated recordings per device/program and fitting. Linear mixed model results showed that the DNN with beamforming outperformed conventional processing, with strongest gains at 0 and +5 dB SNR, moderate benefits at -5 dB in low reverberation, and none in medium reverberation. Across SNRs and noise types, the DNN combined with beamforming yielded the highest predicted intelligibility, with benefits attenuated under moderate reverberation. Azimuth effects varied; because estimates were derived from a better-ear metric on manikin recordings. Additionally, this paper reports comparisons using metrics of sound quality, for an intrusive metric (HASQI) and the pMOS non-intrusive metric. Results indicated that model type interacted with processing and acoustic factors. HASQI and pMOS scores increased with SNR and were moderately correlated (r² ≈ 0.479), supporting the use of non-intrusive metrics for large-scale assessment. However, pMOS showed greater variability across hearing aid programs and environments, suggesting non-intrusive models capture processing effects differently than intrusive metrics. These findings highlight the promise and limits of non-intrusive evaluation while emphasizing the benefit of combining deep learning with beamforming to improve intelligibility and quality.</p>","PeriodicalId":48678,"journal":{"name":"Trends in Hearing","volume":"29 ","pages":"23312165251396644"},"PeriodicalIF":3.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12647563/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145606795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | Epub Date: 2025-04-13 | DOI: 10.1177/23312165251333528
Objectively Measuring Audiovisual Effects in Noise Using Virtual Human Speakers
John Kyle Cooper, Jonas Vanthornhout, Astrid van Wieringen, Tom Francart
Speech intelligibility in challenging listening environments relies on the integration of audiovisual cues. Measuring the effectiveness of audiovisual integration in these challenging listening environments can be difficult due to the complexity of such environments. The Audiovisual True-to-Life Assessment of Auditory Rehabilitation (AVATAR) is a paradigm that was developed to provide an ecological environment to capture both the audio and visual aspects of speech intelligibility measures. Previous research has shown that the benefit from audiovisual cues can be measured using behavioral (e.g., word recognition) and electrophysiological (e.g., neural tracking) measures. The current research examines whether, when using the AVATAR paradigm, electrophysiological measures of speech intelligibility yield outcomes similar to behavioral measures. We hypothesized that visual cues would enhance both the behavioral and electrophysiological scores as the signal-to-noise ratio (SNR) of the speech signal decreased. Twenty young participants (18-25 years old; 1 male and 19 female) with normal hearing took part in our study. For our behavioral experiment, we administered lists of sentences using an adaptive procedure to estimate a speech reception threshold (SRT). For our electrophysiological experiment, we administered 35 lists of sentences randomized across five SNR levels (silence, 0, -3, -6, and -9 dB) and two visual conditions (audio-only and audiovisual). We used a neural tracking decoder to measure the reconstruction accuracies for each participant. We observed that most participants had higher reconstruction accuracies for the audiovisual condition than for the audio-only condition at moderate to high levels of noise. We found that the electrophysiological measure may correlate with the behavioral measure of audiovisual benefit.
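The neural tracking decoder mentioned above is commonly implemented as a backward (stimulus reconstruction) model: a regularized linear mapping from time-lagged EEG channels to the speech envelope, scored by the correlation between the reconstructed and actual envelope. The sketch below uses ridge regression on simulated data with assumed lag and regularization settings; it illustrates the general approach rather than the study's exact decoder.

```python
# Simplified sketch of envelope reconstruction ("neural tracking"), not the study's exact decoder.
# eeg: (n_samples, n_channels), envelope: (n_samples,) -- assumed preprocessed and time-aligned.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from sklearn.linear_model import Ridge

def lagged_design(eeg, max_lag):
    """Stack copies of each EEG channel at lags 0..max_lag (EEG following the stimulus)."""
    n_samples, n_channels = eeg.shape
    padded = np.vstack([eeg, np.zeros((max_lag, n_channels))])
    windows = sliding_window_view(padded, max_lag + 1, axis=0)[:n_samples]
    return windows.reshape(n_samples, n_channels * (max_lag + 1))

def reconstruction_accuracy(eeg_train, env_train, eeg_test, env_test, max_lag=32, alpha=1e3):
    """Train a backward (decoding) model and return the test-set Pearson correlation."""
    decoder = Ridge(alpha=alpha)
    decoder.fit(lagged_design(eeg_train, max_lag), env_train)
    env_hat = decoder.predict(lagged_design(eeg_test, max_lag))
    return np.corrcoef(env_hat, env_test)[0, 1]

# Example with simulated data (placeholders for real EEG and speech envelopes).
rng = np.random.default_rng(1)
eeg = rng.normal(size=(6400, 32))
env = eeg[:, 0] * 0.5 + rng.normal(size=6400)   # toy envelope loosely coupled to channel 0
half = 3200
acc = reconstruction_accuracy(eeg[:half], env[:half], eeg[half:], env[half:])
print(f"reconstruction accuracy (r) = {acc:.3f}")
```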
{"title":"Objectively Measuring Audiovisual Effects in Noise Using Virtual Human Speakers.","authors":"John Kyle Cooper, Jonas Vanthornhout, Astrid van Wieringen, Tom Francart","doi":"10.1177/23312165251333528","DOIUrl":"https://doi.org/10.1177/23312165251333528","url":null,"abstract":"<p><p>Speech intelligibility in challenging listening environments relies on the integration of audiovisual cues. Measuring the effectiveness of audiovisual integration in these challenging listening environments can be difficult due to the complexity of such environments. The Audiovisual True-to-Life Assessment of Auditory Rehabilitation (AVATAR) is a paradigm that was developed to provide an ecological environment to capture both the audio and visual aspects of speech intelligibility measures. Previous research has shown the benefit from audiovisual cues can be measured using behavioral (e.g., word recognition) and electrophysiological (e.g., neural tracking) measures. The current research examines, when using the AVATAR paradigm, if electrophysiological measures of speech intelligibility yield similar outcomes as behavioral measures. We hypothesized visual cues would enhance both the behavioral and electrophysiological scores as the signal-to-noise ratio (SNR) of the speech signal decreased. Twenty young (18-25 years old) participants (1 male and 19 female) with normal hearing participated in our study. For our behavioral experiment, we administered lists of sentences using an adaptive procedure to estimate a speech reception threshold (SRT). For our electrophysiological experiment, we administered 35 lists of sentences randomized across five SNR levels (silence, 0, -3, -6, and -9 dB) and two visual conditions (audio-only and audiovisual). We used a neural tracking decoder to measure the reconstruction accuracies for each participant. We observed most participants had higher reconstruction accuracies for the audiovisual condition compared to the audio-only condition in conditions with moderate to high levels of noise. We found the electrophysiological measure may correlate with the behavioral measure that shows audiovisual benefit.</p>","PeriodicalId":48678,"journal":{"name":"Trends in Hearing","volume":"29 ","pages":"23312165251333528"},"PeriodicalIF":2.6,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12033406/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144043708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | DOI: 10.1177/23312165241311721
Measuring Speech Discrimination Ability in Sleeping Infants Using fNIRS: A Proof of Principle
Onn Wah Lee, Demi Gao, Tommy Peng, Julia Wunderlich, Darren Mao, Gautam Balasubramanian, Colette M McKay
This study used functional near-infrared spectroscopy (fNIRS) to measure aspects of the speech discrimination ability of sleeping infants. We examined the morphology of the fNIRS response to three different speech contrasts, namely "Tea/Ba," "Bee/Ba," and "Ga/Ba." Sixteen infants aged between 3 and 13 months were included in this study, and their fNIRS data were recorded during natural sleep. The stimuli were presented using a nonsilence baseline paradigm, in which repeated standard stimuli were presented between the novel stimuli blocks without any silence periods. The morphology of fNIRS responses varied between speech contrasts. The data were fit with a model in which the responses were the sum of two independent and concurrent response mechanisms that were derived from previously published fNIRS detection responses. These independent components were an oxyhemoglobin (HbO)-positive early-latency response and an HbO-negative late-latency response, hypothesized to be related to an auditory canonical response and a brain arousal response, respectively. The model fit the data well, with a median goodness of fit of 81%. The data showed that both response components had later latency when the left ear was the test ear (p < .05) compared to the right ear and that the negative component, due to brain arousal, was smallest for the most subtle contrast, "Ga/Ba" (p = .003).
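The two-component model described above can be illustrated by fitting each measured waveform as a weighted sum of two fixed component shapes via least squares and reporting the variance explained as a goodness-of-fit measure. The gamma-shaped templates, sampling grid, and goodness-of-fit definition in the sketch below are illustrative assumptions, not the published component waveforms.

```python
# Minimal sketch of the two-component decomposition idea: fit each measured fNIRS
# waveform as a weighted sum of two fixed component shapes and report goodness of fit.
# The component templates below are illustrative, not the published ones.
import numpy as np
from scipy.stats import gamma

t = np.arange(0, 30, 0.1)                      # seconds
early = gamma.pdf(t, a=4, scale=1.5)           # placeholder HbO-positive early-latency shape
late = -gamma.pdf(t, a=8, scale=2.0)           # placeholder HbO-negative late-latency shape
templates = np.column_stack([early, late])

def fit_two_components(response):
    """Least-squares weights for the two templates plus variance explained (%)."""
    weights, *_ = np.linalg.lstsq(templates, response, rcond=None)
    fitted = templates @ weights
    resid = response - fitted
    gof = 100 * (1 - resid.var() / response.var())
    return weights, gof

# Toy example: a synthetic response built from the templates plus noise.
rng = np.random.default_rng(2)
toy = 1.2 * early + 0.8 * late + rng.normal(scale=0.01, size=t.size)
w, gof = fit_two_components(toy)
print(f"weights = {w.round(2)}, goodness of fit = {gof:.0f}%")
```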
{"title":"Measuring Speech Discrimination Ability in Sleeping Infants Using fNIRS-A Proof of Principle.","authors":"Onn Wah Lee, Demi Gao, Tommy Peng, Julia Wunderlich, Darren Mao, Gautam Balasubramanian, Colette M McKay","doi":"10.1177/23312165241311721","DOIUrl":"10.1177/23312165241311721","url":null,"abstract":"<p><p>This study used functional near-infrared spectroscopy (fNIRS) to measure aspects of the speech discrimination ability of sleeping infants. We examined the morphology of the fNIRS response to three different speech contrasts, namely \"Tea/Ba,\" \"Bee/Ba,\" and \"Ga/Ba.\" Sixteen infants aged between 3 and 13 months old were included in this study and their fNIRS data were recorded during natural sleep. The stimuli were presented using a nonsilence baseline paradigm, where repeated standard stimuli were presented between the novel stimuli blocks without any silence periods. The morphology of fNIRS responses varied between speech contrasts. The data were fit with a model in which the responses were the sum of two independent and concurrent response mechanisms that were derived from previously published fNIRS detection responses. These independent components were an oxyhemoglobin (HbO)-positive early-latency response and an HbO-negative late latency response, hypothesized to be related to an auditory canonical response and a brain arousal response, respectively. The goodness of fit of the model with the data was high with median goodness of fit of 81%. The data showed that both response components had later latency when the left ear was the test ear (<i>p</i> < .05) compared to the right ear and that the negative component, due to brain arousal, was smallest for the most subtle contrast, \"Ga/Ba\" (<i>p</i> = .003).</p>","PeriodicalId":48678,"journal":{"name":"Trends in Hearing","volume":"29 ","pages":"23312165241311721"},"PeriodicalIF":2.6,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11758514/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143030151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | Epub Date: 2025-08-11 | DOI: 10.1177/23312165251365802
Evaluation of Speaker-Conditioned Target Speaker Extraction Algorithms for Hearing-Impaired Listeners
Ragini Sinha, Ann-Christin Scherer, Simon Doclo, Christian Rollwage, Jan Rennies
Speaker-conditioned target speaker extraction algorithms aim to extract the target speaker from a mixture of multiple speakers by using additional information about the target speaker. Previous studies have evaluated the performance of these algorithms using either instrumental measures or subjective assessments with normal-hearing or hearing-impaired listeners. Notably, a previous study employing a quasicausal algorithm reported significant intelligibility improvements for both normal-hearing and hearing-impaired listeners, while another study demonstrated that a fully causal algorithm could enhance speech intelligibility and reduce listening effort for normal-hearing listeners. Building on these findings, this study focuses on an in-depth subjective assessment of two fully causal deep neural network-based speaker-conditioned target speaker extraction algorithms with hearing-impaired listeners, both without hearing loss compensation (unaided) and with linear hearing loss compensation (aided). Three different subjective performance measurement methods were used to cover a broad range of listening conditions, namely paired comparison, speech recognition thresholds, and categorically scaled perceived listening effort. The subjective evaluation results with 15 hearing-impaired listeners showed that one algorithm significantly reduced listening effort and improved intelligibility compared to unprocessed stimuli and the other algorithm. The data also suggest that hearing-impaired listeners experience a greater benefit than normal-hearing listeners in terms of listening effort (for both male and female interfering speakers) and speech recognition thresholds (especially in the presence of female interfering speakers), and that hearing loss compensation (linear amplification) is not required to obtain an algorithm benefit.
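Speech recognition thresholds such as those reported above are typically estimated with an adaptive track that adjusts the SNR after each response. The sketch below implements a simple 1-up/1-down staircase converging on roughly 50% intelligibility; the step size, trial count, and stopping rule are assumptions and may differ from the procedure actually used in the study.

```python
# Minimal sketch of a 1-up/1-down adaptive track for estimating a speech recognition
# threshold (SRT). The step size, trial count, and scoring rule are assumptions;
# the study's actual adaptive procedure may differ.
import numpy as np

def run_srt_track(present_trial, start_snr=0.0, step_db=2.0, n_trials=20):
    """present_trial(snr) must return True if the sentence was repeated correctly."""
    snr = start_snr
    reversal_snrs, last_direction = [], None
    for _ in range(n_trials):
        correct = present_trial(snr)
        direction = -1 if correct else +1          # make it harder after a correct response
        if last_direction is not None and direction != last_direction:
            reversal_snrs.append(snr)
        last_direction = direction
        snr += direction * step_db
    return float(np.mean(reversal_snrs[-6:]))      # SRT ~ mean of the last reversals

# Toy listener with a "true" SRT of -5 dB SNR (logistic psychometric function).
rng = np.random.default_rng(3)
def simulated_listener(snr, true_srt=-5.0, slope=1.0):
    p_correct = 1 / (1 + np.exp(-slope * (snr - true_srt)))
    return rng.random() < p_correct

print(f"estimated SRT = {run_srt_track(simulated_listener):.1f} dB SNR")
```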
{"title":"Evaluation of Speaker-Conditioned Target Speaker Extraction Algorithms for Hearing-Impaired Listeners.","authors":"Ragini Sinha, Ann-Christin Scherer, Simon Doclo, Christian Rollwage, Jan Rennies","doi":"10.1177/23312165251365802","DOIUrl":"10.1177/23312165251365802","url":null,"abstract":"<p><p>Speaker-conditioned target speaker extraction algorithms aim at extracting the target speaker from a mixture of multiple speakers by using additional information about the target speaker. Previous studies have evaluated the performance of these algorithms using either instrumental measures or subjective assessments with normal-hearing listeners or with hearing-impaired listeners. Notably, a previous study employing a quasicausal algorithm reported significant intelligibility improvements for both normal-hearing and hearing-impaired listeners, while another study demonstrated that a fully causal algorithm could enhance speech intelligibility and reduce listening effort for normal-hearing listeners. Building on these findings, this study focuses on an in-depth subjective assessment of two fully causal deep neural network-based speaker-conditioned target speaker extraction algorithms with hearing-impaired listeners, both without hearing loss compensation (unaided) and with linear hearing loss compensation (aided). Three different subjective performance measurement methods were used to cover a broad range of listening conditions, namely paired comparison, speech recognition thresholds, and categorically scaled perceived listening effort. The subjective evaluation results with 15 hearing-impaired listeners showed that one algorithm significantly reduced listening effort and improved intelligibility compared to unprocessed stimuli and the other algorithm. The data also suggest that hearing-impaired listeners experience a greater benefit in terms of listening effort (for both male and female interfering speakers) and speech recognition thresholds, especially in the presence of female interfering speakers than normal-hearing listeners, and that hearing loss compensation (linear amplification) is not required to obtain an algorithm benefit.</p>","PeriodicalId":48678,"journal":{"name":"Trends in Hearing","volume":"29 ","pages":"23312165251365802"},"PeriodicalIF":3.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12340209/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144817996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | Epub Date: 2025-08-10 | DOI: 10.1177/23312165251365824
The Time Course of the Pupillary Response to Auditory Emotions in Pseudospeech, Music, and Vocalizations
Julie Kirwan, Deniz Başkent, Anita Wagner
Emotions can be communicated through visual and dynamic characteristics such as smiles and gestures, but also through auditory channels such as laughter, music, and human speech. Pupil dilation has become a notable marker for visual emotion processing; however, the pupil's sensitivity to emotional sounds, specifically speech, remains largely underexplored. This study investigated the processing of emotional pseudospeech, that is, speech-like sentences devoid of semantic content. We measured participants' pupil dilations while they listened to pseudospeech, music, and human vocalizations, and subsequently performed an emotion recognition task. Our results showed that emotional pseudospeech can trigger increases in pupil dilation compared to neutral pseudospeech, supporting the use of pupillometry as a tool for indexing prosodic emotion processing in the absence of semantics. However, pupil responses to pseudospeech were smaller and slower than the responses evoked by human vocalizations. The pupillary response was not sensitive enough to distinguish between emotion categories in pseudospeech, but pupil dilations to music and vocalizations reflected some emotion-specific pupillary curves. The valence of the stimulus had a stronger overall influence on pupil size than arousal. These results highlight the potential of pupillometry for studying auditory emotion processing and provide a foundation for contextualizing pseudospeech alongside other affective auditory stimuli.
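Event-related pupillometry of the kind described above generally involves epoching the continuous pupil trace around stimulus onsets, subtracting a pre-stimulus baseline, and averaging per condition. The sketch below shows this pipeline on simulated data; the sampling rate, window lengths, and condition labels are assumptions rather than the study's actual parameters.

```python
# Minimal sketch of event-related pupillometry: epoch, baseline-correct, and average
# pupil traces per condition. Sampling rate and window lengths are assumptions.
import numpy as np

FS = 60                      # Hz, assumed eye-tracker sampling rate
BASELINE_S, EPOCH_S = 1.0, 6.0

def epoch_and_baseline(pupil, onsets):
    """Cut baseline+epoch windows around each onset and subtract the baseline mean."""
    n_base, n_epoch = int(BASELINE_S * FS), int(EPOCH_S * FS)
    epochs = []
    for onset in onsets:
        segment = pupil[onset - n_base: onset + n_epoch]
        if segment.size == n_base + n_epoch:
            epochs.append(segment[n_base:] - segment[:n_base].mean())
    return np.asarray(epochs)

# Toy data: a continuous pupil trace (mm) and onsets for two conditions.
rng = np.random.default_rng(4)
pupil = rng.normal(loc=3.0, scale=0.05, size=FS * 300)
onsets_neutral = rng.integers(FS * 2, FS * 290, size=20)
onsets_emotional = rng.integers(FS * 2, FS * 290, size=20)

mean_neutral = epoch_and_baseline(pupil, onsets_neutral).mean(axis=0)
mean_emotional = epoch_and_baseline(pupil, onsets_emotional).mean(axis=0)
print("peak dilation difference (mm):",
      round(float((mean_emotional - mean_neutral).max()), 3))
```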
{"title":"The Time Course of the Pupillary Response to Auditory Emotions in Pseudospeech, Music, and Vocalizations.","authors":"Julie Kirwan, Deniz Başkent, Anita Wagner","doi":"10.1177/23312165251365824","DOIUrl":"10.1177/23312165251365824","url":null,"abstract":"<p><p>Emotions can be communicated through visual and dynamic characteristics such as smiles and gestures, but also through auditory channels such as laughter, music, and human speech. Pupil dilation has become a notable marker for visual emotion processing; however the pupil's sensitivity to emotional sounds, specifically speech, remains largely underexplored. This study investigated the processing of emotional pseudospeech, which are speech-like sentences devoid of semantic content. We measured participants' pupil dilations while they listened to pseudospeech, music, and human vocalizations, and subsequently performed an emotion recognition task. Our results showed that emotional pseudospeech can trigger increases of pupil dilation compared to neutral pseudospeech, supporting the use of pupillometry as a tool for indexing prosodic emotion processing in the absence of semantics. However, pupil responses to pseudospeech were smaller and slower than the responses evoked by human vocalizations. The pupillary response was not sensitive enough to distinguish between emotion categories in pseudospeech, but pupil dilations to music and vocalizations reflected some emotion-specific pupillary curves. The valence of the stimulus had a stronger overall influence on pupil size than arousal. These results highlight the potential for pupillometry in studying auditory emotion processing and provide a foundation for contextualizing pseudospeech alongside other affective auditory stimuli.</p>","PeriodicalId":48678,"journal":{"name":"Trends in Hearing","volume":"29 ","pages":"23312165251365824"},"PeriodicalIF":3.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12340197/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144817997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | Epub Date: 2025-03-25 | DOI: 10.1177/23312165251328055
Validation of a Self-Fitting Over-the-Counter Hearing Aid Intervention Compared with a Clinician-Fitted Hearing Aid Intervention: A Within-Subjects Crossover Design Using the Same Device
Lucas S Baltzell, Kosta Kokkinakis, Amy Li, Anusha Yellamsetty, Katherine Teece, Peggy B Nelson
In October of 2022, the US Food and Drug Administration finalized regulations establishing the category of self-fitting over-the-counter (OTC) hearing aids, intended to reduce barriers to hearing aid adoption for individuals with self-perceived mild to moderate hearing loss. Since then a number of self-fitting OTC hearing aids have entered the market, and a small number of published studies have demonstrated the effectiveness of a self-fitted OTC intervention against a traditional clinician-fitted intervention. Given the variety of self-fitting approaches available, and the small number of studies demonstrating effectiveness, the goal of the present study was to evaluate the effectiveness of a commercially available self-fitting OTC hearing aid intervention against a clinician-fitted intervention. Consistent with previous studies, we found that the self-fitted intervention was not inferior to the clinician-fitted intervention for self-reported benefit and objective speech-in-noise outcomes. We found statistically significant improvements in self-fitted outcomes compared to clinician-fitted outcomes, though deviations from best audiological practices in our clinician-fitted intervention may have influenced our results. In addition to presenting our results, we discuss the state of evaluating the noninferiority of self-fitted interventions and offer some new perspectives.
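Noninferiority in a within-subjects crossover design is commonly assessed by computing per-participant differences between interventions and checking whether the lower one-sided confidence bound stays above a prespecified margin. The sketch below illustrates this logic on hypothetical data; the margin, outcome scale, and alpha level are assumptions, not the study's registered analysis.

```python
# Minimal sketch of a paired noninferiority check for a within-subjects crossover design.
# The noninferiority margin, outcome scale, and data below are hypothetical; the study's
# actual analysis (and its margin) may differ.
import numpy as np
from scipy import stats

def noninferiority_paired(self_fit, clinician_fit, margin, alpha=0.05):
    """One-sided check that self-fitted is not worse than clinician-fitted by more than `margin`."""
    diff = np.asarray(self_fit) - np.asarray(clinician_fit)   # higher scores assumed better
    n = diff.size
    se = diff.std(ddof=1) / np.sqrt(n)
    lower_bound = diff.mean() - stats.t.ppf(1 - alpha, df=n - 1) * se
    return diff.mean(), lower_bound, lower_bound > -margin

# Toy example: speech-in-noise benefit scores (arbitrary units) for 30 participants.
rng = np.random.default_rng(5)
clinician = rng.normal(loc=5.0, scale=2.0, size=30)
self_fitted = clinician + rng.normal(loc=0.5, scale=1.5, size=30)   # slightly better on average
mean_diff, lcb, noninferior = noninferiority_paired(self_fitted, clinician, margin=1.0)
print(f"mean difference = {mean_diff:.2f}, lower 95% bound = {lcb:.2f}, noninferior = {noninferior}")
```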
{"title":"Validation of a Self-Fitting Over-the-Counter Hearing Aid Intervention Compared with a Clinician-Fitted Hearing Aid Intervention: A Within-Subjects Crossover Design Using the Same Device.","authors":"Lucas S Baltzell, Kosta Kokkinakis, Amy Li, Anusha Yellamsetty, Katherine Teece, Peggy B Nelson","doi":"10.1177/23312165251328055","DOIUrl":"10.1177/23312165251328055","url":null,"abstract":"<p><p>In October of 2022, the US Food and Drug Administration finalized regulations establishing the category of self-fitting over-the-counter (OTC) hearing aids, intended to reduce barriers to hearing aid adoption for individuals with self-perceived mild to moderate hearing loss. Since then a number of self-fitting OTC hearing aids have entered the market, and a small number of published studies have demonstrated the effectiveness of a self-fitted OTC intervention against a traditional clinician-fitted intervention. Given the variety of self-fitting approaches available, and the small number of studies demonstrating effectiveness, the goal of the present study was to evaluate the effectiveness of a commercially available self-fitting OTC hearing aid intervention against a clinician-fitted intervention. Consistent with previous studies, we found that the self-fitted intervention was not inferior to the clinician-fitted intervention for self-reported benefit and objective speech-in-noise outcomes. We found statistically significant improvements in self-fitted outcomes compared to clinician-fitted outcomes, though deviations from best audiological practices in our clinician-fitted intervention may have influenced our results. In addition to presenting our results, we discuss the state of evaluating the noninferiority of self-fitted interventions and offer some new perspectives.</p>","PeriodicalId":48678,"journal":{"name":"Trends in Hearing","volume":"29 ","pages":"23312165251328055"},"PeriodicalIF":2.6,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11938855/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143701449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | DOI: 10.1177/23312165251403080
Masked-speech Recognition Using Human and Synthetic Cloned Speech
Lauren Calandruccio, Mohsen Hariri, Emily Buss, Vipin Chaudhary
Voice cloning is used to generate synthetic speech that mimics vocal characteristics of human talkers. This experiment used voice cloning to compare human and synthetic speech for intelligibility, human-likeness, and perceptual similarity, all tested in young adults with normal hearing. Masked-sentence recognition was evaluated using speech produced by five human talkers and their synthetically generated voice clones presented in speech-shaped noise at -6 dB signal-to-noise ratio. There were two types of sentences: semantically meaningful and nonsense. Human and automatic speech recognition scoring was used to evaluate performance. Participants were asked to rate human-likeness and determine whether pairs of sentences were produced by the same versus different people. As expected, sentence-recognition scores were worse for nonsense sentences compared to meaningful sentences, but they were similar for speech produced by human talkers and voice clones. Human-likeness scores were also similar for speech produced by human talkers and their voice clones. Participants were very good at identifying differences between voices but were less accurate at distinguishing between human/clone pairs, often leaning towards thinking they were produced by the same person. Reliability scoring by automatic speech recognition agreed with human reliability scoring for 98% of keywords and was minimally dependent on the context of the target sentences. Results provide preliminary support for the use of voice clones when evaluating the recognition of human and synthetic speech. More generally, voice synthesis and automatic speech recognition are promising tools for evaluating speech recognition in human listeners.
Pub Date: 2024-09-14 | DOI: 10.1177/23312165241266322
Adaptation to Noise in Spectrotemporal Modulation Detection and Word Recognition
David López-Ramos, Miriam I. Marrufo-Pérez, Almudena Eustaquio-Martín, Luis E. López-Bascuas, Enrique A. Lopez-Poveda
Noise adaptation is the improvement in auditory function as the signal of interest is delayed in the noise. Here, we investigated whether noise adaptation occurs in spectral, temporal, and spectrotemporal modulation detection as well as in speech recognition. Eighteen normal-hearing adults participated in the experiments. In the modulation detection tasks, the signal was a 200-ms spectrally and/or temporally modulated ripple noise. The spectral modulation rate was two cycles per octave, the temporal modulation rate was 10 Hz, and the spectrotemporal modulations combined these two modulations, which resulted in a downward-moving ripple. A control experiment was performed to determine whether the results generalized to upward-moving ripples. In the speech recognition task, the signal consisted of disyllabic words, either unprocessed or vocoded to maintain only envelope cues. Modulation detection thresholds at 0 dB signal-to-noise ratio and speech reception thresholds were measured in quiet and in white noise (at 60 dB SPL) for noise-signal onset delays of 50 ms (early condition) and 800 ms (late condition). Adaptation was calculated as the threshold difference between the early and late conditions. Adaptation in word recognition was statistically significant for vocoded words (2.1 dB) but not for natural words (0.6 dB). Adaptation was statistically significant in spectral (2.1 dB) and temporal (2.2 dB) modulation detection but not in spectrotemporal modulation detection (downward ripple: 0.0 dB, upward ripple: −0.4 dB). Findings suggest that noise adaptation in speech recognition is unrelated to improvements in the encoding of spectrotemporal modulation cues.
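The ripple stimuli described above (two cycles per octave, 10 Hz, downward-moving) can be sketched as a sum of random-phase tones whose amplitudes follow a sinusoidal spectro-temporal envelope, with adaptation then computed as the early-minus-late threshold difference. Tone density, modulation depth, and the sign convention for ripple direction in the sketch below are assumptions; the study's actual synthesis may differ.

```python
# Minimal sketch of a spectrotemporal ripple generator: a sum of random-phase tones whose
# amplitudes follow a sinusoidal spectro-temporal envelope. Tone density, modulation depth,
# and the direction convention are simplified assumptions, not the study's exact synthesis.
import numpy as np

def ripple_noise(dur=0.2, fs=44100, f_lo=250, f_hi=8000, n_tones=400,
                 spec_rate=2.0, temp_rate=10.0, depth=1.0, direction=+1, seed=0):
    """direction=+1: ripple peaks drift downward in frequency over time (with this phase
    convention); -1: upward. Set temp_rate=0 for a purely spectral ripple, or
    spec_rate=0 for pure temporal modulation."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(dur * fs)) / fs
    freqs = np.logspace(np.log2(f_lo), np.log2(f_hi), n_tones, base=2)
    octaves = np.log2(freqs / f_lo)
    phases = rng.uniform(0, 2 * np.pi, n_tones)
    sig = np.zeros_like(t)
    for f, x, ph in zip(freqs, octaves, phases):
        env = 1 + depth * np.sin(2 * np.pi * (temp_rate * t + direction * spec_rate * x))
        sig += env * np.sin(2 * np.pi * f * t + ph)
    return sig / np.max(np.abs(sig))

stim = ripple_noise()  # 200-ms, 2 cycles/octave, 10-Hz, downward-moving ripple

# Adaptation is then the early-minus-late threshold difference, e.g. (illustrative values):
threshold_early_db, threshold_late_db = -8.0, -10.0
print(f"adaptation = {threshold_early_db - threshold_late_db:.1f} dB")
```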
{"title":"Adaptation to Noise in Spectrotemporal Modulation Detection and Word Recognition","authors":"David López-Ramos, Miriam I. Marrufo-Pérez, Almudena Eustaquio-Martín, Luis E. López-Bascuas, Enrique A. Lopez-Poveda","doi":"10.1177/23312165241266322","DOIUrl":"https://doi.org/10.1177/23312165241266322","url":null,"abstract":"Noise adaptation is the improvement in auditory function as the signal of interest is delayed in the noise. Here, we investigated if noise adaptation occurs in spectral, temporal, and spectrotemporal modulation detection as well as in speech recognition. Eighteen normal-hearing adults participated in the experiments. In the modulation detection tasks, the signal was a 200ms spectrally and/or temporally modulated ripple noise. The spectral modulation rate was two cycles per octave, the temporal modulation rate was 10 Hz, and the spectrotemporal modulations combined these two modulations, which resulted in a downward-moving ripple. A control experiment was performed to determine if the results generalized to upward-moving ripples. In the speech recognition task, the signal consisted of disyllabic words unprocessed or vocoded to maintain only envelope cues. Modulation detection thresholds at 0 dB signal-to-noise ratio and speech reception thresholds were measured in quiet and in white noise (at 60 dB SPL) for noise-signal onset delays of 50 ms (early condition) and 800 ms (late condition). Adaptation was calculated as the threshold difference between the early and late conditions. Adaptation in word recognition was statistically significant for vocoded words (2.1 dB) but not for natural words (0.6 dB). Adaptation was found to be statistically significant in spectral (2.1 dB) and temporal (2.2 dB) modulation detection but not in spectrotemporal modulation detection (downward ripple: 0.0 dB, upward ripple: −0.4 dB). Findings suggest that noise adaptation in speech recognition is unrelated to improvements in the encoding of spectrotemporal modulation cues.","PeriodicalId":48678,"journal":{"name":"Trends in Hearing","volume":"44 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142256779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}