Empirical correction method for spatial averaging effect in ultrasonic device calibration: Enhancing the precision-sensitivity trade-off
Francisco Alves, Mário Santos, André Alvarenga, Lorena Petrella
J. Acoust. Soc. Am. 159(2), 1027-1035 (2026). doi:10.1121/10.0042356

The calibration of ultrasound diagnostic equipment is essential to ensure its effectiveness and safety. Calibrating acoustic fields with hydrophones involves measuring the point of maximum pressure. Because the signal at the hydrophone output is proportional to the average pressure incident on its surface, when the active area of the hydrophone is larger than the area of the ultrasonic beam at the focus, the lower acoustic pressures surrounding the point of maximum pressure cause that value to be underestimated. This phenomenon is referred to as the spatial averaging effect. The main limitation of smaller hydrophones is their inherently reduced sensitivity. The International Electrotechnical Commission standards provide methods to correct for the spatial averaging effect when the ratio of the -6 dB beam width to the hydrophone diameter (Rbh) exceeds 1.5. In this study, a novel method for spatial averaging correction, developed using computational simulation, is presented. It consists of an empirical correction factor and extends corrections to Rbh values as low as 0.35, with errors below 3%, addressing the compromise between hydrophone precision and sensitivity. The method also generalizes to ultrasonic probes with varying characteristics.
Validation of Count-the-Dots audiogram approaches to calculating speech intelligibility indices
Koenraad S Rhebergen, Chaslav V Pavlovic
J. Acoust. Soc. Am. 159(2), 1337-1347 (2026). doi:10.1121/10.0042425

This study evaluates Count-the-Dots audiogram approaches as a simplified, clinically viable method for closely estimating the American National Standards Institute Speech Intelligibility Index [ANSI S3.5 (1997); SII] in quiet environments. We compared audibility calculations and predicted intelligibility scores between Count-the-Dots methods and multiple ANSI S3.5 (1997) SII variants, using eight-frequency Band Importance Functions (BIFs) for 14 776 audiograms from the National Health and Nutrition Examination Survey dataset. Results showed that Count-the-Dots methods closely approximate the ANSI S3.5 (1997) SII model as long as the speech levels and the BIF used for the calculations are equivalent between the two methods. This held for both audibility calculations and speech intelligibility predictions. However, deviations occurred at higher speech levels [≥65 dB sound pressure level (SPL)] because of differences in how masking is modeled. Count-the-Dots audiogram approaches offer a clinically viable, intuitive alternative for counseling purposes in quiet settings, particularly at natural speech levels (about 55 dB SPL). However, for speech-in-noise conditions, high-level speech, or aided speech inputs, the ANSI S3.5 (1997) SII remains the preferred model because of its more detailed acoustic modeling.
How does a deep neural network look at lexical stress in English words?
Itai Allouche, Itay Asael, Rotem Rousso, Vered Dassa, Ann Bradlow, Seung-Eun Kim, Matthew Goldrick, Joseph Keshet
J. Acoust. Soc. Am. 159(2), 1348-1358 (2026). doi:10.1121/10.0042429

Despite their success in speech processing, neural networks often operate as black boxes, prompting the questions: what informs their decisions, and how can we interpret them? This work examines the issue in the context of lexical stress. A dataset of English disyllabic words was automatically constructed from read and spontaneous speech. Several convolutional neural network (CNN) architectures were trained to predict stress position from a spectrographic representation of disyllabic words lacking minimal stress pairs (e.g., initial stress WAllet, final stress exTEND), achieving up to 92% accuracy on held-out test data. Layerwise relevance propagation, a technique for neural network interpretability analysis, revealed that predictions for held-out minimal pairs (PROtest vs proTEST) were most strongly influenced by information in stressed versus unstressed syllables, particularly the spectral properties of stressed vowels. However, the classifiers also attended to information throughout the word. A feature-specific relevance analysis is proposed; its results suggest that the best-performing classifier is strongly influenced by the stressed vowel's first and second formants, with some evidence that its pitch and third formant also contribute. These results reveal deep learning's ability to acquire distributed cues to stress from naturally occurring data, extending traditional phonetic work based on highly controlled stimuli.
Iterative Born solver for the acoustic Helmholtz equation with heterogeneous sound speed and density
Antonio Stanziola, Simon R Arridge, Bradley E Treeby, Benjamin T Cox
J. Acoust. Soc. Am. 159(2), 1457-1470 (2026). doi:10.1121/10.0042259

Efficient numerical solution of the acoustic Helmholtz equation in heterogeneous media remains challenging, particularly for large-scale problems with spatially varying density, a limitation that restricts applications in biomedical acoustics and seismic imaging. A fast iterative solver is presented that extends the convergent Born series method [Osnabrugge, Leedumrongwatthanakun, and Vellekoop, J. Comput. Phys. 322, 113-124 (2016)] to handle arbitrary variations in sound speed, density, and absorption simultaneously. The approach reformulates the Helmholtz equation as a first-order system and applies the universal split-preconditioner of Vettenburg and Vellekoop [arXiv:2207.14222v2 (2022)], yielding a matrix-free algorithm that leverages fast Fourier transforms for computational efficiency. Unlike existing Born series methods, this solver accommodates heterogeneous density without requiring expensive matrix decompositions or preprocessing steps, making it suitable for large-scale three-dimensional problems with minimal memory overhead. The method provides forward and adjoint solutions, enabling its application to inverse problems. Accuracy is validated through comparison against an analytical solution, and the solver's practical utility is demonstrated through transcranial ultrasound simulations. The solver achieves convergence for strong scattering scenarios, offering a computationally efficient alternative to time-domain methods and matrix-based Helmholtz solvers for applications ranging from medical ultrasound treatment planning to seismic exploration.
A simplified method of sound speed profiles for precise positioning of underwater dynamic targets
Baojin Li, Shuang Zhao, Zhenjie Wang, Shuqiang Xue, Yixu Liu
J. Acoust. Soc. Am. 159(2), 1446-1456 (2026). doi:10.1121/10.0042427

The accuracy of underwater acoustic positioning is seriously affected by spatiotemporal variations in sound speed. For precise positioning of fixed seafloor points, simplified reference sound speed profiles (SSPs) are commonly adopted to correct for the influence of these variations. In underwater dynamic target positioning, however, a simplified method that only considers acoustic ray tracing equivalence at fixed depths can lead to significant representativeness errors at other depths. To address this issue, we propose a criterion that minimizes acoustic ray tracing errors over the entire depth range and then employ a metaheuristic algorithm to solve the combinatorial optimization problem the criterion entails. The results show that, compared to methods based on the maximum sound speed deviation, the area difference, and a genetic algorithm using the minimum acoustic ray tracing error criterion at a fixed depth, the SSP simplified by the proposed method exhibits higher geometric accuracy, acoustic ray tracing accuracy, and positioning accuracy over the entire depth range. The proposed method is suitable for underwater dynamic target positioning, especially in scenarios with significant depth variations.
Time-varying partial loudness of noise burst sequences in stationary noise with a similar level
Josef Schlittenlacher, Agatha R Cox, Brian C J Moore
J. Acoust. Soc. Am. 159(2), 1048-1056 (2026). doi:10.1121/10.0042387

Loudness increases with increasing duration up to 200 ms after sound onset. This temporal integration is well documented in quiet but less well understood in the presence of other sounds and for very short durations. The present study investigates the temporal integration of partial loudness for bursts of noise in the presence of equally intense background noise. The level differences required for equal loudness between a reference burst duration of 20 ms and target burst durations of 1, 2, 5, and 10 ms were obtained using a 1-up/1-down staircase procedure, in the laboratory and online, for burst repetition rates of 5, 10, and 20 Hz and for rectangular and Hann-shaped bursts. All results showed that the short-duration bursts were perceived as louder than expected from the temporal integration of energy. The difference was equivalent to a change in level of up to 6.7 dB and was larger for higher burst repetition rates. The difference was also larger when abrupt onsets and offsets were used for both target and reference than for bursts with a Hann window shape. Differences between experiments conducted in the laboratory and online were small (up to 1.2 dB) but statistically significant.
A modified robust multi-target tracking method with high clutter and underwater nonstationary measurement noise
Xianghao Hou, Yuxuan Chen, Weisi Hua, Xinyu Gu, Yixin Yang
J. Acoust. Soc. Am. 159(2), 1086-1104 (2026). doi:10.1121/10.0042218

Robust active tracking of multiple underwater targets in environments with strong clutter and nonstationary measurement noise is a key research topic in underwater acoustic signal and information processing. Under these conditions, existing random finite set (RFS) multi-target tracking algorithms suffer from serious contamination of the observation likelihood by clutter, low discrimination between targets and clutter, and poor tracking accuracy. To address these issues, this paper proposes a two-stage modified variational Bayesian delta-generalized labeled multi-Bernoulli multi-target tracking algorithm. First, in the delta-generalized labeled multi-Bernoulli filtering update stage, the method introduces the Sage-Husa estimation technique, based on the minimum residual criterion, to roughly correct the measurement noise covariance matrix. This effectively alleviates the contamination of the likelihood function by clutter in adaptive RFS filtering and improves the discrimination between targets and clutter under complex noise conditions. Second, in the multi-target state estimation stage, the measurement noise covariance estimate is further optimized through a variational Bayesian framework, thereby achieving real-time correction of measurement noise caused by unknown underwater environments and significantly enhancing the robustness of underwater multi-target active tracking. Both simulation and experimental results show that the proposed algorithm significantly outperforms traditional and existing adaptive generalized labeled multi-Bernoulli methods in scenarios with strong clutter and nonstationary measurement noise.
Emotional and autonomic responses to natural sounds in listeners with cochlear implants
Prabuddha Bhatarai, Kelly N Jahn
J. Acoust. Soc. Am. 159(2), 1235-1246 (2026). doi:10.1121/10.0042405

This study characterized emotional responses to environmental sounds in 35 adults, including 18 cochlear implant (CI) users and 17 listeners with normal hearing (NH), using a comprehensive battery of self-report, behavioral, and autonomic measures. Changes in emotional reactions, pupil dilation, and skin conductance were assessed while participants listened to a series of emotionally evocative, naturally occurring sounds. The CI listeners exhibited a constricted range of emotional responses to the sounds, wherein they perceived pleasant and unpleasant sounds to be significantly less pleasant and less unpleasant, respectively, than the NH listeners. This reduced valence range was statistically associated with self-reported emotional deficits in daily life. Furthermore, the CI listeners exhibited significantly slower sound-evoked pupil dilations than the NH listeners, suggesting that they were slower to process the emotionally evocative sounds. These findings can support clinicians in identifying targets for counseling and rehabilitation to improve quality of life for adult CI listeners. The differences in emotional responses to naturalistic stimuli in CI listeners also highlight the need for future research to explore ecologically valid measures of assessment and rehabilitation.
The temporal effects of auditory and visual immersion on speech level in virtual environments
Xinyi N Zhang, Arian Shamei, Florian Grond, Ingrid Verduyckt, Rachel E Bouserhal
J. Acoust. Soc. Am. 159(1), 384-397 (2026). doi:10.1121/10.0042240

Speech takes place in physical environments with visual and acoustic properties, yet how these elements and their interaction influence speech production is not fully understood. While a room's appearance can suggest its acoustics, it is unclear whether people adjust their speech based on this visual information. Previous research shows that higher reverberation leads to a reduced speech level, but little is known about how auditory and visual information interact in this process. This study examined how audiovisual information affects speech level by immersing participants in virtual environments with varying reverberation and room visuals (hemi-anechoic room, classroom, and gymnasium) while they completed speech tasks. Speech level was analyzed using generalized additive mixed-effects modeling to assess temporal changes during utterances across conditions. Results showed that visual information significantly influenced speech level, though not strictly in line with the expected acoustics or perceived room size; auditory information had a stronger overall effect than visual information. Visual information had an earlier influence that diminished over time, whereas the auditory effect increased and then plateaued. These findings contribute to the understanding of multisensory integration in speech control and have implications for enhancing vocal performance and supporting more naturalistic communication in virtual environments.
PAMGuard: Application software for passive acoustic detection, classification, and localisation of animal sounds
Douglas Gillespie, Jamie Macaulay, Michael Oswald, Marie Roch
J. Acoust. Soc. Am. 159(1), 437-443 (2026). doi:10.1121/10.0042245

Detection, classification, and localisation of animal sounds are essential in many ecological studies, including density estimation and behavioural studies. Real-time acoustic processing can also be used in mitigation exercises, with the possibility of curtailing harmful human activities when animals are present. Animal vocalisations vary widely, and there is no single detection algorithm that can robustly detect all sound types. Human-in-the-loop analysis is often required to validate algorithm performance and to deal with unexpected noise sources such as those often encountered in real-world situations. The PAMGuard software combines advanced automatic analysis algorithms, including AI methods, with interactive visual tools, allowing users to develop efficient workflows both for real-time use and for processing archived datasets. A modular framework enables users to configure multiple detectors, classifiers, and localisers suitable for the equipment and species of interest in a particular application. Multiple detectors for different sound types can be run concurrently on the same data. An extensible "plug-in" interface also makes it possible for third parties to independently develop new modules to run within the software framework. Here, we describe the software's core functionality, illustrated using workflows for both real-time and offline use, and present an update on the latest features.