Xinyi N Zhang, Arian Shamei, Florian Grond, Ingrid Verduyckt, Rachel E Bouserhal
Speech takes place in physical environments with visual and acoustic properties, yet how these elements and their interaction influence speech production is not fully understood. While a room's appearance can suggest its acoustics, it is unclear whether people adjust their speech based on this visual information. Previous research shows that higher reverberation leads to reduced speech level, but understanding of how auditory and visual information interact in this process remains limited. This study examined how audiovisual information affects speech level by immersing participants in virtual environments with varying reverberation and room visuals (hemi-anechoic room, classroom, and gymnasium) while they completed speech tasks. Speech level was analyzed using generalized additive mixed-effects modeling to assess temporal changes during utterances across conditions. Results showed that visual information significantly influenced speech level, though not strictly in line with expected acoustics or perceived room size; auditory information had a stronger overall effect than visual information. Visual information had an earlier influence that diminished over time, whereas the auditory effect increased and plateaued. These findings contribute to the understanding of multisensory integration in speech control and have implications for enhancing vocal performance and supporting more naturalistic communication in virtual environments.
{"title":"The temporal effects of auditory and visual immersion on speech level in virtual environments.","authors":"Xinyi N Zhang, Arian Shamei, Florian Grond, Ingrid Verduyckt, Rachel E Bouserhal","doi":"10.1121/10.0042240","DOIUrl":"https://doi.org/10.1121/10.0042240","url":null,"abstract":"<p><p>Speech takes place in physical environments with visual and acoustic properties, yet how these elements and their interaction influence speech production is not fully understood. While a room's appearance can suggest its acoustics, it is unclear whether people adjust their speech based on this visual information. Previous research shows that higher reverberation leads to reduced speech level, but how auditory and visual information interact in this process remains limited. This study examined how audiovisual information affects speech level by immersing participants in virtual environments with varying reverberation and room visuals (hemi-anechoic room, classroom, and gymnasium) while completing speech tasks. Speech level was analyzed using generalized additive mixed-effects modeling to assess temporal changes during utterances across conditions. Results showed that visual information significantly influenced speech level, though not strictly in line with expected acoustics or perceived room size; auditory information had a stronger overall effect than visual information. Visual information had an earlier influence that diminished over time, whereas the auditory effect increased and plateaued. These findings contribute to the understanding of multisensory integration in speech control and have implications in enhancing vocal performance and supporting more naturalistic communication in virtual environments.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":"159 1","pages":"384-397"},"PeriodicalIF":2.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145959731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Douglas Gillespie, Jamie Macaulay, Michael Oswald, Marie Roch
Detection, classification, and localisation of animal sounds are essential in many ecological studies, including density estimation and behavioural studies. Real-time acoustic processing can also be used in mitigation exercises, with the possibility of curtailing harmful human activities when animals are present. Animal vocalisations vary widely, and there is no single detection algorithm that can robustly detect all sound types. Human-in-the-loop analysis is often required to validate algorithm performance and deal with unexpected noise sources such as those often encountered in real-world situations. The PAMGuard software combines advanced automatic analysis algorithms, including AI methods, with interactive visual tools allowing users to develop efficient workflows for both real-time use and for processing archived datasets. A modular framework enables users to configure multiple detectors, classifiers, and localisers suitable for the equipment and species of interest in a particular application. Multiple detectors for different sound types can be run concurrently on the same data. An extensible "plug-in" interface also makes it possible for third parties to independently develop new modules to run within the software framework. Here, we describe the software's core functionality, illustrated using workflows for both real-time and offline use, and present an update on the latest features.
{"title":"PAMGuard: Application software for passive acoustic detection, classification, and localisation of animal sounds.","authors":"Douglas Gillespie, Jamie Macaulay, Michael Oswald, Marie Roch","doi":"10.1121/10.0042245","DOIUrl":"https://doi.org/10.1121/10.0042245","url":null,"abstract":"<p><p>Detection, classification, and localisation of animal sounds are essential in many ecological studies, including density estimation and behavioural studies. Real-time acoustic processing can also be used in mitigation exercises, with the possibility of curtailing harmful human activities when animals are present. Animal vocalisations vary widely, and there is no single detection algorithm that can robustly detect all sound types. Human-in-the loop analysis is often required to validate algorithm performance and deal with unexpected noise sources such as are often encountered in real-world situations. The PAMGuard software combines advanced automatic analysis algorithms, including AI methods, with interactive visual tools allowing users to develop efficient workflows for both real-time use and for processing archived datasets. A modular framework enables users to configure multiple detectors, classifiers, and localisers suitable for the equipment and species of interest in a particular application. Multiple detectors for different sound types can be run concurrently on the same data. An extensible \"plug-in\" interface also makes it possible for third parties to independently develop new modules to run within the software framework. Here, we describe the software's core functionality, illustrated using workflows for both real-time and offline use, and present an update on the latest features.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":"159 1","pages":"437-443"},"PeriodicalIF":2.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145985116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hua-Wei Ji, Li-Ming Lin, Jiang-Hai Wang, Di-Wei Xiong, Chong-Jin Du
Acoustic lens focusing is a commonly used method in high-intensity focused ultrasound (HIFU). However, traditional acoustic lens focusing suffers from low focusing efficiency and excessive sidelobes, which affect the efficacy and safety of HIFU treatment. To address this issue, this paper designs a periodic trapezoidal-groove acoustic metasurface lens by leveraging the extraordinary acoustic transmission effect. Subsequently, its focal sound-pressure level is calculated through theoretical analysis and finite-element simulation, and is further validated experimentally. Finally, the influence of structural parameters (the period, center width, depth, and taper angle of the trapezoidal groove, as well as the amplitude of the excitation source) on the focusing performance of the acoustic metasurface lens is systematically analyzed. The results demonstrate that the periodic trapezoidal-groove acoustic metasurface lens can further enhance focusing and suppress sidelobes within a specific frequency range; the frequency corresponding to the maximum sound pressure is determined by the period of the trapezoidal groove; and the shift of Wood's anomaly frequency is primarily governed by the groove depth. This study provides insights for the development of high-performance acoustic-lens-focused ultrasound transducers.
{"title":"The study on the design and performance analysis of acoustic metamaterial lens.","authors":"Hua-Wei Ji, Li-Ming Lin, Jiang-Hai Wang, Di-Wei Xiong, Chong-Jin Du","doi":"10.1121/10.0042194","DOIUrl":"https://doi.org/10.1121/10.0042194","url":null,"abstract":"<p><p>Acoustic lens focusing is a commonly used method in high-intensity focused ultrasound (HIFU). However, traditional acoustic lens focusing suffers from low focusing efficiency and excessive sidelobes, which affect the efficacy and safety of HIFU treatment. To address this issue, this paper designs a periodic trapezoidal‑groove acoustic metasurface lens by leveraging the extraordinary acoustic transmission effect. Subsequently, its focal sound‑pressure level is calculated through theoretical analysis and finite‑element simulation, and is further validated experimentally. Finally, the influence of structural parameters-such as the period, center width, depth, and taper angle of the trapezoidal groove, as well as the amplitude of the excitation source-on the focusing performance of the acoustic metasurface lens is systematically analyzed. The results demonstrate that the periodic trapezoidal‑groove acoustic metasurface lens can further enhance focusing and suppress sidelobes within a specific frequency range; the frequency corresponding to the maximum sound pressure is determined by the period of the trapezoidal groove; and the shift of Wood's anomaly frequency is primarily governed by the groove depth. This study provides insights for the development of high‑performance acoustic‑lens-focused ultrasound transducers.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":"159 1","pages":"234-246"},"PeriodicalIF":2.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145933942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jie Wang, Zhuoran Shi, Shengjian Wu, Stefan Stenfelt, Jinqiu Sang, Xiaodong Li, Chengshi Zheng
Otoacoustic emissions represent cochlear responses to auditory stimuli, enabling the investigation of air conduction (AC) and bone conduction (BC) transmission. This study developed and validated a non-invasive, objective method for measuring the sensitivity difference between AC and BC transmission, here termed bone-air difference transfer property (BADTP), using stimulus frequency otoacoustic emission (SFOAE). The BADTP was defined as the difference between the AC transfer property and the BC transfer property. To cross-validate the objective approach, BADTP was compared with subjectively obtained hearing thresholds. Measurements were conducted across frequencies from 1000 to 4000 Hz in ten individuals with normal hearing. Results revealed that the mean differences between the two methods were within 2 dB at frequencies from 1000 to 1600 Hz, while both methods showed similar trends from 1850 to 4000 Hz. The proposed SFOAE-based measurement method provides valuable insight into BC transmission, with potential applications for objective assessment of BC function in research settings.
{"title":"Comparisons of air and bone conduction transfer properties utilizing stimulus frequency otoacoustic emissions.","authors":"Jie Wang, Zhuoran Shi, Shengjian Wu, Stefan Stenfelt, Jinqiu Sang, Xiaodong Li, Chengshi Zheng","doi":"10.1121/10.0042219","DOIUrl":"https://doi.org/10.1121/10.0042219","url":null,"abstract":"<p><p>Otoacoustic emissions represent cochlear responses to auditory stimuli, enabling the investigation of air conduction (AC) and bone conduction (BC) transmission. This study developed and validated a non-invasive, objective method for measuring the sensitivity difference between AC and BC transmission, here termed bone-air difference transfer property (BADTP), using stimulus frequency otoacoustic emission (SFOAE). The BADTP was defined as the difference between the AC transfer property and the BC transfer property. To cross-validate the objective approach, BADTP was compared with subjectively obtained hearing thresholds. Measurements were conducted across frequencies from 1000 to 4000 Hz in ten individuals with normal hearing. Results revealed that the mean differences between the two methods were within 2 dB at frequencies from 1000 to 1600 Hz, while both methods showed similar trends from 1850 to 4000 Hz. The proposed SFOAE-based method for measuring provides valuable insight into BC transmission, with potential applications for objective assessment of BC function in research settings.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":"159 1","pages":"315-326"},"PeriodicalIF":2.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145959689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Philipp Götz, Cagdas Tuna, Andreas Brendel, Andreas Walther, Emanuël A P Habets
The ability to infer a general representation of the acoustic environment from a reverberant recording is a key objective in numerous applications. We propose a multi-stage approach that integrates task-agnostic representation learning with uncertainty quantification. Leveraging the conformal prediction framework, our method models the error incurred in the estimation of the acoustic environment embedded in a reverberant recording, which reflects the ambiguity inherent in distinguishing between an unknown source signal and the induced reverberation. Although our approach is flexible and agnostic to specific downstream objectives, experiments on real-world data demonstrate competitive performance on established parameter estimation tasks when compared to baselines trained end-to-end or with contrastive losses. Furthermore, a latent disentanglement analysis reveals the interpretability of the learned representations, which effectively capture distinct factors of variation within the acoustic environment.
{"title":"Multi-stage representation learning for blind Room-Acoustic parameter estimation with uncertainty quantification.","authors":"Philipp Götz, Cagdas Tuna, Andreas Brendel, Andreas Walther, Emanuël A P Habets","doi":"10.1121/10.0042193","DOIUrl":"https://doi.org/10.1121/10.0042193","url":null,"abstract":"<p><p>The ability to infer a general representation of the acoustic environment from a reverberant recording is a key objective in numerous applications. We propose a multi-stage approach that integrates task-agnostic representation learning with uncertainty quantification. Leveraging the conformal prediction framework, our method models the error incurred in the estimation of the acoustic environment embedded in a reverberant recording, which reflects the ambiguity inherent in distinguishing between an unknown source signal and the induced reverberation. Although our approach is flexible and agnostic to specific downstream objectives, experiments on real-world data demonstrate competitive performance on established parameter estimation tasks when compared to baselines trained end-to-end or with contrastive losses. Furthermore, a latent disentanglement analysis reveals the interpretability of the learned representations, which effectively capture distinct factors of variation within the acoustic environment.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":"159 1","pages":"247-259"},"PeriodicalIF":2.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145952471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jie Jin, Yunle Cao, Haipeng Hao, Yecheng Feng, Daitong Wei, Zhuqing Zhang
To address the non-degradability and toxicity of conventional acoustic materials, this study proposes a sustainable spiral-shaped sound absorber composed of plant fiber-based fibrous paper and recycled coffee waste (CW). The strong mechanical bonding between CW and Kozo fibrous paper in this composite acoustic material was observed using metallurgical microscopy, resulting in an environmentally friendly structure capable of controlling broadband noise. A prediction model based on parallel-slit theory was developed to evaluate the influence of key structural parameters (CW layer mass density, fibrous paper length, and absorber width) on sound absorption coefficients. Optimization reveals that a wide spiral-shaped geometry paired with a high-density CW layer (0.04-0.05 kg/m²) enhances low-frequency noise reduction (<1000 Hz), whereas narrow configurations with a medium-density CW layer (0.03-0.04 kg/m²) improve high-frequency attenuation (>2000 Hz). The sound absorption coefficients of five prepared samples were measured using the two-microphone impedance tube method. The sound absorption coefficient improved significantly with the addition of an appropriate amount of CW in the mid- and high-frequency range. This work advances the development of lightweight, efficient, and sustainable acoustic solutions, providing a scalable strategy for the next generation of eco-friendly materials in line with circular economy principles and low-carbon manufacturing practices.
"Development of eco-friendly spiral-shaped sound absorber made from handcrafted fibrous paper enhanced with spent coffee waste for broadband noise control." Journal of the Acoustical Society of America 159(1), 260-271 (2026). https://doi.org/10.1121/10.0041885
Yufei Wang, Yu Tian, Shilong Li, Jie Sun, Jiancheng Yu
Passive sonar surveillance by autonomous underwater vehicles (AUVs) is often hindered by non-stationary self-noise that depends nonlinearly on vehicle speed. To address this, we propose Speed-UT2-CGAN, a motion-aware sonar denoising framework utilizing a dual-branch conditional generative adversarial network that combines a U-Net convolutional branch for local feature extraction from time-domain audio sequences and a transformer-based attention branch for long-range temporal dependencies. The architecture incorporates AUV speed as an additional conditioning input to dynamically adapt to speed-dependent noise characteristics, and is trained with a combination of adversarial, time-domain, and frequency-domain loss functions to ensure accurate spectral and temporal reconstruction. Experiments on synthetic mixtures combining real AUV self-noise recordings from lake trials with ShipsEar vessel signals demonstrate that Speed-UT2-CGAN significantly outperforms traditional methods, the speech enhancement generative adversarial network, and the dual-path recurrent neural network for a single AUV in shallow-water lake trials at 0, 2, and 3 knots, achieving an average output signal-to-noise ratio of 6.6 at -5 dB input and an average correlation coefficient of 0.87. These results confirm the effectiveness of motion-aware speed conditioning for passive sonar enhancement in single-sensor AUV systems, under controlled synthetic-data conditions representative of AUV constant depth, speed, and heading in shallow-water lake environments.
{"title":"Motion-aware sonar denoising for autonomous underwater vehicles self-noise using a speed-conditioned U-Net-transformer dual-branch conditional generative adversarial network.","authors":"Yufei Wang, Yu Tian, Shilong Li, Jie Sun, Jiancheng Yu","doi":"10.1121/10.0042221","DOIUrl":"https://doi.org/10.1121/10.0042221","url":null,"abstract":"<p><p>Passive sonar surveillance by autonomous underwater vehicles (AUVs) is often hindered by non-stationary, nonlinear speed-dependent self-noise. To address this, we propose Speed-UT2-CGAN, a motion-aware sonar denoising framework utilizing a dual-branch conditional generative adversarial network that combines a U-Net convolutional branch for local feature extraction from time-domain audio sequences and a transformer-based attention branch for long-range temporal dependencies. The architecture incorporates AUV speed as an additional conditioning input to dynamically adapt to speed-dependent noise characteristics, and is trained with a combination of adversarial, time-domain, and frequency-domain loss functions to ensure accurate spectral and temporal reconstruction. Experiments on synthetic mixtures combining real AUV self-noise recordings from lake trials with ShipsEar vessel signals demonstrate that Speed-UT2-CGAN significantly outperforms traditional methods, speech enhancement generative adversarial network, and dual-path recurrent neural network, for a single AUV in shallow-water lake trials at 0, 2, and 3 knots, achieving an output average signal-to-noise ratio of 6.6 at -5 dB input and an average correlation coefficient of 0.87. These results confirm the effectiveness of motion-aware speed conditioning for passive sonar enhancement in single-sensor AUV systems, under controlled synthetic-data conditions representative of AUV constant depth, speed, and heading in shallow-water lake environments.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":"159 1","pages":"327-342"},"PeriodicalIF":2.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145959738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S Bruce Martin, Martin Siderius
Wind-driven breaking waves generate the background sound throughout the ocean. An accurate source level for wind-driven breaking waves is needed to estimate the ambient sound levels used in sound exposure modeling, environmental assessments, and evaluations of sonar detection performance. Previous models applied a constant roll-off of sound levels of -16 dB/decade at all wind speeds, and their source levels were flat at frequencies below ∼1000 Hz due to a lack of measurements. Here, we analyzed 16 long-term archival datasets with limited anthropogenic sound sources to estimate the wind-driven source level down to 100 Hz. We estimated the site-specific areic propagation loss (APL) using a ray-based model and then added the APL to the median received levels at each wind speed to obtain the source level. An equation for the areic dipole source level is provided that increases as wind speed cubed, like most other air-ocean coupling processes. The model may be used to estimate sediment properties (given a wind speed history and measured sound levels) or to estimate wind speeds (given the sediment type and measured sound levels). It is well suited to estimating ambient sound levels from wind for soundscape modeling. An open-source implementation is available.
"Estimating sediment properties using a new source level function for wind-driven underwater sound derived from long-term archival data." Journal of the Acoustical Society of America 159(1), 300-314 (2026). https://doi.org/10.1121/10.0042217
Ciro Régulo Martínez, Débora Pollicelli, Juan Bajo, Sharolyn J Anderson, Claudio Delrieux
Sound level modeling has emerged as an essential tool for predicting acoustic environments. We present the development and analysis of models using a dataset previously applied to sound exceedance level modeling in the contiguous United States. This dataset comprises acoustic exceedance levels measured in diverse locations, including National Park Service sites and urban environments. We used Python machine learning libraries to train Random Forest regression models that predict exceedance levels from 99 geospatial variables. In total, three general and five ancillary fully data-driven models (i.e., models that do not simulate the physics of sound propagation) were developed, and the performance and limitations of each model are discussed. Results show promising predictive power, with R² between 0.54 and 0.91 and root mean squared error between 1.77 and 5.97 dB; models incorporating more urban information performed better. These results highlight the strength of the models, with performance variability primarily attributed to the limited coverage of diverse natural and urban environments in the current dataset. Results are accessible via an interactive online dashboard, allowing users without machine learning expertise to analyze different aspects of the models. This platform supports broader accessibility, encouraging a wider audience to engage with outdoor sound level modeling and its applications.
"Explainable machine learning models for outdoor exceedance level prediction based on geospatial variables." Journal of the Acoustical Society of America 159(1), 459-469 (2026). https://doi.org/10.1121/10.0042225
Mark Gibson, Marcel Schlechtweg, Xianhui Wang, Judit Ayala Alcalde, Mark Arvidson, Li Xu
We report results of a psychoacoustic experiment examining Spanish vowel ([a, e, i, o, u]) recognition in speech-shaped noise (SSN) and background babble (1-16 talkers) by two listening groups: native Spanish speakers (SP group) and native English speakers (EN group). The motivation for the current study is to investigate acoustic-phonetic and informational masking (APM and IM, respectively) effects (1) on segment/phoneme recognition and (2) in participants who do not speak the language of the target or masker (as well as in native speakers of Spanish), in order to disambiguate the effects of APM and IM. For the tests, background noise, both SSN and background babble, was presented at three signal-to-noise ratios (0, -6, and -12 dB) while a target containing one of the five Spanish vowels was presented in the syllables [da, de, di, do, du]. Inter-group differences in response accuracy point to significant effects of APM as listening conditions degrade, and minimal effects of higher-order factors based on masker meaningfulness, semantic content, and language familiarity.
{"title":"Acoustic-phonetic masking in Spanish vowel recognition by native English- and Spanish-speaking subjects.","authors":"Mark Gibson, Marcel Schlechtweg, Xianhui Wang, Judit Ayala Alcalde, Mark Arvidson, Li Xu","doi":"10.1121/10.0041884","DOIUrl":"10.1121/10.0041884","url":null,"abstract":"<p><p>We report results for a psycho-acoustic experiment examining Spanish vowel ([a,e,i,o,u]). recognition in speech-shaped noise (SSN) and background babble (1-16 talkers) by two listening groups: native Spanish speakers (SP group) and native English speakers (EN group). The motivation for the current study is to investigate acoustic-phonetic and informational masking (APM and IM, respectively) effects (1) on segment/phoneme recognition, and (2) by participants who do not speak the language of the target or masker (as well as native speakers of Spanish) in order to disambiguate the effects of APM and IM. For the tests, background noise, both SSN and background babble, were presented at three signal-to-noise ratios (at 0, -6, and -12 dB) while a target containing one of the five Spanish vowels was presented in the syllables [da, de, di, do, du]. Inter-group differences in response accuracy point to significant effects of APM as listening conditions erode, and minimal effects due to higher-order factors based on masker meaningfulness, semantic content, and language familiarity.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":"159 1","pages":"105-116"},"PeriodicalIF":2.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145889486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}