The exceptional sound reception capabilities of odontocetes provide valuable inspiration for the design of advanced directional acoustic systems. Drawing on the sound reception system of the finless porpoise (Neophocaena asiaeorientalis sunameri), we developed a biomimetic receiver comprising artificial analogs of the mandible, external mandibular fat, and internal mandibular fat. The biomimetic external and internal mandibular fats were fabricated from soft silica gel, and the biomimetic mandible was made of aluminum. Simulation and experimental results demonstrate that the designed receiver enables effective control of underwater sound reception directivity. Specifically, widening the biomimetic external mandibular fat by 6.0 cm resulted in a 19.0° increase in the main beam angle and a 24.4° reduction in the 3-dB beam width at 25 kHz. Meanwhile, the biomimetic internal mandibular fat primarily functioned as a waveguide, effectively channeling acoustic energy. This receiver could serve as a useful tool for investigating odontocete sound reception mechanisms and has potential applications in underwater detection and communication.
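The two directivity quantities quoted in this abstract, the main beam angle and the 3-dB beam width, can be read off a sampled reception pattern. The following is a minimal sketch under stated assumptions: the function name `main_beam_and_width` and the synthetic pattern in the usage note are hypothetical, and the abstract does not say how the authors extracted these values.

```python
def main_beam_and_width(angles_deg, levels_db, drop_db=3.0):
    """Return (main beam angle, beam width) for a sampled directivity pattern.

    angles_deg must be sorted in ascending order; levels_db holds the
    received level in dB at each angle. Hypothetical helper for illustration.
    """
    peak_idx = max(range(len(levels_db)), key=lambda i: levels_db[i])
    threshold = levels_db[peak_idx] - drop_db

    def crossing(step):
        # Walk away from the peak until the level falls below the threshold,
        # then linearly interpolate the angle at which it crosses.
        i = peak_idx
        while 0 <= i + step < len(levels_db):
            j = i + step
            if levels_db[j] < threshold:
                frac = (levels_db[i] - threshold) / (levels_db[i] - levels_db[j])
                return angles_deg[i] + frac * (angles_deg[j] - angles_deg[i])
            i = j
        return angles_deg[i]  # the pattern never drops far enough: use the edge

    left, right = crossing(-1), crossing(+1)
    return angles_deg[peak_idx], right - left
```

For example, a synthetic parabolic pattern `levels = [-3.0 * ((a - 20) / 10.0) ** 2 for a in range(-90, 91)]` peaks at 20° and is 3 dB down at 10° and 30°, so the function reports a 20° beam width. Comparing the outputs for two receiver geometries would give the kind of angle shift and width change described above.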
{"title":"Directivity control of underwater sound reception inspired by the finless porpoise.","authors":"Wenzhan Ou, Zhongchang Song, Xin Ye, Jinhu Zhang, Nana Zhou, Xuming Peng, Yu Zhang","doi":"10.1121/10.0042246","DOIUrl":"https://doi.org/10.1121/10.0042246","journal":{"name":"Journal of the Acoustical Society of America","volume":"159 1","pages":"592-599"},"publicationDate":"2026-01-01","publicationTypes":"Journal Article"}
Yufei Wang, Yu Tian, Shilong Li, Jie Sun, Jiancheng Yu
Passive sonar surveillance by autonomous underwater vehicles (AUVs) is often hindered by non-stationary, nonlinear, speed-dependent self-noise. To address this, we propose Speed-UT2-CGAN, a motion-aware sonar denoising framework built on a dual-branch conditional generative adversarial network that combines a U-Net convolutional branch for local feature extraction from time-domain audio sequences with a transformer-based attention branch for long-range temporal dependencies. The architecture takes AUV speed as an additional conditioning input so that it can adapt dynamically to speed-dependent noise characteristics, and it is trained with a combination of adversarial, time-domain, and frequency-domain loss functions to ensure accurate spectral and temporal reconstruction. Experiments on synthetic mixtures combining real AUV self-noise recordings from lake trials with ShipsEar vessel signals demonstrate that Speed-UT2-CGAN significantly outperforms traditional methods, the speech enhancement generative adversarial network, and the dual-path recurrent neural network for a single AUV in shallow-water lake trials at 0, 2, and 3 knots, achieving an average output signal-to-noise ratio of 6.6 dB at -5 dB input and an average correlation coefficient of 0.87. These results confirm the effectiveness of motion-aware speed conditioning for passive sonar enhancement in single-sensor AUV systems, under controlled synthetic-data conditions representative of constant AUV depth, speed, and heading in shallow-water lake environments.
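The two figures of merit quoted here, output signal-to-noise ratio and correlation coefficient, can be computed from a clean reference signal and a denoised estimate. The sketch below uses the common textbook definitions; the abstract does not state the exact definitions used in the study, so both function names and formulas are assumptions for illustration.

```python
import math

def snr_db(reference, estimate):
    """Output SNR in dB: reference power over the power of the residual error."""
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_power / noise_power)

def correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

With `reference = [1, -1, 1, -1]` and `estimate = [0.9, -0.9, 0.9, -0.9]`, the residual carries 1% of the reference power, which corresponds to 20 dB, and the two sequences are perfectly correlated.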
{"title":"Motion-aware sonar denoising for autonomous underwater vehicles self-noise using a speed-conditioned U-Net-transformer dual-branch conditional generative adversarial network.","authors":"Yufei Wang, Yu Tian, Shilong Li, Jie Sun, Jiancheng Yu","doi":"10.1121/10.0042221","DOIUrl":"https://doi.org/10.1121/10.0042221","journal":{"name":"Journal of the Acoustical Society of America","volume":"159 1","pages":"327-342"},"publicationDate":"2026-01-01","publicationTypes":"Journal Article"}
Wind-driven breaking waves generate the background sound throughout the ocean. An accurate source level for wind-driven breaking waves is needed to estimate the ambient sound levels used in sound exposure modeling, environmental assessments, and assessments of sonar detection performance. Previous models applied a constant roll-off of sound levels of -16 dB/decade at all wind speeds, and their source levels were flat at frequencies below ∼1000 Hz due to a lack of measurements. Here, we analyzed 16 long-term archival datasets with limited anthropogenic sound sources to estimate the wind-driven source level down to 100 Hz. We estimated the site-specific areic propagation loss (APL) using a ray-based model and then added the APL to the median received levels at each wind speed to obtain the source level. An equation for the areic dipole source level is provided that increases as wind speed cubed, like most other air-ocean coupling processes. The model may be used to estimate sediment properties (given a wind speed history and measured sound levels) or to estimate wind speeds (given the sediment type and measured sound levels). It is well suited for estimating ambient sound levels from wind for soundscape modeling. An open-source implementation is available.
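The bookkeeping described in this abstract (source level equals median received level plus APL) can be illustrated in a few lines. This is a sketch, not the authors' open-source implementation: the function names are hypothetical, and the second function assumes that "increases as wind speed cubed" refers to the linear power-like quantity, so that the level in dB grows by 30 log10 of the wind-speed ratio.

```python
import math
from statistics import median

def source_level_db(received_levels_db, apl_db):
    """Source level for one wind-speed bin: median received level plus APL."""
    return median(received_levels_db) + apl_db

def cubic_wind_increment_db(wind_speed, reference_speed):
    """Level change in dB when a power-like quantity scales as wind speed cubed.

    Assumption: the cubic law applies to the linear quantity, which gives
    10 * log10((u / u_ref) ** 3) = 30 * log10(u / u_ref) in dB.
    """
    return 30.0 * math.log10(wind_speed / reference_speed)
```

Under that assumption, doubling the wind speed raises the source level by about 9 dB, since `cubic_wind_increment_db(10.0, 5.0)` evaluates 30 log10(2).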
{"title":"Estimating sediment properties using a new source level function for wind-driven underwater sound derived from long-term archival data.","authors":"S Bruce Martin, Martin Siderius","doi":"10.1121/10.0042217","DOIUrl":"https://doi.org/10.1121/10.0042217","journal":{"name":"Journal of the Acoustical Society of America","volume":"159 1","pages":"300-314"},"publicationDate":"2026-01-01","publicationTypes":"Journal Article"}
Sound level modeling has emerged as an essential tool for predicting acoustic environments. We present the development and analysis of models using a dataset previously applied for sound exceedance level modeling in the contiguous United States. This dataset comprises acoustic exceedance levels measured in diverse locations, including National Park Service sites and urban environments. We applied Python libraries to train Random Forest regression models to predict exceedance levels from 99 geospatial variables. In total, 3 general and 5 ancillary fully data-driven models (which do not model the physics of sound propagation) were developed, and the performance and limitations of each model are discussed. Results show promising predictive power, with R² between 0.54 and 0.91 and root mean squared error between 1.77 and 5.97 dB; models incorporating more urban information performed better. These results highlight the strength of the models, with performance variability primarily attributed to the limited coverage of diverse natural and urban environments in the current dataset. Results are accessible via an interactive online dashboard, allowing users without machine learning expertise to analyze different aspects of the models. This platform supports broader accessibility, encouraging a wider audience to engage with outdoor sound level modeling and its applications.
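The two evaluation metrics reported in this abstract, R² and root mean squared error, follow from measured and predicted exceedance levels as sketched below. The function names and the sample values in the usage note are hypothetical; the snippet does not reproduce the Random Forest models themselves.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the inputs (here dB)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

For measured levels `[50, 55, 60, 65]` dB and predictions `[52, 54, 61, 63]` dB, the residual sum of squares is 10 and the total sum of squares is 125, giving R² = 0.92 and an RMSE of about 1.58 dB.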
{"title":"Explainable machine learning models for outdoor exceedance level prediction based on geospatial variables.","authors":"Ciro Régulo Martínez, Débora Pollicelli, Juan Bajo, Sharolyn J Anderson, Claudio Delrieux","doi":"10.1121/10.0042225","DOIUrl":"https://doi.org/10.1121/10.0042225","journal":{"name":"Journal of the Acoustical Society of America","volume":"159 1","pages":"459-469"},"publicationDate":"2026-01-01","publicationTypes":"Journal Article"}
Sound field reconstruction estimates a continuous acoustic field from limited measurements. Yet sensor placement is often handled by non-adaptive methods (prior designs) that suit spatially stationary fields but can be inefficient for nonstationary sound fields. Within a Bayesian/Gaussian process framework, we first analyze standard non-adaptive criteria (entropy, mutual information, Bayesian Cramér-Rao bound, and transductive experimental design), clarifying equivalences and their geometric consequences, most notably farthest-point tendencies that yield space-filling coverage. Motivated by these insights, we propose an adaptive sampling (AS) strategy that selects sensors sequentially: leave-one-out cross-validation targets high-error regions (exploitation), whereas a wavelength-based spacing rule (minimum λ/4) maintains global coverage (exploration) and prevents clustering. In simulations, AS matches space-filling designs on stationary fields and substantially improves efficiency on nonstationary fields; for the same normalized mean square error, AS uses about half as many sensors, in terms of the median, as non-adaptive methods. These results indicate that AS can substantially improve the efficiency of sensor placement in practical, sequential measurement workflows.
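One sequential selection step of the kind described in this abstract might look as follows. This is a rough sketch under loudly stated assumptions, not the authors' algorithm: the abstract does not specify how leave-one-out errors are mapped to candidate positions, so here each candidate simply inherits the leave-one-out error of its nearest existing sensor. The function name `next_sensor` and all argument names are hypothetical.

```python
import math

def next_sensor(candidates, sensors, loo_errors, wavelength):
    """Pick the next sensor position from a list of candidate points.

    Exploitation: prefer candidates near sensors with a large leave-one-out
    error (assumed scoring rule: inherit the error of the nearest sensor).
    Exploration: reject candidates closer than wavelength / 4 to any sensor.
    """
    min_spacing = wavelength / 4.0
    best, best_score = None, -math.inf
    for point in candidates:
        distances = [math.dist(point, s) for s in sensors]
        nearest = min(range(len(sensors)), key=lambda i: distances[i])
        if distances[nearest] < min_spacing:
            continue  # violates the spacing rule, would cluster sensors
        score = loo_errors[nearest]
        if score > best_score:
            best, best_score = point, score
    return best  # None if every candidate violates the spacing rule
```

With sensors at `(0, 0)` and `(1, 0)`, leave-one-out errors `[0.1, 0.9]`, and a wavelength of 0.4 m, candidates within 0.1 m of a sensor are rejected and the remaining candidate closest to the high-error sensor is chosen.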
{"title":"Adaptive sampling for optimized sensor placement in sound field reconstruction.","authors":"Yiming Han, Fanqin Hong, Dongcai Wang, Yong Shen","doi":"10.1121/10.0042254","DOIUrl":"https://doi.org/10.1121/10.0042254","journal":{"name":"Journal of the Acoustical Society of America","volume":"159 1","pages":"553-566"},"publicationDate":"2026-01-01","publicationTypes":"Journal Article"}
Sen Wang, Lourenço Tércio Lima Pereira, Riccardo Zamponi, Daniele Ragni
This study investigated the noise emission and thrust performance of a heavy-lift unmanned air vehicle (UAV) with a coaxial propulsion system that operates under differential rotor speeds. The UAV adopted an octo-quad architecture, where each rotor pair consists of two propellers with different blades, allowing independent operation of the fore and aft rotors in corotating (CR) and contra-rotating (CTR) configurations. Acoustic emissions and thrust were measured under steady conditions. The study compared the performance of the CR and CTR configurations and examined the influence of differential rotor speed on the noise emission of the vehicle under different loads for both configurations. The results indicate that the CTR configuration achieves a maximum load factor 0.28 higher than that of the CR configuration and features lower noise at the same thrust when employing differential rotor speed. For both configurations, the drone's noise was influenced by the aerodynamic characteristics of the propellers. Specifically, increasing the fore rotor speed relative to the aft rotor amplifies the noise, whereas increasing the aft rotor speed reduces noise without compromising thrust. Corresponding noise spectra were analyzed across different load factors. The results provide insights that can inform the optimization of noise emission and performance of UAVs with coaxial propulsion systems.
{"title":"Influence of differential rotor speeds on the performance and acoustic emission of coaxial propellers.","authors":"Sen Wang, Lourenço Tércio Lima Pereira, Riccardo Zamponi, Daniele Ragni","doi":"10.1121/10.0042251","DOIUrl":"https://doi.org/10.1121/10.0042251","journal":{"name":"Journal of the Acoustical Society of America","volume":"159 1","pages":"539-552"},"publicationDate":"2026-01-01","publicationTypes":"Journal Article"}
Yang Shen, Chuan-Xing Bi, Xiao-Zheng Zhang, Yong-Bin Zhang, Lu Zhu, Rong Zhou
An overcomplete dictionary is constructed by combining two sparse bases, designed for the spatially sparse and extended source cases, respectively. By utilizing this dictionary, the compressive equivalent source method is expected to achieve sparse reconstruction of sound fields radiated by unknown sources. However, prior studies and numerical simulations presented in this paper reveal that an unsuitable sparse basis would be selected for sound field representation, thereby degrading reconstruction performance. To address this limitation, this paper proposes an adaptive sparse basis compressive equivalent source method by introducing joint sparsity and low-rank constraints. The method adjusts the sparse representation by formulating the reconstruction as a Bayesian optimization problem that simultaneously promotes sparsity and low-rank structures of source strength coefficients. Both numerical simulations and experimental results across three source cases demonstrate that the proposed method can effectively select suitable sparse bases. Consequently, higher reconstruction accuracy than the conventional compressive equivalent source method using the overcomplete dictionary can be achieved (particularly in spatially sparse and combined source cases). Moreover, the reconstructions obtained by the proposed method exhibit greater robustness. This method provides a solution for reconstruction without prior knowledge of source characteristics, offering practical advantages for noise source identification applications.
{"title":"Adaptive sparse basis compressive equivalent source method for sound field reconstruction.","authors":"Yang Shen, Chuan-Xing Bi, Xiao-Zheng Zhang, Yong-Bin Zhang, Lu Zhu, Rong Zhou","doi":"10.1121/10.0042257","DOIUrl":"https://doi.org/10.1121/10.0042257","journal":{"name":"Journal of the Acoustical Society of America","volume":"159 1","pages":"789-801"},"publicationDate":"2026-01-01","publicationTypes":"Journal Article"}
This study performed direct aeroacoustic simulations for two flute headjoints to clarify the mechanism by which the harmonic structure changes with jet angle (the angle between the jet and the mouth opening in flute playing). As the jet angle is increased (the jet is directed more perpendicular to the mouth opening), the second harmonic is intensified more than the third harmonic. This change in harmonic structure occurs because the jet deflects towards the inside of the pipe with increasing jet angle, which increases the actual jet offset (the height of the jet relative to the edge). This jet deflection was found to be caused by the pressure gradient between the inside and outside of the pipe. As the jet angle was increased, the jet was directed horizontally to the inner edge wall, resulting in a decrease in the pressure inside the pipe, whereas the angle between the jet and the outer edge wall increased, raising the pressure outside. When the inclination of the inner edge wall was changed to be more perpendicular to the jet, the pressure around the wall increased, and the jet was deflected further outward. The angle between the jet and the edge wall thus affects both the jet deflection and the harmonic structure.
{"title":"Mechanism of harmonic structure change with jet angle in flute playing.","authors":"Kimie Onogi, Hiroshi Yokoyama, Tsukasa Yoshinaga, Akiyoshi Iida","doi":"10.1121/10.0042263","DOIUrl":"https://doi.org/10.1121/10.0042263","journal":{"name":"Journal of the Acoustical Society of America","volume":"159 1","pages":"862-873"},"publicationDate":"2026-01-01","publicationTypes":"Journal Article"}
Mark Gibson, Marcel Schlechtweg, Xianhui Wang, Judit Ayala Alcalde, Mark Arvidson, Li Xu
We report results for a psycho-acoustic experiment examining Spanish vowel ([a,e,i,o,u]) recognition in speech-shaped noise (SSN) and background babble (1-16 talkers) by two listening groups: native Spanish speakers (SP group) and native English speakers (EN group). The motivation for the current study is to investigate acoustic-phonetic and informational masking (APM and IM, respectively) effects (1) on segment/phoneme recognition, and (2) in participants who do not speak the language of the target or masker (as well as native speakers of Spanish), in order to disambiguate the effects of APM and IM. For the tests, background noise, both SSN and background babble, was presented at three signal-to-noise ratios (0, -6, and -12 dB) while a target containing one of the five Spanish vowels was presented in the syllables [da, de, di, do, du]. Inter-group differences in response accuracy point to significant effects of APM as listening conditions erode, and minimal effects due to higher-order factors based on masker meaningfulness, semantic content, and language familiarity.
{"title":"Acoustic-phonetic masking in Spanish vowel recognition by native English- and Spanish-speaking subjects.","authors":"Mark Gibson, Marcel Schlechtweg, Xianhui Wang, Judit Ayala Alcalde, Mark Arvidson, Li Xu","doi":"10.1121/10.0041884","DOIUrl":"https://doi.org/10.1121/10.0041884","journal":{"name":"Journal of the Acoustical Society of America","volume":"159 1","pages":"105-116"},"publicationDate":"2026-01-01","publicationTypes":"Journal Article"}
Ruihao Jing, Jichao Zhang, Zhongxin Bai, Ji Xu, Xiao-Lei Zhang, Kunde Yang
This paper addresses underwater acoustic target detection, a critical task in marine monitoring and passive sonar systems that is often hindered by complex noise environments and by imbalanced labeled data, in which targets appear only sparsely across long recordings. Traditional models use minimization of the binary cross-entropy (BCE) as the optimization criterion. However, underwater target detection is fundamentally a class-imbalanced classification problem that is evaluated with the receiver operating characteristic curve rather than classification accuracy, whereas BCE maximizes classification accuracy on the training data. To address this, three optimization methods are proposed to directly maximize the area under the receiver operating characteristic curve (AUC). Additionally, the Neyman-Pearson criterion from classical detection theory is incorporated into the AUC optimization framework, forming a curriculum learning strategy that progressively optimizes the partial area under the curve (pAUC). To overcome the scarcity of underwater data, a cross-domain knowledge transfer method is implemented from the airborne to the underwater acoustic domain, which accelerates model convergence and improves generalization. Experimental results demonstrate that the proposed AUC- and pAUC-based loss functions outperform BCE and achieve state-of-the-art performance under low signal-to-noise ratio and mismatched conditions.
{"title":"Optimizing partial receiver operating characteristic curve via curriculum learning and Neyman-Pearson criterion for robust underwater acoustic target detection.","authors":"Ruihao Jing, Jichao Zhang, Zhongxin Bai, Ji Xu, Xiao-Lei Zhang, Kunde Yang","doi":"10.1121/10.0041972","DOIUrl":"10.1121/10.0041972","url":null,"abstract":"<p><p>This paper addresses the challenge of underwater acoustic target detection, a critical task in marine monitoring and passive sonar systems, which is often hindered by complex noise environments and imbalanced labeled data where the targets appear very sparse in the long collected data. Traditional models take the minimization of the binary cross-entropy (BCE) as the optimization criterion. However, underwater target detection is fundamentally a class-imbalanced classification problem that uses the receiver operating characteristic curve as the evaluation metric instead of the classification accuracy, while BCE maximizes the classification accuracy on training data. To address this, three optimization methods are proposed to directly maximize the area under the receiver operating characteristic curve (AUC). Additionally, the Neyman-Pearson criterion from classical detection theory is incorporated into the AUC optimization framework, forming a curriculum learning strategy that progressively optimizes the partial area under the curve (pAUC). To overcome the scarcity of underwater data, a cross-domain knowledge transfer method is implemented from the airborne to underwater acoustic domains, which accelerates model convergence and improves generalization. 
Experimental results demonstrate that the proposed AUC- and pAUC-based loss functions outperform BCE and achieve state-of-the-art performance under low signal-to-noise ratio and mismatched conditions.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":"159 1","pages":"11-24"},"PeriodicalIF":2.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145889475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
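The contrast drawn in the abstract above between BCE and direct AUC optimization can be made concrete with a small sketch. The code below is not the authors' method: it uses a generic pairwise squared-hinge surrogate, and the function names, the `margin` parameter, and the `max_fpr` cutoff are all assumptions chosen for illustration. The partial-AUC variant keeps only the highest-scoring negatives, which corresponds to the low false-alarm region that a Neyman-Pearson style constraint emphasizes.

```python
import numpy as np


def empirical_auc(scores, labels):
    """Wilcoxon-Mann-Whitney estimate: share of (positive, negative) pairs
    in which the positive scores higher (ties count as one half)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]  # all pairwise score differences
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def pairwise_auc_loss(scores, labels, margin=1.0):
    """Squared-hinge surrogate for 1 - AUC over all positive/negative pairs."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean(np.maximum(0.0, margin - diff) ** 2))


def partial_auc_loss(scores, labels, max_fpr=0.1, margin=1.0):
    """Same surrogate, restricted to the hardest negatives.

    Only the top ceil(max_fpr * n_neg) negative scores are kept, so the loss
    targets the ROC region with false-positive rate at most `max_fpr`.
    """
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    k = max(1, int(np.ceil(max_fpr * neg.size)))
    hard_neg = np.sort(neg)[-k:]  # highest-scoring negatives
    diff = pos[:, None] - hard_neg[None, :]
    return float(np.mean(np.maximum(0.0, margin - diff) ** 2))


# Toy imbalanced batch: two target frames among six.
scores = np.array([0.9, 0.8, 0.3, 0.2, 0.1, 0.7])
labels = np.array([1, 1, 0, 0, 0, 0])
print(empirical_auc(scores, labels))
print(pairwise_auc_loss(scores, labels))
print(partial_auc_loss(scores, labels, max_fpr=0.25))
```

Because the loss is defined over pairs rather than individual samples, the ratio of negatives to positives does not dominate it the way it can dominate an averaged BCE. One plausible reading of the curriculum idea is to start with `max_fpr=1.0` (full AUC) and shrink it over training, though the schedule used in the paper is not given in the abstract.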