James J Finneran, Austin O'Kelley, Sean J Avery, Madilyn R Pardini, Angelo Incitti, Katie A Christman, Jason Mulsow, Dorian S Houser
The dolphin's ability to detect changes in the phase of complex echo highlights was investigated by training two bottlenose dolphins to detect a phase "jitter" applied to the second highlight of a two-highlight "phantom" echo. One dolphin had normal hearing and the other had high-frequency hearing loss. The phase jitter changed the complex echo temporal and spectral fine structure without altering the echo energy, temporal and spectral envelopes, or spectral notch spacing. Over the course of testing, the inter-highlight interval (IHI) and frequency content of the echoes were varied. When echo IHIs were less than 300 μs, phase jitter thresholds for the two dolphins were equal, despite large differences in high-frequency hearing ability. At larger time separations, the perceptual cue appeared to change, presumably from spectral to temporal. High-pass filtering of echoes suggested that the lower range of echolocation frequencies is most useful for detecting echo highlight phase shifts. Simple models for across-channel spectral profile differences and differences in repetition pitch provided mixed results: both models matched the experimental data well for some conditions but poorly for other conditions, especially as the IHI and high-pass cutoff frequencies increased.
Finneran, J. J., O'Kelley, A., Avery, S. J., Pardini, M. R., Incitti, A., Christman, K. A., Mulsow, J., and Houser, D. S. "Dolphin biosonar detection of phase changes in complex echo highlights," J. Acoust. Soc. Am. 159(1), 888-905 (2026). https://doi.org/10.1121/10.0042277
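As a rough illustration of the stimulus manipulation described in the abstract, the sketch below synthesizes a two-highlight echo and applies a phase jitter to the second highlight. All waveform parameters (sample rate, carrier frequency, highlight envelope, IHI) are hypothetical illustration values, not the study's; the point is only that the jitter leaves the echo energy essentially unchanged while shifting the temporal fine structure.

```python
import numpy as np

fs = 500_000    # sample rate, Hz (assumed for illustration)
fc = 80_000     # carrier frequency, Hz (assumed)
ihi = 200e-6    # inter-highlight interval, s (within the <300 us regime)
t = np.arange(-5e-4, 1e-3, 1 / fs)

def highlight(t0, phase):
    """Gaussian-windowed tone pip centred at t0 with the given carrier phase."""
    env = np.exp(-0.5 * ((t - t0) / 40e-6) ** 2)
    return env * np.cos(2 * np.pi * fc * (t - t0) + phase)

def echo(jitter):
    """Two-highlight echo; `jitter` is the phase shift on the second highlight."""
    return highlight(0.0, 0.0) + highlight(ihi, jitter)

ref = echo(0.0)
jit = echo(np.pi / 4)

# Energy is (nearly) unchanged by the phase jitter...
e_ref, e_jit = np.sum(ref ** 2), np.sum(jit ** 2)
# ...but the temporal fine structure differs markedly.
diff = np.max(np.abs(ref - jit))
```

Because the envelope spans many carrier cycles, the energy and temporal/spectral envelopes are insensitive to the carrier phase, which is what confines the perceptual cue to the fine structure.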
A weighted masking method based on the coherent-to-diffuse ratio is presented for robust binaural speech enhancement in real-time hearable devices. The method applies manually tuned weights across custom-defined critical frequency bands to improve the quality and intelligibility of frontal target speech in multi-talker reverberant environments. The algorithm was implemented in real time on a functional hearable prototype and evaluated in a perceptual listening study under realistic binaural hearing conditions. Subjective assessments with normal-hearing participants, including evaluations of audio quality, speech intelligibility, and spatial localization, demonstrated consistent improvements compared to baseline coherence-based filtering methods. Results indicate that the method suppresses diffuse background noise while preserving interaural spatial cues important for listening comfort and spatial awareness in complex acoustic scenes. These findings support the applicability of coherence-weighted masking in real-time binaural enhancement tasks under reverberant, multi-talker conditions, including potential use in hearable and hearing aid technologies. In addition to perceptual listening tests, objective evaluations across multiple reverberant environments demonstrate consistent performance improvements over baseline methods.
Ghanavi, R., and Jin, C. T. "Enhancing binaural speech perception in noise via weighted coherence masking for hearables," J. Acoust. Soc. Am. 159(1), 44-59 (2026). https://doi.org/10.1121/10.0042015
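A minimal sketch of the coherence-based masking family this method builds on: a recursively smoothed interaural coherence is used as a per-bin gain applied identically to both ears, which attenuates diffuse sound while preserving interaural cues. The paper's hand-tuned band weights are not reproduced here; all parameters (frame size, smoothing constant, gain floor) are illustrative assumptions.

```python
import numpy as np

def stft(x, nfft=256, hop=128):
    """Simple short-time Fourier transform (Hann window, no overlap-add inverse)."""
    win = np.hanning(nfft)
    n_frames = (len(x) - nfft) // hop + 1
    return np.array([np.fft.rfft(win * x[i * hop:i * hop + nfft])
                     for i in range(n_frames)])

def coherence_mask(left, right, nfft=256, hop=128, alpha=0.9, floor=0.1):
    """Per time-frequency-bin gain from smoothed interaural coherence:
    coherent (e.g., frontal) sound keeps gain near 1, diffuse sound is cut."""
    L, R = stft(left, nfft, hop), stft(right, nfft, hop)
    pll = prr = plr = 1e-12          # recursive auto-/cross-spectral estimates
    gains = np.empty(L.shape)
    for i in range(L.shape[0]):
        pll = alpha * pll + (1 - alpha) * np.abs(L[i]) ** 2
        prr = alpha * prr + (1 - alpha) * np.abs(R[i]) ** 2
        plr = alpha * plr + (1 - alpha) * L[i] * np.conj(R[i])
        gains[i] = np.maximum(np.abs(plr) / np.sqrt(pll * prr), floor)
    return gains

rng = np.random.default_rng(0)
s = rng.standard_normal(8000)                    # perfectly coherent "source"
coh_gain = coherence_mask(s, s).mean()           # -> near 1
dif_gain = coherence_mask(rng.standard_normal(8000),
                          rng.standard_normal(8000)).mean()  # -> well below 1
```

Applying the same gain to both channels is what keeps interaural level and time differences intact, consistent with the spatial-cue preservation reported in the listening tests.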
Jonas M Schmid, Johannes D Schmid, Martin Eser, Steffen Marburg
Accurate acoustic simulations of enclosed spaces require precise boundary conditions, typically expressed through surface impedances for wave-based methods. Conventional measurement techniques rely on simplifying assumptions about the sound field and mounting conditions, limiting their validity for real-world scenarios. To overcome these limitations, this study introduces a Bayesian framework for the in situ estimation of frequency-dependent surface impedances from sparse interior sound pressure measurements. The approach employs simulation-based inference, which leverages the expressiveness of neural network architectures to directly map simulated data to posterior distributions of model parameters, bypassing conventional sampling-based Bayesian approaches and offering advantages for high-dimensional inference problems. Impedance behavior is modeled using a damped oscillator model extended with a fractional calculus term. The framework is verified on a finite element model of a cuboid room with a volume of 1.95 m3 and further tested with impedance tube measurements used as reference, achieving robust and accurate estimation of all six individual impedances from 63 to 500 Hz. Application to a numerical car cabin model further demonstrates reliable uncertainty quantification and high predictive accuracy for complex-shaped geometries. Posterior predictive checks and coverage diagnostics confirm well-calibrated inference, highlighting the method's potential for generalizable and physically consistent characterization of acoustic boundary conditions in real-world interior environments.
Schmid, J. M., Schmid, J. D., Eser, M., and Marburg, S. "In situ estimation of the acoustic surface impedance using simulation-based inference," J. Acoust. Soc. Am. 159(1), 422-436 (2026). https://doi.org/10.1121/10.0042242
Intracochlear recorded growth functions of cochlear potentials evoked by acoustic tones were analyzed by their phase coherence, variance, and density distribution. Potentials obtained from electrocochleography recorded in cochlear implant users with ipsilateral residual hearing were analyzed in the complex domain. The study shows that (1) the phase of the cochlear microphonic (CM) is coherent across stimulation levels, (2) CM potentials are circularly symmetric normally distributed in the complex domain, regardless of stimulation level or response amplitude, and (3) the variance of CM potentials remains consistent, irrespective of stimulation level. Based on these findings, we introduce a method that utilizes the phase coherence across stimulation levels to improve the accuracy of CM amplitude estimates, particularly when amplitudes are close to the noise floor. A key aspect of this method is addressing the bias introduced by the circularly symmetric normal distribution of additive Gaussian noise in the complex domain, which can lead to overestimation of signal magnitude. The results show that phase coherence can be used to enhance the accuracy of amplitude estimates of cochlear potentials, thereby making the recording process more efficient. Furthermore, the consistent variance of cochlear potentials enables noise estimations from recordings containing receptor potentials, independent of the stimulation level.
Krüger, B., and Nogueira, W. "Phase coherence of cochlear microphonic potentials in cochlear implant users with ipsilateral residual hearing," J. Acoust. Soc. Am. 159(1), 924-940 (2026). https://doi.org/10.1121/10.0039540
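The bias the authors address is generic to magnitude estimates in complex Gaussian noise: the magnitude of the averaged response overestimates small signals, whereas projecting the complex mean onto a phase that is known to be coherent across stimulation levels yields an unbiased amplitude estimate. The toy simulation below uses hypothetical numbers (trial counts, noise level, CM amplitude), not the study's data.

```python
import numpy as np

rng = np.random.default_rng(1)
K, R = 100, 2000          # trials per level, simulation repeats (hypothetical)
sigma = 1.0               # per-trial complex noise standard deviation (assumed)
phase = 0.7               # CM phase, coherent across levels (finding 1)
amp = 0.05                # tiny CM amplitude near the noise floor (assumed)

naive = np.empty(R)
corrected = np.empty(R)
for r in range(R):
    # K complex-domain trials: fixed-phase signal + circular Gaussian noise
    noise = sigma * (rng.standard_normal(K)
                     + 1j * rng.standard_normal(K)) / np.sqrt(2)
    m = (amp * np.exp(1j * phase) + noise).mean()
    naive[r] = np.abs(m)                           # biased upward near the floor
    corrected[r] = (m * np.exp(-1j * phase)).real  # project onto coherent phase
```

Averaged over repeats, the projected estimator recovers the true amplitude while the plain magnitude estimator sits well above it, illustrating why exploiting phase coherence makes the recording process more efficient.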
Passive acoustic monitoring (PAM) using Cetacean Porpoise Detectors (C-PODs) is a frequently applied method for studying the presence of harbor porpoises. In quiet environments, the KERNO classifier, an algorithm supplied by the manufacturer, can easily detect narrow-band high-frequency click trains emitted by echolocating harbor porpoises. However, precision is low in noisy habitats, as found in a monitoring data set (0.632; Ems, Elbe, and Wismar Bay, Germany, 2018-2023). We validated and labeled a subsample of 235 529 click trains (Elbe and Ems estuaries, 2023-2024) identified by the KERNO classifier and exported their physical characteristics to train a machine learning (ML) model. Extreme gradient boosting performed very well on the testing data (accuracy: 0.985) and the monitoring data set (0.849). The results show that the model could generalize well beyond the training data. Moreover, this ML tool can reduce the risk that random manual validation overlooks low-precision estimates. The ML tool can complement the validation process, especially when intervals containing only a single click train are validated manually, since false model predictions occurred predominantly in these intervals. Hence, this validation tool may significantly improve the workflow in PAM studies using C-PODs, especially in noisy habitats.
Gauger, M. F. W., and Taupp, T. "Harbor porpoises and the machine: Assisting manual validation of click trains recorded by Cetacean Porpoise Detector," J. Acoust. Soc. Am. 159(1), 632-646 (2026). https://doi.org/10.1121/10.0042220
Tassadaq Hussain, Nasir Saleem, Kia Dashtipour, Shafique Ahmed, Jen-Cheng Hou, Usman Anwar, Yu Tsao, Tughrul Arsalan, Amir Hussain
In real-world environments, background noise significantly degrades the intelligibility and clarity of human speech. Existing audio-visual speech enhancement (AVSE) techniques often struggle in dynamic and noisy conditions. This study examines the inclusion of emotional features as a novel contextual cue within the AVSE framework. We show that incorporating emotional information derived from facial landmarks improves speech enhancement performance. We propose a deep learning-based emotion-aware audio-visual speech enhancement system (EAVSE) that uses auditory, visual, and emotional information. The proposed EAVSE extracts emotional features from facial landmarks and combines them with the audio and visual modalities. The enriched multimodal data are processed by a UNet-based encoder-decoder network for joint learning and optimization. The network iteratively refines the feature representation, guided by a distortion-inspired loss function. We train and evaluate the model on the Carnegie Mellon University Multimodal Opinion Sentiment and Emotion Intensity dataset, known for its diverse audio-visual recordings with annotated emotions. Compared to the AVSE benchmark and audio-only speech enhancement systems, the proposed model achieves significant improvements in both objective [Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI)] and subjective speech quality metrics. In particular, the scale-invariant signal-to-distortion ratio loss function demonstrates superior performance. This suggests the usefulness of emotional contextual cues for AVSE. The experimental findings demonstrate the effectiveness of the AVSE, particularly in challenging noisy environments [signal-to-noise ratio (SNR) ≤ -7.5 dB]. The proposed model achieved Δ STOI of 7.32%, Δ PESQ of 0.33, and Δ S-SNR of 7.8 dB over the noisy baseline at 0 dB SNR.
Hussain, T., Saleem, N., Dashtipour, K., Ahmed, S., Hou, J.-C., Anwar, U., Tsao, Y., Arsalan, T., and Hussain, A. "Audio-visual speech enhancement in noisy environments using emotion-based contextual cues," J. Acoust. Soc. Am. 159(1), 470-483 (2026). https://doi.org/10.1121/10.0042239
The purpose of this study is to estimate the consolidation of sediments from measured wave speeds and attenuations, through parameter fitting of acoustic propagation models, particularly visco-elastic models. Consolidation is quantified by Pride's consolidation parameter, which is directly related to the ratio of the static frame bulk modulus to the grain bulk modulus. In physical terms, it represents the static mechanical stiffness of the skeletal frame of a porous medium. The effects of consolidation are demonstrated with data from the New England Mud Patch: mineralogy and acoustic data are used to invert for model parameters, particularly the frame bulk modulus, from which consolidation is calculated. Ideally, a porous medium model should be used, but an elastic approximation, which has fewer input parameters, is more efficient. In the process of doing so, an improved elastic approximation model was developed. Using the improved elastic model inversion, consolidation at the New England Mud Patch is shown to increase monotonically with depth.
Chotiros, N. P., and Potty, G. R. "An elastic approximation of seabed acoustics and its connection to consolidation in the New England Mud Patch," J. Acoust. Soc. Am. 159(1), 484-495 (2026). https://doi.org/10.1121/10.0042252
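Pride's consolidation parameter c is commonly stated through the relation K_frame = K_grain(1 - φ)/(1 + cφ), linking the static frame bulk modulus to the grain bulk modulus through porosity φ. Assuming that form (the paper may use a variant), c follows directly from an inverted frame modulus; the moduli and porosities below are hypothetical mud-like numbers, not values from the study.

```python
def pride_consolidation(k_frame, k_grain, porosity):
    """Invert the commonly stated Pride relation
    K_frame = K_grain * (1 - phi) / (1 + c * phi) for the consolidation
    parameter c. A soft frame relative to the grains gives a large c,
    i.e., a poorly consolidated sediment."""
    return ((1 - porosity) * k_grain / k_frame - 1) / porosity

# Hypothetical illustration values (Pa); not from the paper.
c_soft = pride_consolidation(k_frame=0.05e9, k_grain=36e9, porosity=0.7)
c_stiff = pride_consolidation(k_frame=5e9, k_grain=36e9, porosity=0.4)
```

A monotonic decrease of c with depth, as implied by the abstract's finding that consolidation increases with depth, corresponds to the frame modulus growing toward the grain modulus as the sediment stiffens.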
This work presents experimental and theoretical results concerning the dispersion and attenuation caused by scattering during the propagation of ultrasonic waves on the surface of a polycrystal. Rayleigh and head waves are measured for two Inconel® 600 samples with different average grain sizes. The coherent, i.e., ensemble-averaged, waves are estimated, as well as their frequency-dependent phase velocities and scattering mean-free paths. The results obtained from a contactless laser setup are compared to those obtained from a transducer array placed on the surface of the sample. The influence of contact is highlighted, particularly at low frequency and in the small-grained sample, where the attenuation by scattering is lower. Moreover, the two-point correlation (TPC) functions of both samples are estimated, and it is shown that neither is exponential. Standard theoretical models are adapted to these particular TPCs and yield effective bulk wavenumbers, from which effective surface wavenumbers can be calculated via a simple and approximate method. The theoretical results are then compared to the experimental ones.
du Burck, C., and Derode, A. "Attenuation and dispersion of coherent Rayleigh and head waves on the surface of a polycrystal," J. Acoust. Soc. Am. 159(1), 955-973 (2026). https://doi.org/10.1121/10.0042276
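The scattering mean free path ℓ can be read off the decay of the ensemble-averaged (coherent) wave, whose amplitude falls as exp(-x/2ℓ) with propagation distance x (intensity as exp(-x/ℓ)). The sketch below illustrates the estimation principle on synthetic data with made-up numbers; ensemble averaging suppresses the random-phase incoherent part, and a log-linear fit recovers ℓ.

```python
import numpy as np

rng = np.random.default_rng(2)
ell = 5e-3                          # true scattering mean free path, m (made up)
x = np.linspace(1e-3, 20e-3, 20)    # propagation distances, m (made up)

# Synthetic ensemble: a coherent part whose amplitude decays as exp(-x/2*ell),
# plus an incoherent random-phase part that vanishes under ensemble averaging.
n_real = 500
incoherent = 0.3 * np.exp(2j * np.pi * rng.random((n_real, x.size)))
field = np.exp(-x / (2 * ell))[None, :] + incoherent
coherent_amp = np.abs(field.mean(axis=0))

# Fit ln|<h>| vs x: the slope is -1/(2*ell).
slope, _ = np.polyfit(x, np.log(coherent_amp), 1)
ell_est = -1.0 / (2.0 * slope)
```

In practice this is done per frequency band, yielding the frequency-dependent mean free paths reported in the paper.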
A parabolic equation-based physics-informed machine learning method for underwater sound propagation modeling
Ziwei Huang, Liang An, Yang Ye, Zizhan Wang, Qing Fan, Qixuan Zhu, Ziqing Ding
Underwater sound propagation modeling is crucial for ocean environmental monitoring, underwater communication, and target localization. Traditional underwater acoustics models are limited by high computational costs and restricted adaptability, while data-driven machine learning methods lack physical constraints, leading to poor generalization and reliance on large datasets. Although Physics-Informed Neural Networks have recently emerged to integrate physical priors, they still face challenges in achieving accurate long-range extrapolation. To address this limitation, we propose U-PARANET, a physics-informed machine learning method that incorporates the parabolic equation as a hard constraint directly into its architecture. The model leverages the parabolic equation's recursive, range-stepping structure within a neural network framework, enhancing stability and mitigating error accumulation over long-range propagation. Validation on both simulated and experimental data shows that U-PARANET accurately predicts transmission loss and phase structures, with good agreement in spatial field patterns. Specifically, the mean absolute error for transmission loss prediction is 1.40 dB in an ideal shallow-water environment, 1.06 dB in a simulation using SWellEx-96 environmental parameters, and 2.87 dB on SWellEx-96 experimental data. In conclusion, the proposed method exhibits excellent long-range modeling capabilities, demonstrating robust extrapolation in challenging, realistic environments.
J. Acoust. Soc. Am. 159(1), 906-923 (2026). doi:10.1121/10.0042152
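The recursive, range-stepping structure that the abstract attributes to the parabolic equation can be illustrated with a plain split-step Fourier marcher (a hypothetical sketch under assumed names and parameters, not the paper's U-PARANET architecture): the field is advanced one range step at a time, and a learned correction could be inserted inside the recursion while the physical update remains a hard constraint.

```python
import numpy as np

def pe_step(psi, dz, dr, k0, n_index):
    """One split-step Fourier update of the narrow-angle parabolic equation."""
    kz = 2 * np.pi * np.fft.fftfreq(psi.size, d=dz)
    # Diffraction step applied in the depth-wavenumber domain
    psi = np.fft.ifft(np.fft.fft(psi) * np.exp(-1j * kz**2 * dr / (2 * k0)))
    # Refraction step from the (possibly depth-dependent) index of refraction
    psi = psi * np.exp(1j * k0 * (n_index - 1.0) * dr)
    return psi

def march(psi0, n_steps, dz, dr, k0, n_index, correction=None):
    """Range-stepping recursion; `correction` stands in for a learned layer."""
    psi = psi0.astype(complex)
    for _ in range(n_steps):
        psi = pe_step(psi, dz, dr, k0, n_index)
        if correction is not None:   # where a learned residual could act
            psi = psi + correction(psi)
    return psi

# Free-space sanity check: a Gaussian starter spreads but conserves energy,
# since both sub-steps are unitary when the medium is homogeneous.
z = np.linspace(-100.0, 100.0, 256)          # depth grid (m), assumed
psi0 = np.exp(-(z / 10.0) ** 2)              # Gaussian starting field
out = march(psi0, n_steps=10, dz=z[1] - z[0], dr=2.0,
            k0=2 * np.pi * 50 / 1500, n_index=1.0)
```

The point of the sketch is the loop in `march`: because the parabolic equation is solved step by step in range, error correction can be applied per step rather than to the whole field at once, which is the structural property the abstract credits for stability over long-range propagation.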
Acoustic changes in consonant production with a face mask
Feiyun Jiang, Yang Chen, Manwa L Ng
Face masks are widely employed for personal protection in the post-COVID era. However, their impact on consonant production remains unclear. As part of a series of investigations into the effects of wearing face masks, the present study aims to acoustically examine the effects of face masks on consonant production. Speech signals, including six plosives (/p, pʰ, t, tʰ, k, kʰ/) and three fricatives (/f, s, ɕ/) in Mandarin under masked and unmasked conditions, were segmented from continuous speech and measured using temporal and spectral acoustic parameters. Significant alterations were observed in the masked condition compared to the unmasked condition, encompassing reduced closure durations for plosives, and reduced durations and spectral peak locations for fricatives. In addition, changes were observed in the center of gravity, variance, skewness, and kurtosis for both plosives and fricatives. However, the voice onset time for plosives did not exhibit a statistically significant change. Similar patterns with varying degrees of change were observed between men and women. Wearing a face mask substantially influences consonant production, potentially diminishing the audibility of these sounds and highlighting the necessity for compensatory strategies in contexts requiring clear oral communication.
J. Acoust. Soc. Am. 159(1), 874-887 (2026). doi:10.1121/10.0042274
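The four spectral moments reported above (center of gravity, variance, skewness, and kurtosis) are standard measures in consonant acoustics. A minimal sketch of how they can be computed, treating a power spectrum as a probability distribution over frequency (illustrative only, not the study's analysis pipeline):

```python
import numpy as np

def spectral_moments(signal, fs):
    """Spectral moments of a signal's power spectrum (Hz-domain)."""
    spec = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1 / fs)
    p = spec / spec.sum()                         # normalize to a distribution
    cog = np.sum(freqs * p)                       # center of gravity (Hz)
    var = np.sum((freqs - cog) ** 2 * p)          # variance (Hz^2)
    sd = np.sqrt(var)
    skew = np.sum(((freqs - cog) / sd) ** 3 * p)  # asymmetry of the spectrum
    kurt = np.sum(((freqs - cog) / sd) ** 4 * p) - 3.0  # excess kurtosis
    return cog, var, skew, kurt

# Sanity check on a synthetic two-tone signal: equal energy at 3 and 5 kHz
# puts the center of gravity at 4 kHz with zero skewness.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 3000 * t) + np.sin(2 * np.pi * 5000 * t)
cog, var, skew, kurt = spectral_moments(x, fs)
```

In practice such moments would be computed over windowed frames of the segmented consonant intervals; the sketch shows only the moment definitions themselves.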