The control of speech can be modeled as a dynamical system in which articulators are driven toward target positions. These models are typically evaluated using fleshpoint data, such as electromagnetic articulography (EMA), but recent methodological advances make ultrasound imaging a promising alternative. We evaluate whether the parameters of a linear harmonic oscillator can be reliably estimated from ultrasound tongue kinematics and compare these with parameters estimated from simultaneously recorded EMA data. We find that ultrasound and EMA yield comparable dynamical parameters, while mandibular short tendon tracking also adequately captures jaw motion. This supports using ultrasound kinematics to evaluate dynamical articulatory models.
{"title":"Dynamical model parameters from ultrasound tongue kinematics.","authors":"Sam Kirkham, Patrycja Strycharczuk","doi":"10.1121/10.0039769","DOIUrl":"https://doi.org/10.1121/10.0039769","url":null,"abstract":"<p><p>The control of speech can be modeled as a dynamical system in which articulators are driven toward target positions. These models are typically evaluated using fleshpoint data, such as electromagnetic articulography (EMA), but recent methodological advances make ultrasound imaging a promising alternative. We evaluate whether the parameters of a linear harmonic oscillator can be reliably estimated from ultrasound tongue kinematics and compare these with parameters estimated from simultaneously recorded EMA data. We find that ultrasound and EMA yield comparable dynamical parameters, while mandibular short tendon tracking also adequately captures jaw motion. This supports using ultrasound kinematics to evaluate dynamical articulatory models.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"5 11","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145440014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
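As a rough illustration of what estimating the parameters of a linear harmonic oscillator from a kinematic trajectory can involve, the sketch below fits stiffness, damping, and target to a sampled trajectory by least squares on finite-difference derivatives. The abstract does not state the authors' estimation procedure, so the unit-mass form x'' = -b x' - k (x - T), the function names, and the synthetic critically damped trajectory are all assumptions made for this example.

```python
import math

def solve_linear(a, y):
    """Solve a small dense system a @ p = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    p = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * p[c] for c in range(r + 1, n))
        p[r] = (m[r][n] - s) / m[r][r]
    return p

def fit_oscillator(xs, dt):
    """Fit x'' = -b*x' - k*x + k*T (unit mass, an assumption) by linear least squares."""
    rows, acc = [], []
    for i in range(1, len(xs) - 1):
        vel = (xs[i + 1] - xs[i - 1]) / (2 * dt)                     # central-difference velocity
        rows.append([vel, xs[i], 1.0])
        acc.append((xs[i + 1] - 2 * xs[i] + xs[i - 1]) / dt ** 2)    # central-difference acceleration
    # Normal equations for the three regression coefficients.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * a for r, a in zip(rows, acc)) for i in range(3)]
    c_vel, c_pos, c_const = solve_linear(ata, aty)
    k, b = -c_pos, -c_vel
    return k, b, c_const / k

def critically_damped(x0, target, omega, t):
    """Closed-form critically damped trajectory starting at rest (synthetic test data)."""
    return target + (x0 - target) * (1 + omega * t) * math.exp(-omega * t)

dt = 0.001  # hypothetical 1 kHz sampling interval, in seconds
xs = [critically_damped(0.0, 10.0, 20.0, i * dt) for i in range(500)]
k, b, target = fit_oscillator(xs, dt)
print(round(k, 1), round(b, 2), round(target, 2))
```

On noise-free synthetic data the fit recovers values close to k = 400, b = 40, and T = 10. Real ultrasound or EMA trajectories would need smoothing before differentiation, which this sketch omits.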
Sajad Sadeghkhani, Maryam Karimi Boroujeni, Hilmi R Dajani, Saeid R Seydnejad, Christian Giguère
The frequency following response (FFR) reflects the brain's neural encoding of pitch through the fundamental frequency (F0). While autocorrelation has long been the standard method for estimating F0 in FFRs, few alternatives have been explored. We propose a harmonic-structure-based approach that leverages knowledge of the stimulus F0 to guide a selective filterbank, extracting harmonic energy while suppressing noise. F0 is then tracked by identifying the most prominent spectral peak. Applied to FFRs recorded from 16 listeners in response to natural speech stimuli, the method reduced F0 tracking error by 8.8% to 47.4% compared to autocorrelation, offering a more accurate approach.
{"title":"A method for pitch tracking in the frequency following response using harmonic amplitude summation filterbank.","authors":"Sajad Sadeghkhani, Maryam Karimi Boroujeni, Hilmi R Dajani, Saeid R Seydnejad, Christian Giguère","doi":"10.1121/10.0039749","DOIUrl":"https://doi.org/10.1121/10.0039749","url":null,"abstract":"<p><p>The frequency following response (FFR) reflects the brain's neural encoding of pitch through the fundamental frequency (F0). While autocorrelation has long been the standard method for estimating F0 in FFRs, few alternatives have been explored. We propose a harmonic-structure-based approach that leverages knowledge of the stimulus F0 to guide a selective filterbank, extracting harmonic energy while suppressing noise. F0 is then tracked by identifying the most prominent spectral peak. Applied to FFRs recorded from 16 listeners in response to natural speech stimuli, the method reduced F0 tracking error by 8.8% to 47.4% compared to autocorrelation, offering a more accurate approach.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"5 11","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145439997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
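The idea of using the stimulus F0 to guide a harmonic search and then picking the most prominent peak can be sketched as below. This is a simplified stand-in, not the authors' filterbank: it probes each candidate F0 with single-frequency DFTs at its first few harmonics and sums the amplitudes. The function names, the search range, the number of harmonics, and the synthetic test signal are all assumptions for illustration.

```python
import math
import random

def amplitude_at(frame, fs, freq):
    """Amplitude of the component at `freq`, from a single-frequency DFT of the frame."""
    re = sum(s * math.cos(2 * math.pi * freq * n / fs) for n, s in enumerate(frame))
    im = sum(s * math.sin(2 * math.pi * freq * n / fs) for n, s in enumerate(frame))
    return 2 * math.hypot(re, im) / len(frame)

def track_f0(frame, fs, stimulus_f0, search_hz=20.0, step_hz=1.0, n_harmonics=3):
    """Return the candidate F0 near the stimulus F0 with the largest summed harmonic amplitude."""
    best_f0, best_score = stimulus_f0, -1.0
    steps = int(round(2 * search_hz / step_hz))
    for i in range(steps + 1):
        cand = stimulus_f0 - search_hz + i * step_hz
        score = sum(amplitude_at(frame, fs, h * cand) for h in range(1, n_harmonics + 1))
        if score > best_score:
            best_f0, best_score = cand, score
    return best_f0

# Synthetic frame: three harmonics of a 120 Hz fundamental plus Gaussian noise.
random.seed(0)
fs, n_samples, true_f0 = 4000, 800, 120.0
frame = [
    sum(math.sin(2 * math.pi * h * true_f0 * n / fs) / h for h in (1, 2, 3))
    + random.gauss(0.0, 0.5)
    for n in range(n_samples)
]

# The search is centred on a deliberately offset stimulus F0 of 115 Hz.
estimated_f0 = track_f0(frame, fs, stimulus_f0=115.0)
print(estimated_f0)
```

Restricting candidates to a band around the known stimulus F0 is what keeps the search from locking onto subharmonics or unrelated noise peaks. A frame-by-frame loop over a real FFR would reuse `track_f0` with the time-varying stimulus F0.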
Currently, diagnosis of voice disorders is often made when patients visit the clinic, by which time speakers already experience vocal difficulties. The goal of this study was to develop a voice inversion system that predicts how speakers modulate vocal physiology from the produced voice, toward early detection of unhealthy vocal behavior. Two neural networks, a Bayesian neural network and a deep ensemble of neural networks, were developed that predict changes in vocal physiological parameters and their confidence intervals. Comparison to human data showed that both networks were able to predict meaningful differences in vocal behavior across subjects, demonstrating their potential toward ambulatory monitoring of vocal behavior at the physiological level.
{"title":"Toward ambulatory monitoring of vocal behavior at the physiological level using deep ensembles and Bayesian neural networks.","authors":"Zhaoyan Zhang","doi":"10.1121/10.0039842","DOIUrl":"10.1121/10.0039842","url":null,"abstract":"<p><p>Currently, diagnosis of voice disorders is often made when patients visit the clinic, by which time speakers already experience vocal difficulties. The goal of this study was to develop a voice inversion system that predicts how speakers modulate vocal physiology from the produced voice, toward early detection of unhealthy vocal behavior. Two neural networks, a Bayesian neural network and a deep ensemble of neural networks, were developed that predict changes in vocal physiological parameters and their confidence intervals. Comparison to human data showed that both networks were able to predict meaningful differences in vocal behavior across subjects, demonstrating their potential toward ambulatory monitoring of vocal behavior at the physiological level.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"5 11","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12614231/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145491058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Márcio Marques, Leonardo Mendonça, Arthur Bizzi, Leonardo Moreira, Christian Oliveira, Deborah Oliveira, Lucas Fernandez, Vitor Balestro, João Pereira, Daniel Yukimura, Tiago Novello, Pavel Petrov, Lucas Nissenbaum
Physics-informed neural networks (PINNs) have emerged as a promising tool for simulating various phenomena. However, their application in underwater acoustics remains challenging, primarily because large computational domains must be sampled and training tends to converge to trivial solutions. This study presents a strategy to address these issues by combining adaptive domain sampling with absorbing boundary conditions. The adaptive sampler dynamically focuses computational effort on regions where the acoustic energy is localized, while the absorbing boundaries stabilize training. Numerical experiments show that our method improves the stability and convergence of PINN training, leading to more accurate and reliable wave propagation simulations.
{"title":"Stable adaptive training for physics-informed neural networks in acoustic wave propagation.","authors":"Márcio Marques, Leonardo Mendonça, Arthur Bizzi, Leonardo Moreira, Christian Oliveira, Deborah Oliveira, Lucas Fernandez, Vitor Balestro, João Pereira, Daniel Yukimura, Tiago Novello, Pavel Petrov, Lucas Nissenbaum","doi":"10.1121/10.0039767","DOIUrl":"https://doi.org/10.1121/10.0039767","url":null,"abstract":"<p><p>Physics-informed neural networks (PINNs) have emerged as a promising tool for simulating various phenomena. However, their application in underwater acoustics remains challenging, primarily due to the need to sample large computational domains and to convergence to trivial solutions. This study presents a strategy to address these issues by combining adaptive domain sampling with absorbing boundary conditions. The adaptive sampler dynamically focuses computational effort on regions where the acoustic energy is localized, while the absorbing boundaries perform training stabilization. Numerical experiments show that our method improves the stability and convergence of PINN training, leading to more accurate and reliable wave propagation simulations.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"5 11","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145453683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study characterizes the statistical dependence of low-frequency ambient sound on wind speed using 8 years of Ocean Observatories Initiative hydrophone data. Data from two bottom-mounted hydrophones, sampled at 200 Hz, are compared to a National Oceanic and Atmospheric Administration surface winds model. One hydrophone is on the continental slope and one is in the open ocean. The dependence of ambient sound levels on wind speed is reported for both locations across all studied frequencies (0.1-90 Hz). A piecewise, log-linear model is fit to ambient sound and wind speed, and different regions of wind dependence are discussed. Directional surface wave spectra from a nearby buoy are compared with acoustic measurements below 1.2 Hz. Time-frequency characteristics in acoustic data are largely explained by local surface spectra.
{"title":"Characterizing wind-dependent low-frequency ambient sound with ocean observatories initiative hydrophones.","authors":"John Ragland, Shima Abadi","doi":"10.1121/10.0039811","DOIUrl":"https://doi.org/10.1121/10.0039811","url":null,"abstract":"<p><p>This study characterizes the statistical dependence of low-frequency ambient sound on wind speed using 8 years of Ocean Observatories Initiative hydrophone data. Data from two bottom-mounted hydrophones, sampled at 200 Hz, are compared to a National Oceanic and Atmospheric Administration surface winds model. One hydrophone is on the continental slope and one is in open ocean. Wind dependence on ambient sound levels is reported for both locations across all studied frequencies (0.1-90 Hz). A piecewise, log-linear model is fit to ambient sound and wind speed, and different regions of wind dependence are discussed. Directional surface wave spectra from a nearby buoy are compared with acoustic measurements below 1.2 Hz. Time-frequency characteristics in acoustic data are largely explained by local surface spectra.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"5 11","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145453574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
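One simple way to picture a piecewise, log-linear fit of sound level against wind speed is a two-segment hinge: a flat floor below a breakpoint and a straight line in log10(wind speed) above it. The abstract does not give the number of segments or the fitting procedure, so the hinge form, the grid search over breakpoints, the function name, and the synthetic data below are assumptions for illustration only.

```python
import math

def fit_hinge(wind_mps, level_db, breakpoints_mps):
    """Fit level = floor + slope * max(0, log10(u / u_b)) for the best breakpoint u_b.

    For each candidate breakpoint the floor and slope come from ordinary least
    squares; the breakpoint with the smallest squared error is returned.
    """
    best = None
    n = len(wind_mps)
    for bp in breakpoints_mps:
        h = [max(0.0, math.log10(u / bp)) for u in wind_mps]
        mean_h, mean_y = sum(h) / n, sum(level_db) / n
        s_hh = sum((x - mean_h) ** 2 for x in h)
        if s_hh == 0.0:
            continue  # no samples above this breakpoint, slope is undefined
        slope = sum((x - mean_h) * (y - mean_y) for x, y in zip(h, level_db)) / s_hh
        floor = mean_y - slope * mean_h
        sse = sum((floor + slope * x - y) ** 2 for x, y in zip(h, level_db))
        if best is None or sse < best[0]:
            best = (sse, bp, floor, slope)
    return best[1], best[2], best[3]

# Synthetic, noise-free data: 60 dB floor, 20 dB per decade above 5 m/s.
wind = [0.5 * i for i in range(2, 31)]  # 1.0 to 15.0 m/s
level = [60.0 + 20.0 * max(0.0, math.log10(u / 5.0)) for u in wind]

breakpoint_mps, floor_db, slope_db_per_decade = fit_hinge(wind, level, [3.0, 4.0, 5.0, 6.0, 7.0])
print(breakpoint_mps, round(floor_db, 3), round(slope_db_per_decade, 3))
```

In practice such a fit would be repeated per frequency band, and more than one breakpoint may be needed to separate the different regions of wind dependence.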
In this Letter, a simplified expression for the vertical narrowband interference pattern in the deep-water direct arrival zone is derived theoretically. The correlation between the vertical interference pattern and source depth is analyzed, and the depth span corresponding to a complete interference cycle (the interference period) is derived. A source depth estimation method is proposed based on matching the vertical interference pattern calculated from the data with theoretical predictions. Experimental data verify both the theoretical prediction of the vertical interference pattern and the proposed depth estimation method.
{"title":"Vertical dimension acoustic field interference pattern prediction and source depth estimation in deep ocean.","authors":"Guangying Zheng, Hao Wang, Shuaishuai Zhang, Linlang Bai, Fangwei Zhu, Jiabao Feng, Wang Hao","doi":"10.1121/10.0039752","DOIUrl":"https://doi.org/10.1121/10.0039752","url":null,"abstract":"<p><p>In this Letter, a simplified expression of the vertical dimension narrowband interference pattern in deep water direct arrival zone is theoretically derived. The correlation between the vertical dimension interference pattern and source depth is analyzed, and the depth span corresponding to a complete interference cycle (interference period) is derived theoretically. A source depth estimation method is proposed based on matching the vertical interference pattern calculated from the data with theoretical predictions. The effectiveness of the theoretical approach in predicting the vertical interference pattern and effectiveness of the proposed depth estimation method are verified by experimental data.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"5 11","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145440184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spatiotemporal modulation (STM) of the termination impedance of diffuser wells has recently been used to improve diffusion from finite apertures of conventional acoustic diffuser profiles and to scatter acoustic energy at frequencies that are up- and down-shifted from the incident wave frequency by integer multiples of the modulation frequencies. The present work employs a semi-analytical model of acoustic scattering from an STM acoustic metasurface to investigate the tunability of scattering from a flat metasurface with STM admittance. A parametric study on the modulation frequency and admittance amplitude demonstrates nonreciprocal and diffuse scattering of sound. The results provide insight into the use of STM of input admittance to control acoustic scattering from acoustic metasurfaces.
{"title":"Tunable acoustic scattering from a spatiotemporally modulated flat metasurface.","authors":"Janghoon Kang, Michael R Haberman","doi":"10.1121/10.0039860","DOIUrl":"https://doi.org/10.1121/10.0039860","url":null,"abstract":"<p><p>Spatiotemporal modulation (STM) has recently been used to improve diffusion from finite apertures of conventional acoustic diffuser profiles by introducing STM of the termination impedance of the diffuser wells and to scatter acoustic energy at frequencies that are up- and down-shifted from the incident wave frequency by integer multiples of the modulation frequencies. The present work employs a semi-analytical model of acoustic scattering from an STM acoustic metasurface to investigate the tunability of scattering from a flat metasurface with STM admittance by demonstrating nonreciprocal and diffuse scattering of sound via a parametric study on the modulation frequency and admittance amplitude. The results provide insight into the use of STM of input admittance to control acoustic scattering from acoustic metasurfaces.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"5 11","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145497767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shadle [(2023). J. Acoust. Soc. Am. 153, 1412-1426] proposed that the spectral peak in mid-frequency (FM) is a superior measure of place of articulation of sibilant fricatives to the most commonly used measure, the first spectral moment (M1). This study examines whether FM predicts adult listeners' ratings of the place of articulation of 2.5-3.5-year-old children's word-initial /s/ and /ʃ/ better than M1 does. Regression models reveal that FM in the 3-9 kHz range best predicts listeners' ratings of children's fricatives. These results provide additional validation for FM as a measure of fricatives' place of articulation, including in children's speech.
{"title":"Beyond spectral moments: Validating alternative measures of sibilant fricatives using listener ratings of children's speech.","authors":"Eugene Wong, Benjamin Munson","doi":"10.1121/10.0039497","DOIUrl":"10.1121/10.0039497","url":null,"abstract":"<p><p>Shadle [(2023). J. Acoust. Soc. Am. 153, 1412-1426] proposed that the spectral peak in mid-frequency (FM) is a superior measure of place of articulation of sibilant fricatives to the most commonly used measure, the first spectral moment (M1). It is examined as to whether FM predicts adult listener's ratings of the place of articulation of 2.5-3.5-year-old children's word-initial /s/ and /ʃ/ when compared to M1. Regression models reveal that FM in 3-9 kHz range best predicts listener's ratings of children's fricatives. These results provide additional validation for FM as a measure of fricatives' place of articulation, including in children's speech.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"5 10","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12499953/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145202228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
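The difference between the two measures compared here can be seen in a few lines: M1 is the power-weighted mean frequency of the whole spectrum, while a mid-frequency peak measure picks the frequency of the largest spectral value inside a band. The sketch below uses the 3-9 kHz band mentioned in the abstract. The function names and the synthetic spectrum are assumptions, and Shadle's full definition of FM may include details not reproduced here.

```python
import math

def first_spectral_moment(freqs_hz, power):
    """M1: power-weighted mean frequency over the whole spectrum."""
    return sum(f * p for f, p in zip(freqs_hz, power)) / sum(power)

def mid_frequency_peak(freqs_hz, power, lo_hz=3000.0, hi_hz=9000.0):
    """Frequency of the largest spectral value inside [lo_hz, hi_hz]."""
    band = [(p, f) for f, p in zip(freqs_hz, power) if lo_hz <= f <= hi_hz]
    return max(band, key=lambda pf: pf[0])[1]

# Synthetic spectrum: a frication peak at 6 kHz plus unrelated low-frequency energy near 1 kHz.
freqs = [250.0 * i for i in range(45)]  # 0 to 11 kHz in 250 Hz steps
power = [
    math.exp(-((f - 6000.0) / 800.0) ** 2) + 0.8 * math.exp(-((f - 1000.0) / 500.0) ** 2)
    for f in freqs
]

m1 = first_spectral_moment(freqs, power)
fm = mid_frequency_peak(freqs, power)
print(round(m1), fm)
```

The low-frequency energy pulls M1 well below the frication peak, while the band-limited peak measure is unaffected by it. This sensitivity of M1 to energy outside the region of interest is one reason a peak-based measure can be attractive.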
Makayle S Kellison, Matthew J Casiano, Kent L Gee, Christopher J Brown, Tomas E Nesman
This Letter presents an analysis of near-field acoustic data collected on the Space Launch System's Mobile Launcher tower during the Artemis I mission. Twelve pressure sensors located two and four effective nozzle diameters (De) from the vehicle centerline recorded maximum overall sound pressure levels ranging from ~162 dB to more than 170 dB, originating ~10 De downstream of the nozzle exit plane. Frequency-dependent characteristics are also discussed. The peak noise is radiated over a broader frequency range than in the far field. Low-frequency noise locations match other rockets, but high-frequency locations diverge, falling between prior measurements of undeflected and deflected plumes.
{"title":"Plume-generated near-field acoustics during liftoff of Artemis I.","authors":"Makayle S Kellison, Matthew J Casiano, Kent L Gee, Christopher J Brown, Tomas E Nesman","doi":"10.1121/10.0039568","DOIUrl":"https://doi.org/10.1121/10.0039568","url":null,"abstract":"<p><p>This Letter presents an analysis of near-field acoustic data collected on Space Launch System's Mobile Launcher tower during the Artemis I mission. Twelve pressure sensors located two and four effective nozzle diameters (De) from the vehicle centerline recorded maximum overall sound pressure levels ranging from ∼ 162 dB to more than 170 dB, originating ∼ 10 De downstream of the nozzle exit plane. Frequency-dependent characteristics are also discussed. The peak noise is radiated over a broader frequency range than in the far field. Low-frequency noise locations match other rockets, but high-frequency locations diverge, falling between prior measurements of undeflected and deflected plumes.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"5 10","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145294678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sadie O'Neill, Morgan Barkhouse, Chhayakant Patro, Nirmal Srinivasan
Spatial release from masking for an individual is dependent on the spatial separation between the target and the maskers, age, auditory capabilities, and working memory capacity. In this paper, a task is presented that estimates an individual's working memory capacity using a divided-attention version of the classic spatial release from masking task. Speech identification thresholds, temporal overlap thresholds, and working memory were measured for younger and older adults. The results showed younger listeners had better thresholds than older listeners across all the tests. Overall, this test can simultaneously estimate an individual's working memory and spatial processing capabilities.
{"title":"Estimating working memory based on a divided-attention version of the spatial release from masking task.","authors":"Sadie O'Neill, Morgan Barkhouse, Chhayakant Patro, Nirmal Srinivasan","doi":"10.1121/10.0039672","DOIUrl":"https://doi.org/10.1121/10.0039672","url":null,"abstract":"<p><p>Spatial release from masking for an individual is dependent on the spatial separation between the target and the maskers, age, auditory capabilities, and working memory capacity. In this paper, a task is presented that estimates an individual's working memory capacity using a divided-attention version of the classic spatial release from masking task. Speech identification thresholds, temporal overlap thresholds, and working memory were measured for younger and older adults. The results showed younger listeners had better thresholds than older listeners across all the tests. Overall, this test can simultaneously estimate an individual's working memory and spatial processing capabilities.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"5 10","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145350368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}