Consistency in phonetic categorization predicts successful speech-in-noise perception.
Rose Rizzi, Gavin M Bidelman
Listeners bin continuous changes in the speech signal into phonetic categories but vary in how consistently and how discretely they do so. Categorization may relate to speech-in-noise (SIN) perception. Yet, it is unclear if and how perceptual gradience, response consistency, and other cognitive factors (e.g., working memory) collectively predict SIN performance. Here, we estimated perceptual gradience and response consistency during vowel labeling and assessed working memory and SIN performance. We found that perceptual consistency and working memory were the best predictors of listeners' SIN scores. Our findings emphasize the importance of perceptual consistency over categoricity for the perception of noise-degraded speech.
JASA Express Letters 5(12), December 2025. doi: 10.1121/10.0041846
Register effects of imagined addressees on f0 across generations.
Miriam Oschkinat, Melanie Weirich, Daniel Duran, Stefanie Jannedy
This study analyzed fundamental frequency (f0) data from 717 German speakers, collected via the Plapper smartphone app, to investigate phonetic variation as a function of imagined addressee authority. Participants justified crossing a street at a red light, addressing either an imagined friend (male or female) or an imagined male police officer. Speakers consistently produced higher f0 when addressing the police officer, regardless of their own sex or age. The findings support Bell's Audience Design model, which posits that speakers adapt their speech to gain approval, and Ohala's frequency code theory, which associates elevated f0 with submissiveness in interactions with authorities.
{"title":"Register effects of imagined addressees on f0 across generations.","authors":"Miriam Oschkinat, Melanie Weirich, Daniel Duran, Stefanie Jannedy","doi":"10.1121/10.0039810","DOIUrl":"10.1121/10.0039810","url":null,"abstract":"<p><p>This study analyzed fundamental frequency (f0) data from 717 German speakers, collected via the Plapper smartphone app, to investigate phonetic variation as a function of imagined addressee authority. Participants justified crossing a street during a red light, addressing either an imagined friend (male or female) or an imagined male police officer. Speakers consistently produced higher f0 when addressing the police officer, regardless of sex or age. The findings support Bell's Audience Design model, which posits that speakers adapt their speech to gain approval, and Ohala's frequency code theory, which associates elevated f0 with submissiveness in interactions involving authorities.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"5 11","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145484181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Formant variability is related to vowel duration across speakers.
Ding-Lan Tang, Caroline A Niziolek, Benjamin Parrell
This study examined the between-subject and within-subject relationships between vowel duration and formant variability in productions of isolated words and connected speech, analyzing three existing datasets (N = 132). A positive between-subject correlation was observed in isolated words and, marginally, in connected speech. This finding is consistent with the idea that more variable individuals rely more on feedback-based control for vowel production, as longer durations allow more time for online corrections. Conversely, no such correlation was found within speakers at the trial level, suggesting that individuals do not modify their vowel duration online for each production.
{"title":"Formant variability is related to vowel duration across speakers.","authors":"Ding-Lan Tang, Caroline A Niziolek, Benjamin Parrell","doi":"10.1121/10.0039754","DOIUrl":"https://doi.org/10.1121/10.0039754","url":null,"abstract":"<p><p>This study examined both the between-subject and within-subject relationships between vowel duration and formant variability during productions of both isolated words and connected speech by analyzing three existing datasets (N = 132). A positive between-subject correlation was observed in isolated words and, marginally, in connected speech. This finding is consistent with the idea that individuals who are more variable rely more on feedback-based control for vowel production, as longer durations allow more time for online corrections. Conversely, no such correlation was found within speakers at the trial level, suggesting that individuals do not modify their vowel duration online for each production.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"5 11","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145440098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elevational diffraction effect in estimating the local attenuation-coefficient slope using synthetic-transmit-aperture ultrasound imaging.
Khalid Abdalla, Na Zhao, Yuan Xu
Accurate estimation of the attenuation coefficient slope (ACS) enhances quantitative ultrasound imaging. We investigate diffraction bias in reference-phantom-free ACS estimation for synthetic transmit aperture (STA) imaging with a one-dimensional linear array. Simulations and experiments show that the elevational focus, rather than the lateral focus, introduces a depth-dependent bias: ACS is underestimated before the focus and overestimated beyond it. Applying a Gaussian diffraction correction reduced the ACS error from ±0.28 to ±0.06 dB/cm-MHz in simulations and from ±0.32 to ±0.09 dB/cm-MHz in experiments. These results indicate that accounting for elevational diffraction reduces bias in reference-phantom-free ACS estimation in STA imaging.
JASA Express Letters 5(11), November 2025. doi: 10.1121/10.0039768
Underwater noise from on-land blasting.
J E Quijano, M W Koessler
The production of underwater noise by on-land detonations is of concern, especially near sensitive marine mammal habitats. Despite this, public experimental data for analyzing the characteristics of this type of noise are scarce. This paper quantifies noise from near-water land detonations based on measurements obtained at the Bentinck Island Demolition Range, Vancouver Island. The measurements show that ground-to-water propagation is dominant and that air-to-water coupling via evanescent waves is also present, though mostly perceptible only at close distances from the detonation. A simple wavenumber integration model is used to illustrate the depth dependence of the evanescent field.
JASA Express Letters 5(11), November 2025. doi: 10.1121/10.0039805
Dynamical model parameters from ultrasound tongue kinematics.
Sam Kirkham, Patrycja Strycharczuk
The control of speech can be modeled as a dynamical system in which articulators are driven toward target positions. These models are typically evaluated using fleshpoint data, such as electromagnetic articulography (EMA), but recent methodological advances make ultrasound imaging a promising alternative. We evaluate whether the parameters of a linear harmonic oscillator can be reliably estimated from ultrasound tongue kinematics and compare these with parameters estimated from simultaneously recorded EMA data. We find that ultrasound and EMA yield comparable dynamical parameters, while mandibular short tendon tracking also adequately captures jaw motion. This supports using ultrasound kinematics to evaluate dynamical articulatory models.
JASA Express Letters 5(11), November 2025. doi: 10.1121/10.0039769
A method for pitch tracking in the frequency following response using harmonic amplitude summation filterbank.
Sajad Sadeghkhani, Maryam Karimi Boroujeni, Hilmi R Dajani, Saeid R Seydnejad, Christian Giguère
The frequency following response (FFR) reflects the brain's neural encoding of pitch through the fundamental frequency (F0). While autocorrelation has long been the standard method for estimating F0 in FFRs, few alternatives have been explored. We propose a harmonic-structure-based approach that leverages knowledge of the stimulus F0 to guide a selective filterbank, extracting harmonic energy while suppressing noise. F0 is then tracked by identifying the most prominent spectral peak. Applied to FFRs recorded from 16 listeners in response to natural speech stimuli, the method reduced F0 tracking error by 8.8% to 47.4% compared to autocorrelation, offering a more accurate approach.
{"title":"A method for pitch tracking in the frequency following response using harmonic amplitude summation filterbank.","authors":"Sajad Sadeghkhani, Maryam Karimi Boroujeni, Hilmi R Dajani, Saeid R Seydnejad, Christian Giguère","doi":"10.1121/10.0039749","DOIUrl":"https://doi.org/10.1121/10.0039749","url":null,"abstract":"<p><p>The frequency following response (FFR) reflects the brain's neural encoding of pitch through the fundamental frequency (F0). While autocorrelation has long been the standard method for estimating F0 in FFRs, few alternatives have been explored. We propose a harmonic-structure-based approach that leverages knowledge of the stimulus F0 to guide a selective filterbank, extracting harmonic energy while suppressing noise. F0 is then tracked by identifying the most prominent spectral peak. Applied to FFRs recorded from 16 listeners in response to natural speech stimuli, the method reduced F0 tracking error by 8.8% to 47.4% compared to autocorrelation, offering a more accurate approach.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"5 11","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145439997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Toward ambulatory monitoring of vocal behavior at the physiological level using deep ensembles and Bayesian neural networks.
Zhaoyan Zhang
Currently, voice disorders are often diagnosed only when patients visit the clinic, by which time speakers already experience vocal difficulties. The goal of this study was to develop a voice inversion system that predicts, from the produced voice, how speakers modulate vocal physiology, toward early detection of unhealthy vocal behavior. Two networks were developed, a Bayesian neural network and a deep ensemble of neural networks, each predicting changes in vocal physiological parameters along with confidence intervals. Comparison with human data showed that both networks predicted meaningful differences in vocal behavior across subjects, demonstrating their potential for ambulatory monitoring of vocal behavior at the physiological level.
JASA Express Letters 5(11), November 2025. doi: 10.1121/10.0039842
Stable adaptive training for physics-informed neural networks in acoustic wave propagation.
Márcio Marques, Leonardo Mendonça, Arthur Bizzi, Leonardo Moreira, Christian Oliveira, Deborah Oliveira, Lucas Fernandez, Vitor Balestro, João Pereira, Daniel Yukimura, Tiago Novello, Pavel Petrov, Lucas Nissenbaum
Physics-informed neural networks (PINNs) have emerged as a promising tool for simulating various phenomena. However, their application in underwater acoustics remains challenging, primarily due to the need to sample large computational domains and the tendency of training to converge to trivial solutions. This study presents a strategy to address these issues by combining adaptive domain sampling with absorbing boundary conditions. The adaptive sampler dynamically focuses computational effort on regions where the acoustic energy is localized, while the absorbing boundaries stabilize training. Numerical experiments show that our method improves the stability and convergence of PINN training, leading to more accurate and reliable wave propagation simulations.
{"title":"Stable adaptive training for physics-informed neural networks in acoustic wave propagation.","authors":"Márcio Marques, Leonardo Mendonça, Arthur Bizzi, Leonardo Moreira, Christian Oliveira, Deborah Oliveira, Lucas Fernandez, Vitor Balestro, João Pereira, Daniel Yukimura, Tiago Novello, Pavel Petrov, Lucas Nissenbaum","doi":"10.1121/10.0039767","DOIUrl":"https://doi.org/10.1121/10.0039767","url":null,"abstract":"<p><p>Physics-informed neural networks (PINNs) have emerged as a promising tool for simulating various phenomena. However, their application in underwater acoustics remains challenging, primarily due to the need to sample large computational domains and to convergence to trivial solutions. This study presents a strategy to address these issues by combining adaptive domain sampling with absorbing boundary conditions. The adaptive sampler dynamically focuses computational effort on regions where the acoustic energy is localized, while the absorbing boundaries perform training stabilization. Numerical experiments show that our method improves the stability and convergence of PINN training, leading to more accurate and reliable wave propagation simulations.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"5 11","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145453683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Characterizing wind-dependent low-frequency ambient sound with Ocean Observatories Initiative hydrophones.
John Ragland, Shima Abadi
This study characterizes the statistical dependence of low-frequency ambient sound on wind speed using eight years of Ocean Observatories Initiative hydrophone data. Data from two bottom-mounted hydrophones, sampled at 200 Hz, are compared with a National Oceanic and Atmospheric Administration surface winds model; one hydrophone is on the continental slope and one is in the open ocean. The wind dependence of ambient sound levels is reported for both locations across all studied frequencies (0.1-90 Hz). A piecewise log-linear model is fit to ambient sound level as a function of wind speed, and distinct regimes of wind dependence are discussed. Directional surface wave spectra from a nearby buoy are compared with acoustic measurements below 1.2 Hz; the time-frequency characteristics of the acoustic data are largely explained by the local surface wave spectra.
JASA Express Letters 5(11), November 2025. doi: 10.1121/10.0039811