Jorge Wuth, Rodrigo Mahu, Israel Cohen, Richard M Stern, Néstor Becerra Yoma
This paper presents a unified model for combining beamforming and blind source separation (BSS). The validity of the model's assumptions is confirmed by accurately recovering target speech information in noise using oracle information. Using real static human-robot interaction (HRI) data, the proposed combination of BSS with the minimum-variance distortionless response beamformer provides a greater signal-to-noise ratio (SNR) than previous parallel and cascade systems that combine BSS and beamforming. In the difficult-to-model dynamic HRI environment, where the parallel combination is infeasible, the system provides an SNR gain 2.8 dB greater than that obtained with the cascade combination.
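For readers unfamiliar with the minimum-variance distortionless response (MVDR) beamformer named in the abstract, the following is a minimal sketch of the textbook MVDR weight formula, w = R^{-1} d / (d^H R^{-1} d). It is not the unified BSS and beamforming model proposed in the paper. The uniform linear array, microphone spacing, frequency, source angles, and noise level are all hypothetical values chosen for illustration, and the helper names (solve, steering_vector, mvdr_weights, response) are assumptions of this sketch.

```python
import cmath
import math


def solve(A, b):
    """Solve A x = b for a small complex system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0j] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def steering_vector(n_mics, spacing_m, angle_rad, freq_hz, c=343.0):
    """Far-field plane-wave steering vector for a uniform linear array (assumed geometry)."""
    delay = spacing_m * math.sin(angle_rad) / c  # inter-microphone delay in seconds
    return [cmath.exp(-2j * math.pi * freq_hz * m * delay) for m in range(n_mics)]


def mvdr_weights(R, d):
    """Textbook MVDR weights: w = R^{-1} d / (d^H R^{-1} d)."""
    r_inv_d = solve(R, d)
    denom = sum(di.conjugate() * xi for di, xi in zip(d, r_inv_d))
    return [xi / denom for xi in r_inv_d]


def response(w, d):
    """Beamformer response w^H d to a plane wave with steering vector d."""
    return sum(wi.conjugate() * di for wi, di in zip(w, d))


# Hypothetical scenario: 4 microphones, target at broadside, one interferer at 0.7 rad.
n_mics, spacing, freq = 4, 0.05, 1000.0
d_target = steering_vector(n_mics, spacing, 0.0, freq)
d_interf = steering_vector(n_mics, spacing, 0.7, freq)
sigma2 = 0.01  # assumed sensor-noise power

# Noise covariance: unit-power interferer plus white sensor noise.
R = [[d_interf[i] * d_interf[j].conjugate() + (sigma2 if i == j else 0.0)
      for j in range(n_mics)] for i in range(n_mics)]

w = mvdr_weights(R, d_target)
print("gain toward target:    ", abs(response(w, d_target)))
print("gain toward interferer:", abs(response(w, d_interf)))
```

The distortionless constraint keeps the response toward the target at unity, while minimizing output power drives the response toward the interferer close to zero. The linear solver is written out by hand only to keep the sketch free of third-party dependencies.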
{"title":"A unified beamforming and source separation model for static and dynamic human-robot interaction.","authors":"Jorge Wuth, Rodrigo Mahu, Israel Cohen, Richard M Stern, Néstor Becerra Yoma","doi":"10.1121/10.0025238","DOIUrl":"10.1121/10.0025238","url":null,"abstract":"<p><p>This paper presents a unified model for combining beamforming and blind source separation (BSS). The validity of the model's assumptions is confirmed by recovering target speech information in noise accurately using Oracle information. Using real static human-robot interaction (HRI) data, the proposed combination of BSS with the minimum-variance distortionless response beamformer provides a greater signal-to-noise ratio (SNR) than previous parallel and cascade systems that combine BSS and beamforming. In the difficult-to-model HRI dynamic environment, the system provides a SNR gain that was 2.8 dB greater than the results obtained with the cascade combination, where the parallel combination is infeasible.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"4 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140029710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Acoustic propagation through a random distribution of 1 m ice cubes, from 100 to 1000 Hz, was simulated in a 3D finite element model. The effective sound speed and attenuation as functions of frequency were calculated from the simulated signals. Attempts were made to fit a number of models to the wave speed and attenuation, including single scattering, lossy water, and Biot approximations. An extended Biot model, developed for acoustic propagation in granular seabed sediments, was able to fit the simulation up to 300 Hz. Beyond this frequency, the simulation shows that multiple scattering dominates.
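As a rough illustration of how an effective sound speed and attenuation can be read off a pair of signals, the sketch below assumes a single-frequency plane wave sampled at two receivers a known distance apart. This is a generic estimate, not the procedure used in the paper: the function name, the two-receiver arrangement, the exp(-jkx) propagation convention, and the synthetic values (1450 m/s, 0.01 Np/m, 200 Hz, 1 m separation) are all assumptions.

```python
import cmath
import math


def effective_speed_and_attenuation(p_near, p_far, separation_m, freq_hz):
    """Estimate sound speed (m/s) and attenuation (dB/m) from two complex amplitudes.

    Assumes a plane wave, p_far = p_near * exp(-alpha * dr) * exp(-1j * k * dr),
    and a separation shorter than half a wavelength so the phase does not wrap.
    """
    ratio = p_far / p_near
    alpha_np = -math.log(abs(ratio)) / separation_m   # attenuation in Np/m
    k = -cmath.phase(ratio) / separation_m            # wavenumber in rad/m
    speed = 2 * math.pi * freq_hz / k
    alpha_db = 20 * math.log10(math.e) * alpha_np     # convert Np/m to dB/m
    return speed, alpha_db


# Synthetic check with hypothetical medium properties.
true_c, true_alpha_np, f, dr = 1450.0, 0.01, 200.0, 1.0
p1 = 1 + 0j
p2 = cmath.exp(-true_alpha_np * dr - 2j * math.pi * f * dr / true_c)

speed, alpha_db = effective_speed_and_attenuation(p1, p2, dr, f)
print(speed, alpha_db)
```

Repeating the estimate at each frequency of interest would give speed and attenuation as functions of frequency. A real analysis would also need to account for geometric spreading and phase unwrapping, which this sketch leaves out.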
{"title":"Modeling and simulation of underwater acoustic propagation through a random distribution of ice blocks.","authors":"Nicholas P Chotiros, Sverre Holm","doi":"10.1121/10.0025395","DOIUrl":"https://doi.org/10.1121/10.0025395","url":null,"abstract":"<p><p>Acoustic propagation through a random distribution of 1 m ice cubes, from 100 to 1000 Hz, was simulated in a 3D finite element model. The effective sound speed and attenuation as functions of frequency were calculated from the simulated signals. Attempts were made to fit a number of models to the wave speed and attenuation, including single scattering, lossy water, and Biot approximations. An extended Biot model, developed for acoustic propagation in granular seabed sediments, was able to fit the simulation up to 300 Hz. Beyond this frequency, the simulation shows that multiple scattering dominates.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"4 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140186476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lu Han, Ming Wu, Xiangning Liao, Xiaoyi Gao, Jun Yang, Xiaochun Yin
Directional sound radiation focuses sound in a specific direction and reduces sound radiation in other directions. This study uses a flat panel driven by an actuator array to realize two-dimensional directional sound radiation by the acoustic contrast control algorithm. The aliasing effect at higher frequencies is analyzed based on the modal vibration of the panel, and a method for estimating the high frequency limit is proposed. Actuator arrays with different parameters are simulated to verify the efficacy of the proposed method and compare the acoustic contrast response with the conventional loudspeaker arrays.
{"title":"Directional sound radiation from a rectangular panel and the high frequency limit.","authors":"Lu Han, Ming Wu, Xiangning Liao, Xiaoyi Gao, Jun Yang, Xiaochun Yin","doi":"10.1121/10.0024757","DOIUrl":"https://doi.org/10.1121/10.0024757","url":null,"abstract":"<p><p>Directional sound radiation focuses sound in a specific direction and reduces sound radiation in other directions. This study uses a flat panel driven by an actuator array to realize two-dimensional directional sound radiation by the acoustic contrast control algorithm. The aliasing effect at higher frequencies is analyzed based on the modal vibration of the panel, and a method for estimating the high frequency limit is proposed. Actuator arrays with different parameters are simulated to verify the efficacy of the proposed method and compare the acoustic contrast response with the conventional loudspeaker arrays.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"4 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139682069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yadong Liu, Jahurul Islam, Kate Radford, Oksana Tkachman, Bryan Gick
This study examines lateral biases in tongue movements during speech production. It builds on previous research on asymmetry in various aspects of human biology and behavior, focusing on the tongue's asymmetric behavior during speech. The findings reveal that speakers have a pronounced preference toward one side of the tongue during lateral releases, with a majority displaying a left-side bias. This lateral bias in tongue movements during speech is referred to as tonguedness. This research contributes to our understanding of the articulatory mechanisms involved in tongue movements and underscores the importance of considering lateral biases in speech production research.
{"title":"Tonguedness in speech: Lateral bias in lingual bracing.","authors":"Yadong Liu, Jahurul Islam, Kate Radford, Oksana Tkachman, Bryan Gick","doi":"10.1121/10.0024756","DOIUrl":"10.1121/10.0024756","url":null,"abstract":"<p><p>This study examines the lateral biases in tongue movements during speech production. It builds on previous research on asymmetry in various aspects of human biology and behavior, focusing on the tongue's asymmetric behavior during speech. The findings reveal that speakers have a pronounced preference toward one side of the tongue during lateral releases with a majority displaying the left-side bias. This lateral bias in tongue speech movements is referred to as tonguedness. This research contributes to our understanding of the articulatory mechanisms involved in tongue movements and underscores the importance of considering lateral biases in speech production research.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"4 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10848656/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139718105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Urszula Oszczapinska, Laurie M Heller, Seojun Jang, Bridget Nance
Listeners recognizing environmental sounds must contend with variations in level due to the source level and the environment. Nonetheless, variations in level disrupt short-term sound recognition [Susini, Houix, Seropian, and Lemaitre (2019). J. Acoust. Soc. Am. 146(2), EL172-EL176] suggesting that loudness is encoded. We asked whether the experimental custom of setting sounds to equal levels disrupts long-term recognition, especially if it creates a mismatch with ecological loudness. Environmental sounds were played at equalized or ecological levels. Although recognition improved with increased loudness and familiarity, this relationship was unaffected by equalization or real-life experience with the source. However, sound pleasantness was altered by deviations from the ecological level.
{"title":"Ecological sound loudness in environmental sound representations.","authors":"Urszula Oszczapinska, Laurie M Heller, Seojun Jang, Bridget Nance","doi":"10.1121/10.0024995","DOIUrl":"10.1121/10.0024995","url":null,"abstract":"<p><p>Listeners recognizing environmental sounds must contend with variations in level due to the source level and the environment. Nonetheless, variations in level disrupt short-term sound recognition [Susini, Houix, Seropian, and Lemaitre (2019). J. Acoust. Soc. Am. 146(2), EL172-EL176] suggesting that loudness is encoded. We asked whether the experimental custom of setting sounds to equal levels disrupts long-term recognition, especially if it creates a mismatch with ecological loudness. Environmental sounds were played at equalized or ecological levels. Although recognition improved with increased loudness and familiarity, this relationship was unaffected by equalization or real-life experience with the source. However, sound pleasantness was altered by deviations from the ecological level.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"4 2","pages":""},"PeriodicalIF":1.2,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139907086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scattering measurements were made off the coast of Pacific Grove, CA at 200 kHz, on an exposed, fractured granite seafloor. Using inertial sensors and a split-beam transducer, data were processed to obtain a range of grazing angles corresponding to scattering strength, and signal processing techniques were used to extract the relevant portion of each ping. The ensonified angular width from a circular aperture is presented. Scattering strength measurements using different assumptions regarding the grazing angle were compared. The empirical Lommel-Seeliger model provided a good fit to measured data with a parameter of -18.4 dB.
{"title":"Scattering measurements of rocky seafloors using a split-beam echosounder.","authors":"Jen A Gruber, Derek R Olson","doi":"10.1121/10.0024755","DOIUrl":"https://doi.org/10.1121/10.0024755","url":null,"abstract":"<p><p>Scattering measurements were made off the coast of Pacific Grove, CA at 200 kHz, in an exposed fractured granite seafloor. Using inertial sensors and a split-beam transducer, data were processed to obtain a range of grazing angles corresponding to scattering strength, and signal processing techniques were used to extract the relevant portion of each ping. The ensonified angular width from a circular aperture is presented. Scattering strength measurements using different assumptions regarding the grazing angle were compared. The empirical Lommel-Seeliger model provided a good fit to measured data with a parameter of -18.4 dB.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"4 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139718104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Coherent processing in synthetic aperture sonar (SAS) requires platform motion estimation and compensation with sub-wavelength accuracy for high-resolution imaging. Micronavigation, i.e., through-the-sensor platform motion estimation, is essential when positioning information from navigational instruments is absent or inadequately accurate. A machine learning method based on variational Bayesian inference has been proposed for unsupervised data-driven micronavigation. Herein, the multiple-input multiple-output arrangement of a multi-band SAS system is exploited and combined with a hierarchical variational inference scheme, which self-supervises the learning of platform motion and results in improved micronavigation accuracy.
{"title":"Platform motion estimation in multi-band synthetic aperture sonar with coupled variational autoencoders.","authors":"Angeliki Xenaki, Yan Pailhas, Alessandro Monti","doi":"10.1121/10.0024998","DOIUrl":"https://doi.org/10.1121/10.0024998","url":null,"abstract":"<p><p>Coherent processing in synthetic aperture sonar (SAS) requires platform motion estimation and compensation with sub-wavelength accuracy for high-resolution imaging. Micronavigation, i.e., through-the-sensor platform motion estimation, is essential when positioning information from navigational instruments is absent or inadequately accurate. A machine learning method based on variational Bayesian inference has been proposed for unsupervised data-driven micronavigation. Herein, the multiple-input multiple-output arrangement of a multi-band SAS system is exploited and combined with a hierarchical variational inference scheme, which self-supervises the learning of platform motion and results in improved micronavigation accuracy.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"4 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139907087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study investigates the performance of Whisper's automatic speech recognition (ASR) system across diverse native and non-native English accents. Results reveal superior recognition of American compared to British and Australian English accents, with similar performance for Canadian English. Overall, native English accents demonstrate higher accuracy than non-native accents. Exploring connections between speaker traits [sex, native language (L1) typology, and second language (L2) proficiency] and word error rate uncovers notable associations. Furthermore, Whisper exhibits enhanced performance in read speech over conversational speech, with modifications based on speaker gender. The implications of these findings are discussed.
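The word error rate mentioned in the abstract is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below computes it with a standard edit-distance recurrence. The function name, the whitespace tokenization, and the example sentences are assumptions for illustration; the study's own scoring and text-normalization pipeline is not described in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substitution ("sat" -> "sit") and one deletion ("the").
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(wer)
```

Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.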
{"title":"Evaluating OpenAI's Whisper ASR: Performance analysis across diverse accents and speaker traits.","authors":"Calbert Graham, Nathan Roll","doi":"10.1121/10.0024876","DOIUrl":"10.1121/10.0024876","url":null,"abstract":"<p><p>This study investigates Whisper's automatic speech recognition (ASR) system performance across diverse native and non-native English accents. Results reveal superior recognition in American compared to British and Australian English accents with similar performance in Canadian English. Overall, native English accents demonstrate higher accuracy than non-native accents. Exploring connections between speaker traits [sex, native language (L1) typology, and second language (L2) proficiency] and word error rate uncovers notable associations. Furthermore, Whisper exhibits enhanced performance in read speech over conversational speech with modifications based on speaker gender. The implications of these findings are discussed.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"4 2","pages":""},"PeriodicalIF":1.2,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139934516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This work extends Doak's momentum potential theory to multi-chemical-component and reactive, time-stationary fluctuating flows. Additional mixture-related components are found to be superimposed on the canonical vortical, acoustic, and thermal parts of momentum fluctuations and total fluctuating enthalpy. These extended relations are used to develop a time-averaged model that relates the acoustic power radiated to the far-field with clearly defined vortical, acoustic, thermal, and compositional near-field sources. The resulting model is designed to offer a more general and comprehensive way to describe the noise generated within combustion chambers.
{"title":"Extension of Doak's momentum potential theory for multi-species and reacting flows.","authors":"Raffaele D'Aniello, Mario Casel, Karsten Knobloch","doi":"10.1121/10.0024994","DOIUrl":"10.1121/10.0024994","url":null,"abstract":"<p><p>This work extends Doak's momentum potential theory to multi-chemical-component and reactive, time-stationary fluctuating flows. Additional mixture-related components are found to be superimposed on the canonical vortical, acoustic, and thermal parts of momentum fluctuations and total fluctuating enthalpy. These extended relations are used to develop a time-averaged model that relates the acoustic power radiated to the far-field with clearly defined vortical, acoustic, thermal, and compositional near-field sources. The resulting model is designed to offer a more general and comprehensive way to describe the noise generated within combustion chambers.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"4 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139974848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nina R Benway, Jonathan L Preston, Asif Salekin, Elaine Hitchcock, Tara McAllister
The effects of different acoustic representations and normalizations were compared for classifiers predicting perception of children's rhotic versus derhotic /ɹ/. Formant and Mel frequency cepstral coefficient (MFCC) representations for 350 speakers were z-standardized, either relative to values in the same utterance or relative to age-and-sex data for typical /ɹ/. Statistical modeling indicated that age-and-sex normalization significantly increased classifier performance. Clinically interpretable formants performed similarly to MFCCs and were endorsed for deep neural network engineering, achieving mean test-participant-specific F1-score = 0.81 after personalization and replication (σx = 0.10, med = 0.83, n = 48). Shapley additive explanations analysis indicated that the third formant most influenced fully rhotic predictions.
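The two normalization schemes and the F1-score in the abstract can be illustrated with a small sketch. All numbers here are hypothetical: the third-formant values, the reference mean and standard deviation standing in for age-and-sex norms, and the labels are made up for illustration, and the function names are assumptions rather than the authors' code.

```python
from statistics import mean, stdev


def z_standardize(values, ref_mean, ref_sd):
    """Express each value as a number of standard deviations from a reference mean."""
    return [(v - ref_mean) / ref_sd for v in values]


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical third-formant measurements (Hz) from one utterance.
f3_hz = [2900, 3100, 3300]

# Within-utterance normalization: reference statistics come from the utterance itself.
z_within = z_standardize(f3_hz, mean(f3_hz), stdev(f3_hz))

# Age-and-sex normalization: reference statistics come from external norms
# (2500 Hz and 300 Hz are placeholder values, not published norms).
z_age_sex = z_standardize(f3_hz, 2500.0, 300.0)

# Hypothetical perceptual labels (1 = rhotic) and classifier predictions.
score = f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(z_within, z_age_sex, score)
```

Within-utterance scores always center on zero, so they discard how far a child's formants sit from typical values. Scores against an external reference keep that offset, which is one plausible reading of why the abstract reports better classification with age-and-sex normalization.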
{"title":"Evaluating acoustic representations and normalization for rhoticity classification in children with speech sound disorders.","authors":"Nina R Benway, Jonathan L Preston, Asif Salekin, Elaine Hitchcock, Tara McAllister","doi":"10.1121/10.0024632","DOIUrl":"10.1121/10.0024632","url":null,"abstract":"<p><p>The effects of different acoustic representations and normalizations were compared for classifiers predicting perception of children's rhotic versus derhotic /ɹ/. Formant and Mel frequency cepstral coefficient (MFCC) representations for 350 speakers were z-standardized, either relative to values in the same utterance or age-and-sex data for typical /ɹ/. Statistical modeling indicated age-and-sex normalization significantly increased classifier performances. Clinically interpretable formants performed similarly to MFCCs and were endorsed for deep neural network engineering, achieving mean test-participant-specific F1-score = 0.81 after personalization and replication (σx = 0.10, med = 0.83, n = 48). Shapley additive explanations analysis indicated the third formant most influenced fully rhotic predictions.</p>","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":"4 2","pages":""},"PeriodicalIF":1.2,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11522988/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139652385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}