
JASA express letters: latest articles

Dolphin and porpoise detections by the F-POD are not independent: Implications for sympatric species monitoring.
Pub Date : 2024-03-01 DOI: 10.1121/10.0025304
Mel Cosentino, Cristina Marcolin, Emily T Griffiths, Estel Sánchez-Camí, Jakob Tougaard

The F-POD is designed for passive acoustic monitoring of odontocetes. The offline classifiers can identify and separate porpoise-like sounds from dolphin-like sounds. We show that these two classifiers are not working independently. Run together, virtually no detections of both species were reported within the same minute, whereas 10% of the detection positive minutes were reported positive for both species when the two classifiers were run sequentially. This has important implications for interpretation of data in areas containing both species groups, and we call for reporting all analysis details in such studies and for further description and analysis of the classifiers.
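To make the reported quantity concrete, the sketch below computes the share of detection-positive minutes that are positive for both species. It is only an illustration: the function name `both_species_fraction`, the choice of the union of both species' positive minutes as the denominator, and the minute indices are all assumptions, not taken from the study or from the F-POD software.

```python
def both_species_fraction(dolphin_minutes, porpoise_minutes):
    """Fraction of detection-positive minutes that are positive for both species.

    Assumption: the denominator is every minute with at least one detection
    of either species.
    """
    dolphin = set(dolphin_minutes)
    porpoise = set(porpoise_minutes)
    any_positive = dolphin | porpoise
    if not any_positive:
        return 0.0
    return len(dolphin & porpoise) / len(any_positive)


# Hypothetical minute indices, not data from the study.
dolphin_dpm = [1, 2, 3, 7, 8]
porpoise_dpm = [3, 4, 5, 6, 9, 10]

fraction = both_species_fraction(dolphin_dpm, porpoise_dpm)
print(f"{fraction:.0%} of detection-positive minutes contain both species")
```

Running the same calculation on the output of each classifier configuration (run together versus run sequentially) would expose the kind of discrepancy the abstract describes.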

Citations: 0
A unified beamforming and source separation model for static and dynamic human-robot interaction.
Pub Date : 2024-03-01 DOI: 10.1121/10.0025238
Jorge Wuth, Rodrigo Mahu, Israel Cohen, Richard M Stern, Néstor Becerra Yoma

This paper presents a unified model for combining beamforming and blind source separation (BSS). The validity of the model's assumptions is confirmed by recovering target speech information in noise accurately using Oracle information. Using real static human-robot interaction (HRI) data, the proposed combination of BSS with the minimum-variance distortionless response beamformer provides a greater signal-to-noise ratio (SNR) than previous parallel and cascade systems that combine BSS and beamforming. In the difficult-to-model HRI dynamic environment, the system provides a SNR gain that was 2.8 dB greater than the results obtained with the cascade combination, where the parallel combination is infeasible.
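For readers less used to decibel figures, the short sketch below converts an SNR difference in dB to a linear power ratio. The helper names `snr_db` and `db_to_power_ratio` are illustrative; only the 2.8 dB figure comes from the abstract.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from linear powers."""
    return 10.0 * math.log10(signal_power / noise_power)


def db_to_power_ratio(gain_db):
    """Convert a gain in decibels to a linear power ratio."""
    return 10.0 ** (gain_db / 10.0)


# The 2.8 dB advantage over the cascade combination, as a power ratio.
ratio = db_to_power_ratio(2.8)
print(f"A 2.8 dB SNR advantage is a power ratio of about {ratio:.2f}")
```

In other words, at equal signal power, a 2.8 dB SNR advantage corresponds to a bit more than half the residual noise power.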

Citations: 0
Modeling and simulation of underwater acoustic propagation through a random distribution of ice blocks.
Pub Date : 2024-03-01 DOI: 10.1121/10.0025395
Nicholas P Chotiros, Sverre Holm

Acoustic propagation through a random distribution of 1 m ice cubes, from 100 to 1000 Hz, was simulated in a 3D finite element model. The effective sound speed and attenuation as functions of frequency were calculated from the simulated signals. Attempts were made to fit a number of models to the wave speed and attenuation, including single scattering, lossy water, and Biot approximations. An extended Biot model, developed for acoustic propagation in granular seabed sediments, was able to fit the simulation up to 300 Hz. Beyond this frequency, the simulation shows that multiple scattering dominates.
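One common way to obtain an effective sound speed and attenuation from simulated signals is to compare the complex amplitude of a single frequency component at two ranges. The sketch below assumes a plane wave of the form p(x) = A exp(-a x) exp(-i k x) and a receiver spacing shorter than half a wavelength; the function name and the synthetic values are hypothetical, and this is not necessarily the processing used by the authors.

```python
import cmath
import math


def effective_speed_and_attenuation(p1, p2, dx, freq_hz):
    """Estimate phase speed (m/s) and attenuation (dB/m) between two receivers.

    p1, p2: complex amplitudes at the nearer and farther receiver.
    dx: receiver spacing in metres, assumed shorter than half a wavelength
        so that the phase difference is not wrapped.
    """
    ratio = p2 / p1
    phase_lag = -cmath.phase(ratio)  # equals k * dx for the assumed plane wave
    if phase_lag <= 0.0:
        raise ValueError("phase lag must be positive; check spacing and sign convention")
    speed = 2.0 * math.pi * freq_hz * dx / phase_lag
    attenuation_db_per_m = -20.0 * math.log10(abs(ratio)) / dx
    return speed, attenuation_db_per_m


# Synthetic check with hypothetical values: 300 Hz, 1450 m/s, 0.01 Np/m, 1 m spacing.
freq = 300.0
k = 2.0 * math.pi * freq / 1450.0
p_near = 1.0 + 0.0j
p_far = cmath.exp(-0.01 - 1j * k)

speed, attenuation = effective_speed_and_attenuation(p_near, p_far, 1.0, freq)
print(speed, attenuation)
```

Repeating the estimate over a set of frequencies gives speed and attenuation curves that candidate models can then be fitted to.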

Citations: 0
Directional sound radiation from a rectangular panel and the high frequency limit.
Pub Date : 2024-02-01 DOI: 10.1121/10.0024757
Lu Han, Ming Wu, Xiangning Liao, Xiaoyi Gao, Jun Yang, Xiaochun Yin

Directional sound radiation focuses sound in a specific direction and reduces sound radiation in other directions. This study uses a flat panel driven by an actuator array to realize two-dimensional directional sound radiation by the acoustic contrast control algorithm. The aliasing effect at higher frequencies is analyzed based on the modal vibration of the panel, and a method for estimating the high frequency limit is proposed. Actuator arrays with different parameters are simulated to verify the efficacy of the proposed method and compare the acoustic contrast response with the conventional loudspeaker arrays.
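Acoustic contrast is usually defined as the ratio of the spatially averaged squared pressure in the target ("bright") zone to that in the quiet ("dark") zone, expressed in dB. The sketch below evaluates that figure of merit for given complex pressures; it does not implement the contrast control optimization itself, and the sample pressures are hypothetical.

```python
import math


def acoustic_contrast_db(bright_pressures, dark_pressures):
    """Acoustic contrast in dB between a bright zone and a dark zone.

    Each argument is a sequence of complex pressures sampled at control
    points in the corresponding zone.
    """
    bright_energy = sum(abs(p) ** 2 for p in bright_pressures) / len(bright_pressures)
    dark_energy = sum(abs(p) ** 2 for p in dark_pressures) / len(dark_pressures)
    return 10.0 * math.log10(bright_energy / dark_energy)


# Hypothetical control-point pressures, not data from the study.
bright = [1.0 + 0.0j, 0.8 + 0.6j]
dark = [0.1j, 0.1 + 0.0j]

print(f"{acoustic_contrast_db(bright, dark):.1f} dB")
```

Plotting this value against frequency yields an acoustic contrast response of the kind compared between the actuator arrays and conventional loudspeaker arrays.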

Citations: 0
Tonguedness in speech: Lateral bias in lingual bracing.
Pub Date : 2024-02-01 DOI: 10.1121/10.0024756
Yadong Liu, Jahurul Islam, Kate Radford, Oksana Tkachman, Bryan Gick

This study examines the lateral biases in tongue movements during speech production. It builds on previous research on asymmetry in various aspects of human biology and behavior, focusing on the tongue's asymmetric behavior during speech. The findings reveal that speakers have a pronounced preference toward one side of the tongue during lateral releases with a majority displaying the left-side bias. This lateral bias in tongue speech movements is referred to as tonguedness. This research contributes to our understanding of the articulatory mechanisms involved in tongue movements and underscores the importance of considering lateral biases in speech production research.

Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10848656/pdf/
Citations: 0
Ecological sound loudness in environmental sound representations.
Pub Date : 2024-02-01 DOI: 10.1121/10.0024995
Urszula Oszczapinska, Laurie M Heller, Seojun Jang, Bridget Nance

Listeners recognizing environmental sounds must contend with variations in level due to the source level and the environment. Nonetheless, variations in level disrupt short-term sound recognition [Susini, Houix, Seropian, and Lemaitre (2019). J. Acoust. Soc. Am. 146(2), EL172-EL176] suggesting that loudness is encoded. We asked whether the experimental custom of setting sounds to equal levels disrupts long-term recognition, especially if it creates a mismatch with ecological loudness. Environmental sounds were played at equalized or ecological levels. Although recognition improved with increased loudness and familiarity, this relationship was unaffected by equalization or real-life experience with the source. However, sound pleasantness was altered by deviations from the ecological level.

Citations: 0
Scattering measurements of rocky seafloors using a split-beam echosounder.
Pub Date : 2024-02-01 DOI: 10.1121/10.0024755
Jen A Gruber, Derek R Olson

Scattering measurements were made off the coast of Pacific Grove, CA at 200 kHz, in an exposed fractured granite seafloor. Using inertial sensors and a split-beam transducer, data were processed to obtain a range of grazing angles corresponding to scattering strength, and signal processing techniques were used to extract the relevant portion of each ping. The ensonified angular width from a circular aperture is presented. Scattering strength measurements using different assumptions regarding the grazing angle were compared. The empirical Lommel-Seeliger model provided a good fit to measured data with a parameter of -18.4 dB.

Citations: 0
Platform motion estimation in multi-band synthetic aperture sonar with coupled variational autoencoders.
Pub Date : 2024-02-01 DOI: 10.1121/10.0024998
Angeliki Xenaki, Yan Pailhas, Alessandro Monti

Coherent processing in synthetic aperture sonar (SAS) requires platform motion estimation and compensation with sub-wavelength accuracy for high-resolution imaging. Micronavigation, i.e., through-the-sensor platform motion estimation, is essential when positioning information from navigational instruments is absent or inadequately accurate. A machine learning method based on variational Bayesian inference has been proposed for unsupervised data-driven micronavigation. Herein, the multiple-input multiple-output arrangement of a multi-band SAS system is exploited and combined with a hierarchical variational inference scheme, which self-supervises the learning of platform motion and results in improved micronavigation accuracy.

Citations: 0
Evaluating OpenAI's Whisper ASR: Performance analysis across diverse accents and speaker traits.
Pub Date : 2024-02-01 DOI: 10.1121/10.0024876
Calbert Graham, Nathan Roll

This study investigates Whisper's automatic speech recognition (ASR) system performance across diverse native and non-native English accents. Results reveal superior recognition in American compared to British and Australian English accents with similar performance in Canadian English. Overall, native English accents demonstrate higher accuracy than non-native accents. Exploring connections between speaker traits [sex, native language (L1) typology, and second language (L2) proficiency] and word error rate uncovers notable associations. Furthermore, Whisper exhibits enhanced performance in read speech over conversational speech with modifications based on speaker gender. The implications of these findings are discussed.
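Word error rate, the metric named in the abstract, is conventionally the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The sketch below is a minimal version of that standard definition; the function name and the example sentences are illustrative, and any text normalization the study applied before scoring is not reproduced here.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.3f}")
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.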

Citations: 0
Evaluating acoustic representations and normalization for rhoticity classification in children with speech sound disorders.
Pub Date : 2024-02-01 DOI: 10.1121/10.0024632
Nina R Benway, Jonathan L Preston, Asif Salekin, Elaine Hitchcock, Tara McAllister

The effects of different acoustic representations and normalizations were compared for classifiers predicting perception of children's rhotic versus derhotic /ɹ/. Formant and Mel frequency cepstral coefficient (MFCC) representations for 350 speakers were z-standardized, either relative to values in the same utterance or age-and-sex data for typical /ɹ/. Statistical modeling indicated age-and-sex normalization significantly increased classifier performances. Clinically interpretable formants performed similarly to MFCCs and were endorsed for deep neural network engineering, achieving mean test-participant-specific F1-score = 0.81 after personalization and replication (σx = 0.10, med = 0.83, n = 48). Shapley additive explanations analysis indicated the third formant most influenced fully rhotic predictions.
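The two normalization schemes differ only in which mean and standard deviation are used for the z-score. The sketch below contrasts them and also shows how an F1-score is computed from classification counts. All numbers, including the age-and-sex norm and the confusion counts, are hypothetical placeholders rather than values from the study.

```python
import statistics


def z_standardize(values, mean, stdev):
    """Return z-scores of values relative to the given mean and standard deviation."""
    return [(v - mean) / stdev for v in values]


def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical third-formant values in Hz for one utterance (not study data).
f3_hz = [2000.0, 2200.0, 2400.0]

# Option 1: standardize relative to values in the same utterance.
within = z_standardize(f3_hz, statistics.mean(f3_hz), statistics.stdev(f3_hz))

# Option 2: standardize relative to a hypothetical age-and-sex norm for typical /ɹ/.
norm_mean_hz, norm_sd_hz = 1900.0, 250.0
against_norm = z_standardize(f3_hz, norm_mean_hz, norm_sd_hz)

print(within)
print(against_norm)
print(f1_score(true_pos=40, false_pos=10, false_neg=10))
```

Within-utterance scores always centre on zero, whereas scores against an external norm retain how far a speaker's formants sit from typical values, which is information a classifier can use.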

Citations: 0