Evaluating acoustic representations and normalization for rhoticity classification in children with speech sound disorders.

IF 1.4 | Q3 Acoustics | JASA Express Letters | Pub. date: 2024-02-01 | DOI: 10.1121/10.0024632

Nina R Benway, Jonathan L Preston, Asif Salekin, Elaine Hitchcock, Tara McAllister

Abstract

The effects of different acoustic representations and normalizations were compared for classifiers predicting perception of children's rhotic versus derhotic /ɹ/. Formant and Mel-frequency cepstral coefficient (MFCC) representations for 350 speakers were z-standardized, either relative to values in the same utterance or relative to age-and-sex data for typical /ɹ/. Statistical modeling indicated that age-and-sex normalization significantly increased classifier performance. Clinically interpretable formants performed similarly to MFCCs and were endorsed for deep neural network engineering, achieving a mean test-participant-specific F1-score of 0.81 after personalization and replication (σx = 0.10, med = 0.83, n = 48). Shapley additive explanations (SHAP) analysis indicated that the third formant most influenced fully rhotic predictions.
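The abstract names two z-standardization schemes but does not spell out the computation. The sketch below illustrates one plausible reading, assuming each utterance is an array of per-frame features (columns F1, F2, F3 in Hz here) and that age-and-sex norms for typical /ɹ/ are available as a per-feature mean and standard deviation for each age/sex group. The function names `utterance_z` and `age_sex_z`, the `norms` dictionary layout, and all numbers are hypothetical placeholders, not the authors' code or published normative values.

```python
import numpy as np


def utterance_z(features: np.ndarray) -> np.ndarray:
    """Z-standardize each feature column against the same utterance's frames.

    features: shape (n_frames, n_features), e.g. columns F1, F2, F3 in Hz
    (assumed layout). Each column ends up with mean 0 and SD 1 (ddof=0).
    """
    mean = features.mean(axis=0)
    sd = features.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)  # guard against constant columns
    return (features - mean) / sd


def age_sex_z(features: np.ndarray, age: int, sex: str, norms: dict) -> np.ndarray:
    """Z-standardize against reference values for typical /r/ in the same age/sex group.

    norms: hypothetical mapping (age, sex) -> (mean_vector, sd_vector),
    one entry per feature column.
    """
    mean, sd = norms[(age, sex)]
    return (features - np.asarray(mean)) / np.asarray(sd)


# Made-up per-frame formant values (Hz) for one utterance; illustration only.
frames = np.array([
    [550.0, 1400.0, 2000.0],
    [560.0, 1450.0, 2100.0],
    [540.0, 1380.0, 1950.0],
])

# Hypothetical norm table: placeholder numbers, not published reference data.
norms = {(9, "female"): ([560.0, 1500.0, 2300.0], [60.0, 200.0, 250.0])}

z_utt = utterance_z(frames)                     # relative to this utterance
z_norm = age_sex_z(frames, 9, "female", norms)  # relative to age-and-sex norms
print(z_utt.round(2))
print(z_norm.round(2))
```

One design difference is visible in the arithmetic: utterance-relative scores subtract the utterance's own mean, so an utterance with uniformly high F3 and one with uniformly low F3 can yield identical z-scores, whereas norm-referenced scores keep that offset from typical /ɹ/. This is offered only as an illustration; the abstract reports the performance difference empirically.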
