Usefulness of Automatic Speech Recognition Assessment of Children With Speech Sound Disorders: Validation Study.

IF 6 | Zone 2, Medicine | Q1 Health Care Sciences & Services | Journal of Medical Internet Research | Pub Date: 2025-01-14 | DOI: 10.2196/60520

Do Hyung Kim, Joo Won Jeong, Dayoung Kang, Taekyung Ahn, Yeonjung Hong, Younggon Im, Jaewon Kim, Min Jung Kim, Dae-Hyun Jang


Abstract

Background: Speech sound disorders (SSDs) are common communication challenges in children, typically assessed by speech-language pathologists (SLPs) using standardized tools. However, traditional evaluation methods are time-intensive and prone to variability, raising concerns about reliability.

Objective: This study aimed to compare the evaluation outcomes of SLPs and an automatic speech recognition (ASR) model using two standardized SSD assessments in South Korea, evaluating the ASR model's performance.

Methods: A fine-tuned wav2vec 2.0 XLS-R model, pretrained on 436,000 hours of adult voice data spanning 128 languages, was used. The model was further trained on 93.6 minutes of children's voices with articulation errors to improve error detection. Participants included children referred to the Department of Rehabilitation Medicine at a general hospital in Incheon, South Korea, from August 19, 2022, to June 14, 2023. Two standardized assessments, the Assessment of Phonology and Articulation for Children (APAC) and the Urimal Test of Articulation and Phonology (U-TAP), were used, with ASR transcriptions compared to SLP transcriptions.
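The comparison of ASR and SLP transcriptions described in the Methods can be illustrated with a phoneme-level edit distance. The following is a minimal sketch, assuming each transcription is already available as a list of phoneme symbols. The abstract does not state which alignment method the study used, and the function names and the example phoneme sequences here are hypothetical.

```python
from typing import Sequence


def phoneme_discrepancies(reference: Sequence[str], hypothesis: Sequence[str]) -> int:
    """Count the substitutions, insertions, and deletions (Levenshtein distance)
    separating a hypothesis phoneme sequence from a reference sequence."""
    # previous[j]: distance between the reference prefix processed so far
    # and the first j hypothesis phonemes.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_ph in enumerate(reference, start=1):
        current = [i]
        for j, hyp_ph in enumerate(hypothesis, start=1):
            cost = 0 if ref_ph == hyp_ph else 1
            current.append(min(
                previous[j] + 1,         # reference phoneme missing from hypothesis
                current[j - 1] + 1,      # extra phoneme in hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def phoneme_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Discrepancies divided by the number of reference phonemes."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return phoneme_discrepancies(reference, hypothesis) / len(reference)


# Hypothetical example: the SLP transcription is treated as the reference.
slp_phonemes = ["s", "a", "g", "w", "a"]
asr_phonemes = ["t", "a", "g", "w", "a"]
print(phoneme_discrepancies(slp_phonemes, asr_phonemes))
print(phoneme_error_rate(slp_phonemes, asr_phonemes))
```

Treating the SLP transcription as the reference is a design choice of this sketch; swapping the arguments leaves the discrepancy count unchanged but alters the denominator of the rate when the two sequences differ in length.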

Results: This study included 30 children aged 3-7 years who were suspected of having SSDs. The phoneme error rates for the APAC and U-TAP were 8.42% (457/5430) and 8.91% (402/4514), respectively, indicating discrepancies between the ASR model and SLP transcriptions across all phonemes. Consonant error rates were 10.58% (327/3090) and 11.86% (331/2790) for the APAC and U-TAP, respectively. On average, there were 2.60 (SD 1.54) and 3.07 (SD 1.39) discrepancies per child for correctly produced phonemes, and 7.87 (SD 3.66) and 7.57 (SD 4.85) discrepancies per child for incorrectly produced phonemes, based on the APAC and U-TAP, respectively. The correlation between SLPs and the ASR model in terms of the percentage of consonants correct was excellent, with intraclass correlation coefficients of 0.984 (95% CI 0.953-0.994) and 0.978 (95% CI 0.941-0.990) for the APAC and U-TAP, respectively. The z scores between SLPs and ASR showed more pronounced differences with the APAC than the U-TAP, with 8 individuals showing discrepancies in the APAC compared to 2 in the U-TAP.
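The reported error rates follow directly from the counts given in parentheses, and the agreement statistic can be sketched in a few lines. The block below is an illustration under stated assumptions: the counts are taken from the Results paragraph, but the abstract does not say which form of the intraclass correlation coefficient was used, so the two-way random effects, absolute agreement, single rater form, ICC(2,1), is assumed here. The function names are hypothetical.

```python
from typing import Sequence

# (discrepancies, total) pairs as reported in the Results paragraph.
REPORTED_COUNTS = {
    "APAC phonemes": (457, 5430),
    "U-TAP phonemes": (402, 4514),
    "APAC consonants": (327, 3090),
    "U-TAP consonants": (331, 2790),
}


def error_rate_percent(discrepancies: int, total: int) -> float:
    """Error rate as a percentage, rounded to two decimals."""
    return round(100 * discrepancies / total, 2)


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1), assumed form: ratings[i][j] is the score given to
    subject i by rater j (for example, PCC from the SLP and from ASR)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


for label, (errors, total) in REPORTED_COUNTS.items():
    print(f"{label}: {error_rate_percent(errors, total)}%")
```

Calling `icc_2_1` on a list of per-child `[slp_pcc, asr_pcc]` pairs would give a single agreement value; reproducing the reported 0.984 and 0.978 would require the per-child scores, which the abstract does not include.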

Conclusions: The results demonstrate the potential of the ASR model in assessing children with SSDs. However, its performance varied based on phoneme or word characteristics, highlighting areas for refinement. Future research should include more diverse speech samples, clinical settings, and speech data to strengthen the model's refinement and ensure broader clinical applicability.

Source journal: Journal of Medical Internet Research (Medicine, Health Care)

CiteScore: 14.40

Self-citation rate: 5.40%

Articles published: 654

Review time: 1 month

About the journal: The Journal of Medical Internet Research (JMIR) is a journal in the field of health informatics and health services. Founded in 1999, it has published in the field for over two decades. The journal focuses on digital health, data science, health informatics, and emerging technologies for health, medicine, and biomedical research. It ranks in the first quartile (Q1) by Impact Factor in these disciplines and is ranked #1 on Google Scholar within the "Medical Informatics" discipline.