Usefulness of Automatic Speech Recognition Assessment of Children with Speech Sound Disorders: A Validation Study.

IF 5.8 | Zone 2 (Medicine) | Q1 Health Care Sciences & Services | Journal of Medical Internet Research | Pub Date: 2024-11-17 | DOI: 10.2196/60520
Do Hyung Kim, Joo Won Jeong, Dayoung Kang, Taekyung Ahn, Yeonjung Hong, Younggon Im, Jaewon Kim, Min Jung Kim, Dae-Hyun Jang
Citations: 0

Abstract

Background: Speech sound disorders (SSDs) are common communication challenges in children, typically assessed by speech-language pathologists (SLPs) using standardized tools. However, traditional evaluation methods are time-intensive and prone to variability, raising concerns about reliability.

Objective: This study aimed to compare the evaluation outcomes of SLPs and an automatic speech recognition (ASR) model using two standardized SSD assessments in Korea, evaluating the ASR model's performance.

Methods: A fine-tuned wav2vec 2.0 XLS-R model, pretrained on 436,000 hours of adult voice data spanning 128 languages, was used. The model was further trained on 93.6 minutes of children's voices with articulation errors to improve error detection. Participants included children referred to the Department of Rehabilitation Medicine at a general hospital in Incheon, South Korea, from August 19, 2022, to June 14, 2023. Two standardized assessments, the Assessment of Phonology and Articulation for Children (APAC) and the Urimal Test of Articulation and Phonology (U-TAP), were employed, and ASR transcriptions were compared with SLP transcriptions.

Results: This study included 30 children aged 3-7 years who were suspected of having SSDs. The phoneme error rates (PER) for the APAC and U-TAP were 8.42% and 8.91%, respectively, indicating discrepancies between the ASR model and SLP transcriptions across all phonemes. Consonant error rates were 10.58% and 11.86% for the APAC and U-TAP, respectively. On average, there were 2.60 and 3.07 discrepancies per child for correctly produced phonemes, and 7.87 and 7.57 discrepancies per child for incorrectly produced phonemes, based on the APAC and U-TAP, respectively. The correlation between SLPs and the ASR model in terms of the percentage of consonants correct (PCC) was excellent, with an intraclass correlation coefficient (ICC) of 0.984 (95% CI: 0.953-0.994) and 0.978 (95% CI: 0.941-0.990) for the APAC and U-TAP, respectively. Z-scores derived from the SLP and ASR transcriptions differed more often on the APAC than on the U-TAP, with 8 individuals showing discrepancies on the APAC compared with 2 on the U-TAP.
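
The abstract does not give the formulas behind the reported PER and PCC values. As a rough illustration only, the sketch below assumes the conventional definitions: PER as the Levenshtein (edit) distance between a reference phoneme sequence (here, the SLP transcription) and a hypothesis sequence (the ASR transcription), divided by the reference length, and PCC as the share of target consonants produced correctly. The function names and the example phoneme strings are hypothetical and are not taken from the study.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """PER in percent, assuming the standard edit-distance definition."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def percent_consonants_correct(n_correct: int, n_total: int) -> float:
    """PCC in percent: correctly produced consonants over target consonants."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_correct / n_total


# Hypothetical example: one substituted phoneme out of five.
slp = ["k", "a", "b", "a", "ŋ"]   # assumed SLP (reference) transcription
asr = ["t", "a", "b", "a", "ŋ"]   # assumed ASR (hypothesis) transcription
per = phoneme_error_rate(slp, asr)
pcc = percent_consonants_correct(43, 50)
```

Under these assumptions, a PER of about 8% would mean that roughly one phoneme in twelve differs between the two transcriptions. The study's actual alignment and scoring procedure may differ.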

Conclusions: The results demonstrate the potential of the ASR model in assessing children with SSDs. However, its performance varied based on phoneme or word characteristics, highlighting areas for refinement. Future research should include more diverse speech samples, clinical settings, and speech data to further refine the model and ensure broader clinical applicability.


Source journal
CiteScore: 14.40
Self-citation rate: 5.40%
Articles published: 654
Review time: 1 month
Journal description: The Journal of Medical Internet Research (JMIR) is a journal in the field of health informatics and health services. Founded in 1999, it has been a pioneer in the field for over two decades. The journal focuses on digital health, data science, health informatics, and emerging technologies for health, medicine, and biomedical research. It ranks in the first quartile (Q1) by Impact Factor in these disciplines and is ranked #1 on Google Scholar within the "Medical Informatics" discipline.