Automatic Speech Recognition of Conversational Speech in Individuals With Disordered Speech
Jimmy Tobin, Phillip Nelson, Bob MacDonald, Rus Heywood, Richard Cave, Katie Seaver, Antoine Desjardins, Pan-Pan Jiang, Jordan R. Green
Journal of Speech, Language, and Hearing Research, pp. 4176-4185. Epub July 4, 2024; published November 7, 2024. DOI: 10.1044/2024_JSLHR-24-00045
Abstract
Purpose: This study examines the effectiveness of automatic speech recognition (ASR) for individuals with speech disorders, addressing the gap in performance between read and conversational ASR. We analyze the factors influencing this disparity and the effect of speech mode-specific training on ASR accuracy.
Method: Recordings of read and conversational speech from 27 individuals with various speech disorders were analyzed using both (a) one speaker-independent ASR system trained and optimized for typical speech and (b) multiple ASR models that were personalized to the speech of the participants with disordered speech. Word error rates were calculated for each speech model, speech mode (read vs. conversational), and participant. Linear mixed-effects models were used to assess the impact of speech mode and disorder severity on ASR accuracy. We investigated nine variables, classified as technical, linguistic, or speech impairment factors, for their potential influence on the performance gap.
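As a rough illustration of the two quantitative steps named in the Method, the sketch below computes per-utterance word error rate (WER) and fits a linear mixed-effects model of WER on speech mode and impairment severity with a per-speaker random intercept. This is not the authors' code: the language choice (Python with pandas and statsmodels), the column names, and the simulated data are assumptions made purely for illustration.

```python
# Minimal sketch: word error rate plus a linear mixed-effects analysis of WER.
# All data below are simulated; none of it comes from the study.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard word-level edit distance via dynamic programming.
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)
    d[0, :] = np.arange(len(hyp) + 1)
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,          # deletion
                          d[i, j - 1] + 1,          # insertion
                          d[i - 1, j - 1] + cost)   # substitution
    return d[len(ref), len(hyp)] / max(len(ref), 1)


print(word_error_rate("the quick brown fox", "the quick brown box jumps"))  # 0.5

# Simulated per-recording WER table for 27 hypothetical speakers in two speech modes.
rng = np.random.default_rng(0)
rows = []
for i in range(27):
    severity = rng.choice(["mild", "moderate", "severe"])
    base = {"mild": 0.10, "moderate": 0.20, "severe": 0.35}[severity]
    for mode, penalty in [("read", 0.00), ("conversational", 0.10)]:
        rows.append({"speaker": f"s{i:02d}", "severity": severity, "mode": mode,
                     "wer": base + penalty + rng.normal(0, 0.03)})
scores = pd.DataFrame(rows)

# Linear mixed-effects model: fixed effects for speech mode and severity,
# random intercept for each speaker.
fit = smf.mixedlm("wer ~ mode + severity", data=scores, groups=scores["speaker"]).fit()
print(fit.summary())
```

In an actual analysis of this kind, the `scores` table would instead hold the measured word error rates for each participant, speech mode, and ASR model.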
Results: We found a significant performance gap between read and conversational speech in both personalized and unadapted ASR models. Speech impairment severity notably affected recognition accuracy in unadapted models for both speech modes and in personalized models for read speech. Linguistic attributes of utterances had the greatest influence on accuracy, though atypical speech characteristics also played a role. Including conversational speech samples in model training notably improved recognition accuracy.
Conclusions: We observed a significant performance gap in ASR accuracy between read and conversational speech for individuals with speech disorders. This gap was largely due to the linguistic complexity and unique characteristics of speech disorders in conversational speech. Training personalized ASR models using conversational speech significantly improved recognition accuracy, demonstrating the importance of domain-specific training and highlighting the need for further research into ASR systems capable of handling disordered conversational speech effectively.
About the Journal
Mission: JSLHR publishes peer-reviewed research and other scholarly articles on the normal and disordered processes in speech, language, hearing, and related areas such as cognition, oral-motor function, and swallowing. The journal is an international outlet for both basic research on communication processes and clinical research pertaining to screening, diagnosis, and management of communication disorders as well as the etiologies and characteristics of these disorders. JSLHR seeks to advance evidence-based practice by disseminating the results of new studies as well as providing a forum for critical reviews and meta-analyses of previously published work.
Scope: The broad field of communication sciences and disorders, including speech production and perception; anatomy and physiology of speech and voice; genetics, biomechanics, and other basic sciences pertaining to human communication; mastication and swallowing; speech disorders; voice disorders; development of speech, language, or hearing in children; normal language processes; language disorders; disorders of hearing and balance; psychoacoustics; and anatomy and physiology of hearing.