Pub Date: 2024-11-07 | Epub Date: 2024-07-11 | DOI: 10.1044/2024_JSLHR-24-00097
Richard Cave
Purpose: Amyotrophic lateral sclerosis (ALS) is a progressive, ultimately fatal disease causing progressive muscular weakness. Most people living with ALS (plwALS) experience dysarthria, eventually becoming unable to communicate using natural speech. Many wish to use speech for as long as possible. Personalized automated speech recognition (ASR) model technology, such as Google's Project Relate, is argued to better recognize speech with dysarthria, supporting maintenance of understanding through real-time captioning. The objectives of this study are to examine how plwALS and communication partners use Relate in everyday conversation over a period of up to 12 months and how its use may change with any decline in speech over time.
Method: This study videoed interactions between three plwALS and their communication partners. We assessed the accuracy of ASR captions and how well they preserved meaning. Conversation analysis was used to identify participants' own organizational practices in the accomplishment of interaction. Thematic analysis was used to better understand the participants' experiences of using ASR captions.
Results: All plwALS reported lower-than-expected ASR accuracy when used in conversation and felt ASR captioning was only useful in certain contexts. All participants liked the concept of live captioning and were hopeful that future improvements to ASR accuracy may support their communication in everyday life.
Conclusions: Training is needed on best practices for customizing and using ASR technology and on the limitations of ASR in conversational settings. Support is needed for those less confident with technology and to reduce misplaced attribution of responsibility for captioning errors, which risks negative effects on psychological well-being.
Title: How People Living With Amyotrophic Lateral Sclerosis Use Personalized Automatic Speech Recognition Technology to Support Communication.
Journal of Speech Language and Hearing Research, pp. 4186-4202.
Pub Date: 2024-11-07 | Epub Date: 2024-10-01 | DOI: 10.1044/2024_JSLHR-24-00317
Helen L Long, Katherine C Hustad
Purpose: This study aimed to investigate the vocal characteristics of children with cerebral palsy (CP) and anarthria using the stage model of vocal development.
Method: Vocal characteristics of 39 children with CP and anarthria around 4 years of age were analyzed from laboratory-based caregiver-child interactions. Perceptual coding analysis was conducted using the Stark Assessment of Early Vocal Development-Revised to examine vocal complexity, volubility, and consonant diversity.
Results: Children predominantly produced vocalizations corresponding to the two earliest stages of vocal development, characterized by vowel-like utterances. They showed limited attainment of consonantal features, with low consonant diversity and variably low vocal rates.
Conclusions: Our results demonstrate that underlying neurological impairments resulting in an anarthric status in children with CP affect the progression of speech motor development and their ability to advance beyond early vocal stages. These findings highlight the importance of considering alternative communication modalities for children demonstrating similar vocal characteristics beyond expected periods of development.
Title: Vocal Characteristics of Children With Cerebral Palsy and Anarthria.
Journal of Speech Language and Hearing Research, pp. 4264-4274.
Pub Date: 2024-11-07 | Epub Date: 2024-10-04 | DOI: 10.1044/2024_JSLHR-24-00210
Tiffany Chavers Edgar, Ralf W Schlosser, Rajinder Koul
Purpose: The purpose of this study was to examine the effectiveness of an augmentative and alternative communication (AAC) intervention package consisting of systematic instruction and aided AAC modeling with speech-output technology on the acquisition, maintenance, and generalization of socio-communicative behaviors in four minimally speaking, preschool-aged, autistic children.
Method: A multiple-probe design across behaviors (i.e., initiating a request for a turn, answering questions, and commenting) replicated across participants was implemented to evaluate the effects of the intervention package on socio-communicative behaviors. Furthermore, a pretreatment and posttreatment multiple-generalization-probe design was used to assess generalization across typically developing peers who were not a part of the intervention. Maintenance data were collected 3 weeks post intervention.
Results: Visual analysis, corroborated by the nonoverlap of all pairs statistic, established a strong functional relationship between the AAC intervention package and all targeted socio-communicative outcomes for two participants. For the other two participants, inconsistent intervention effects were observed. In terms of generalization from interacting with the researcher to typically developing peers, a functional relationship between the intervention and generalization outcomes for all targeted behaviors was established for only one participant (i.e., Aiden).
Conclusion: The outcomes of this study suggest that aided AAC modeling and systematic instruction using a speech-output technology may lead to gains in socio-communicative behaviors in some minimally speaking, preschool-aged, autistic children.
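The nonoverlap of all pairs (NAP) statistic cited in the results compares every baseline-phase observation with every intervention-phase observation, scoring improving pairs fully and ties as half. A minimal illustrative sketch (hypothetical helper, not the authors' analysis code):

```python
def nap(baseline, treatment):
    """Nonoverlap of all pairs: fraction of (baseline, treatment)
    pairs in which the treatment value exceeds the baseline value,
    counting ties as 0.5. Assumes higher values mean improvement."""
    pairs = [(b, t) for b in baseline for t in treatment]
    score = sum(1.0 if t > b else 0.5 if t == b else 0.0 for b, t in pairs)
    return score / len(pairs)
```

With complete nonoverlap (every treatment point above every baseline point), NAP is 1.0; chance-level performance yields 0.5.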
Title: Socio-Communicative Behaviors Involving Minimally Speaking Autistic Preschoolers and Their Typically Developing Peers: Effects of an Augmentative and Alternative Communication Intervention Package.
Supplemental material: https://doi.org/10.23641/asha.27091879
Journal of Speech Language and Hearing Research, pp. 4466-4486.
Pub Date: 2024-11-07 | Epub Date: 2024-10-14 | DOI: 10.1044/2024_JSLHR-24-00128
Sofia Hein Machado, Alex Sweeney, Arturo E Hernandez, Ferenc Bunta
Purpose: The purpose of this study was to investigate how the amount of home language use between the primary caregiver and bilingual Spanish- and English-speaking children with hearing loss who use cochlear implants (CIs) versus their bilingual age-matched peers with normal hearing (NH) can impact speech outcomes in the home language.
Method: Thirty-four bilingual Spanish- and English-speaking children (17 CI users and 17 with NH) between the ages of 5;3 and 7;9 (years;months) participated in this study. Independent variables were the amount of home language use with the primary caregiver and hearing status, and dependent variables were vowels and consonants correctly produced and occurrence of selected phonological processes. The amount of home language use was ascertained from surveys, and the dependent measures were based on a single-word picture elicitation task.
Results: Bilingual children with CIs who are exposed to Spanish for more than 80% of the time via their primary caregiver performed better on Spanish segmental accuracy measures than those who are exposed to Spanish from only 20% to 50% of the time, specifically on vowels (partial η² = .31) and consonants (partial η² = .025). Children with NH outperformed children with CIs on all accuracy measures in Spanish.
Conclusions: Preliminary results suggest the importance of language exposure through interactions with the primary caregiver for speech development in bilingual children. Future studies should investigate strategies to facilitate home language development in bilingual children with CIs, enabling them to reach their full potential.
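The partial η² effect sizes reported above follow the standard ANOVA definition, the effect sum of squares divided by the sum of effect and error sums of squares. A one-line sketch for reference (hypothetical helper, not the authors' analysis code):

```python
def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Partial eta squared: SS_effect / (SS_effect + SS_error).
    Proportion of variance attributable to the effect after
    partialing out other effects in the model."""
    return ss_effect / (ss_effect + ss_error)
```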
Title: The Effects of Home Language Use on Spanish Speech Measures in Bilingual Children With Hearing Loss Who Use Cochlear Implants and Their Peers With Normal Hearing.
Journal of Speech Language and Hearing Research, pp. 4518-4532.
Pub Date: 2024-11-07 | Epub Date: 2024-06-05 | DOI: 10.1044/2024_JSLHR-24-00039
Julie Liss, Visar Berisha
Objective: This research note advocates for a methodological shift in clinical speech analytics, emphasizing the transition from high-dimensional speech feature representations to clinically validated speech measures designed to operationalize clinically relevant constructs of interest. The aim is to enhance model generalizability and clinical applicability in real-world settings.
Method: We outline the challenges of using conventional supervised machine learning models in clinical speech analytics, particularly their limited generalizability and interpretability. We propose a new framework focusing on speech measures that are closely tied to specific speech constructs and have undergone rigorous validation. This research note discusses a case study involving the development of a measure for articulatory precision in amyotrophic lateral sclerosis (ALS), detailing the process from ideation through Food and Drug Administration (FDA) breakthrough status designation.
Results: The case study demonstrates how the operationalization of the articulatory precision construct into a quantifiable measure yields robust, clinically meaningful results. The measure's validation followed the V3 framework (verification, analytical validation, and clinical validation), showing high correlation with clinical status and speech intelligibility. The practical application of these measures is exemplified in a clinical trial and designation by the FDA as a breakthrough status device, underscoring their real-world impact.
Conclusions: Transitioning from speech features to speech measures offers a more targeted approach for developing speech analytics tools in clinical settings. This shift ensures that models are not only technically sound but also clinically relevant and interpretable, thereby bridging the gap between laboratory research and practical health care applications. We encourage further exploration and adoption of this approach for developing interpretable speech representations tailored to specific clinical needs.
Title: Operationalizing Clinical Speech Analytics: Moving From Features to Measures for Real-World Clinical Impact.
Journal of Speech Language and Hearing Research, pp. 4226-4232.
Pub Date: 2024-11-07 | Epub Date: 2024-09-26 | DOI: 10.1044/2024_JSLHR-24-00122
Mark Hasegawa-Johnson, Xiuwen Zheng, Heejin Kim, Clarion Mendes, Meg Dickinson, Erik Hege, Chris Zwilling, Marie Moore Channell, Laura Mattie, Heather Hodges, Lorraine Ramig, Mary Bellard, Mike Shebanek, Leda Sarι, Kaustubh Kalgaonkar, David Frerichs, Jeffrey P Bigham, Leah Findlater, Colin Lea, Sarah Herrlinger, Peter Korn, Shadi Abou-Zahra, Rus Heywood, Katrin Tomanek, Bob MacDonald
Purpose: The Speech Accessibility Project (SAP) intends to facilitate research and development in automatic speech recognition (ASR) and other machine learning tasks for people with speech disabilities. The purpose of this article is to introduce this project as a resource for researchers, including baseline analysis of the first released data package.
Method: The project aims to facilitate ASR research by collecting, curating, and distributing transcribed U.S. English speech from people with speech and/or language disabilities. Participants record speech from their place of residence by connecting their personal computer, cell phone, and assistive devices, if needed, to the SAP web portal. All samples are manually transcribed, and 30 per participant are annotated using differential diagnostic pattern dimensions. For purposes of ASR experiments, the participants have been randomly assigned to a training set, a development set for controlled testing of a trained ASR, and a test set to evaluate ASR error rate.
Results: The SAP 2023-10-05 Data Package contains the speech of 211 people with dysarthria as a correlate of Parkinson's disease, and the associated test set contains 42 additional speakers. A baseline ASR, with a word error rate of 3.4% for typical speakers, transcribes test speech with a word error rate of 36.3%. Fine-tuning reduces the word error rate to 23.7%.
Conclusions: Preliminary findings suggest that a large corpus of dysarthric and dysphonic speech has the potential to significantly improve speech technology for people with disabilities. By providing these data to researchers, the SAP intends to significantly accelerate research into accessible speech technology.
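The word error rates quoted above (3.4%, 36.3%, 23.7%) count substitutions, insertions, and deletions against the reference word count, conventionally computed via word-level edit distance. A minimal sketch of that metric (illustrative only, not the SAP baseline code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by the
    number of reference words. Assumes a nonempty reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution/match
    return dp[len(ref)][len(hyp)] / len(ref)
```

One substitution plus one deletion against a six-word reference, for example, yields a WER of 2/6 ≈ 33%, close to the baseline figure reported for dysarthric test speech.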
Title: Community-Supported Shared Infrastructure in Support of Speech Accessibility.
Supplemental material: https://doi.org/10.23641/asha.27078079
Journal of Speech Language and Hearing Research, pp. 4162-4175.
Pub Date: 2024-11-07 | Epub Date: 2024-10-17 | DOI: 10.1044/2024_JSLHR-24-00594
Jordan R Green
Title: Artificial Intelligence in Communication Sciences and Disorders: Introduction to the Forum.
Journal of Speech Language and Hearing Research, pp. 4157-4161.
Pub Date: 2024-11-07 | Epub Date: 2024-10-24 | DOI: 10.1044/2024_JSLHR-24-00418
Catriona M Steele, Renata Mancopes, Emily Barrett, Vanessa Panes, Melanie Peladeau-Pigeon, Michelle M Simmons, Sana Smaoui
Purpose: Age- and disease-related changes in oropharyngeal anatomy and physiology may be identified through quantitative videofluoroscopic measures of pharyngeal area and dynamics. Pixel-based measures of nonconstricted pharyngeal area (PhAR) are typically taken during oral bolus hold tasks or on postswallow rest frames. A recent study in 87 healthy adults reported mean postswallow PhAR of 62 %(C2-4)² (range: 25%-135%) and significantly larger PhAR in males. The fact that measures were taken after initial bolus swallows without controlling for the presence of subsequent clearing swallows was identified as a potential source of variation. A subset of study participants had completed a protocol including additional static nonswallowing tasks, enabling us to explore variability across those tasks, taking sex differences into account.
Method: Videofluoroscopy still shots were analyzed for 20 healthy adults (10 males, 10 females, mean age = 26 years) in head-neutral position, chin-down and chin-up positions, a sustained /a/ vowel vocalization, and oral bolus hold tasks (1-cc, 5-cc). Trained raters used ImageJ software to measure PhAR in %(C2-4)² units. Measures were compared to previously reported mean postswallow PhAR for the same participants: (a) explorations of sex differences; (b) pairwise linear mixed-model analyses of variance (ANOVAs) of PhAR for each nonswallowing task versus postswallow measures, controlling for sex; and (c) a combined mixed-model ANOVA to confirm comparability of the subset of tasks showing no significant differences from postswallow measures in Step 2.
Results: Overall, PhAR measures were significantly larger in male participants; however, most pairwise task comparisons did not differ by sex. No significant differences from postswallow measures were seen for the 5-cc bolus hold, chin-down and chin-up postures, and the second (but not the first) of two repeated head-neutral still shots. PhAR during a 5-cc bolus hold was most similar to postswallow measures: mean ± standard deviation of 51 ± 13 %(C2-4)² in females and 64 ± 16 %(C2-4)² in males.
Conclusions: PhAR is larger in men than in women. Oral bolus hold tasks with a 5-cc liquid bolus yield similar measures to those obtained from postswallow rest frames.
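The %(C2-4)² unit used above normalizes a pixel-based area by the squared C2-C4 vertebral distance measured in the same image, which makes areas comparable across participants and image magnifications. A minimal sketch of that normalization (the function name and inputs are illustrative, not taken from the study):

```python
# Hedged sketch: express a pixel area in %(C2-4)^2 units by dividing by
# the squared C2-C4 intervertebral distance (an anatomical scalar
# measured in the same videofluoroscopy frame) and scaling to percent.

def phar_percent(area_px: float, c2_c4_px: float) -> float:
    """Return a pixel area normalized to %(C2-4)^2 units.

    area_px   -- pharyngeal area in pixels^2
    c2_c4_px  -- C2-C4 distance in pixels (must be positive)
    """
    if c2_c4_px <= 0:
        raise ValueError("C2-C4 distance must be positive")
    return 100.0 * area_px / (c2_c4_px ** 2)

# Example: an area of 6200 px^2 with a 100 px C2-C4 distance gives
# 62 %(C2-4)^2, the order of magnitude of the reported mean.
print(phar_percent(6200, 100))  # 62.0
```

Because the normalizer is squared, the measure is dimensionless and insensitive to zoom, which is why it can be compared across the 87-adult cohort and this 20-adult subset.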
Steele, C. M., Mancopes, R., Barrett, E., Panes, V., Peladeau-Pigeon, M., Simmons, M. M., & Smaoui, S. (2024). Preliminary Exploration of Variations in Measures of Pharyngeal Area During Nonswallowing Tasks. Journal of Speech Language and Hearing Research, 4304-4313. DOI: 10.1044/2024_JSLHR-24-00418. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11567086/pdf/
Pub Date: 2024-11-06. DOI: 10.1044/2024_JSLHR-24-00256
Kim R Bauerly, Eric S Jackson
Purpose: Research has found an advantage to maintaining an external attentional focus while speaking: accuracy increases and across-sentence variability decreases in oral-motor and speech tasks. What is not clear is how attention affects articulatory variability both across and within sentences, or how attention affects articulatory control in speakers who stutter. The purpose of this study was to investigate the effects of an internal versus an external attentional focus on articulatory variability at the sentence level.
Method: This study used linear (spatiotemporal index [STI]) and nonlinear (recurrence quantification analysis [RQA]) indices to measure lip aperture variability in 10 adults who stutter (AWS) and 15 adults who do not stutter (ANS) while they repeated sentences under an internal versus an external attentional focus during a virtual reality task (withVR.app; retrieved December 2023 from https://therapy.withvr.app). Four RQA measures were used to quantify within-sentence variability: percent recurrence, percent determinism (%DET), stability (MAXLINE), and stationarity (TREND). Sentence duration measures were also obtained.
Results: AWS' movement durations were significantly longer than those of the ANS across conditions, and the AWS were more affected by the attentional focus shifts, as their speech rate significantly increased when speaking with an external focus. AWS' speech patterns were also significantly more deterministic (%DET) and stable (MAXLINE) across attentional focus conditions compared to those of the ANS. Both groups showed an effect of attentional shifts: they exhibited less variability (i.e., greater consistency) across sentences (STI) and less determinism (%DET) and stability (MAXLINE) within sentences when repeating sentences under an external attentional focus. STI values were not significantly different between the AWS and ANS for the internal or external attentional focus tasks. There were no significant main effects of group or condition for TREND; however, a main effect of sentence type was found.
Conclusion: Results suggest that AWS use a more restrictive and less flexible approach to movement and that an external focus fosters more flexibility and thus responsiveness to external factors.
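The STI referenced in this abstract is conventionally computed by amplitude-normalizing (z-scoring) and time-normalizing each repetition's movement trajectory onto a common relative time base, then summing the across-repetition standard deviations at fixed time points. A sketch of that standard computation is below; details such as the number of time points vary across studies, and this is not the authors' analysis code:

```python
import numpy as np

def sti(trials, n_points=50):
    """Spatiotemporal index over repeated movement trajectories.

    Each trial (e.g., a lip-aperture time series for one sentence
    repetition) is z-scored in amplitude and linearly resampled onto
    n_points equally spaced relative time points; the STI is the sum
    of the across-trial standard deviations at those points.
    """
    t_new = np.linspace(0.0, 1.0, n_points)       # common relative time base
    norm = []
    for y in trials:
        y = np.asarray(y, dtype=float)
        z = (y - y.mean()) / y.std()              # amplitude normalization
        t_old = np.linspace(0.0, 1.0, len(z))     # this trial's relative time
        norm.append(np.interp(t_new, t_old, z))   # linear time normalization
    norm = np.vstack(norm)
    return float(np.sum(norm.std(axis=0)))        # sum of pointwise SDs
```

Identically shaped repetitions yield an STI of zero regardless of their amplitude or offset, since z-scoring removes affine differences; larger values indicate less consistent movement patterning across repetitions.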
Bauerly, K. R., & Jackson, E. S. (2024). Influences of Attentional Focus on Across- and Within-Sentence Variability in Adults Who Do and Do Not Stutter. Journal of Speech Language and Hearing Research, 1-13. DOI: 10.1044/2024_JSLHR-24-00256
Pub Date: 2024-11-04. DOI: 10.1044/2024_JSLHR-23-00533
Jasmeen Mansour-Adwan, Asaid Khateb
Purpose: This study aimed to evaluate the stability of phonological awareness (PA) and language achievements between kindergarten and first grade among Arabic-speaking children.
Method: A total of 1,158 children were assessed in PA and language skills in both grades and were classified based on distinct and integrated achievements in PA and language using percentile cutoff criteria. The distinct classification yielded high, intermediate, low, and very low achievement groups for each domain. The integrated classification across both domains yielded four groups: intermediate-high PA and language, very low PA, very low language, and doubly low (very low PA and language). Descriptive statistics and McNemar's tests were used to examine the stability of these groups.
Results: The analyses showed a significant improvement in achievements on most tasks. The distinct classification for PA and language indicated that many more kindergarteners in the extreme distribution with high and very low achievement levels maintained this profile in first grade compared to those with intermediate achievements. For PA, 55.7% of kindergarteners with high, 30% with intermediate, 30.4% with low, and 45.5% with very low achievements maintained their achievements in first grade. For language, 52.5% of kindergarteners with high, 34.5% with intermediate, 38.8% with low, and 59.8% with very low achievements maintained their language achievements. The integrated classification indicated a higher achievement stability rate for kindergarteners with intermediate-high PA and language (91.3%) and for doubly low achievers (84.7%) compared to very low PA (24.1%) or very low language (31.8%) achievers.
Conclusions: The study indicated higher variability in the distribution of the intermediate achievements compared to the high and very low achievements, which were more stable across grades. The results emphasize the need for dynamic linguistic assessments and early intervention for children with very low achievements in PA and language, who show a poor prognosis.
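McNemar's test, used above to assess classification stability, depends only on the discordant cells of the paired 2×2 table: children who changed category in one direction between kindergarten and first grade versus the other. A minimal sketch with continuity correction (illustrative counts; not the authors' analysis code):

```python
from math import erf, sqrt

def mcnemar(b: int, c: int):
    """McNemar's test with continuity correction.

    b, c -- the two discordant cell counts of a paired 2x2 table
            (e.g., 'very low' in kindergarten but not first grade,
            and vice versa). Returns (chi-square statistic, p value),
    with p from the chi-square distribution with 1 df.
    """
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # For 1 df, P(X > chi2) = 2 * (1 - Phi(sqrt(chi2))), where Phi is
    # the standard normal CDF, computable via the error function.
    phi = 0.5 * (1.0 + erf(sqrt(chi2) / sqrt(2.0)))
    return chi2, 2.0 * (1.0 - phi)

# Hypothetical example: 10 children moved out of a category and 10
# moved in -- perfectly symmetric change, so no evidence of a
# systematic shift between grades.
chi2, p = mcnemar(10, 10)
```

Concordant cells (children who kept their classification) do not enter the statistic at all, which is why the test isolates systematic change from overall stability rates like those reported above.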
Mansour-Adwan, J., & Khateb, A. (2024). The Stability of Linguistic Skills of Arabic-Speaking Children Between Kindergarten and First Grade. Journal of Speech Language and Hearing Research, 1-16. DOI: 10.1044/2024_JSLHR-23-00533