Mark Hasegawa-Johnson, Xiuwen Zheng, Heejin Kim, Clarion Mendes, Meg Dickinson, Erik Hege, Chris Zwilling, Marie Moore Channell, Laura Mattie, Heather Hodges, Lorraine Ramig, Mary Bellard, Mike Shebanek, Leda Sarι, Kaustubh Kalgaonkar, David Frerichs, Jeffrey P Bigham, Leah Findlater, Colin Lea, Sarah Herrlinger, Peter Korn, Shadi Abou-Zahra, Rus Heywood, Katrin Tomanek, Bob MacDonald
{"title":"Community-Supported Shared Infrastructure in Support of Speech Accessibility.","authors":"Mark Hasegawa-Johnson, Xiuwen Zheng, Heejin Kim, Clarion Mendes, Meg Dickinson, Erik Hege, Chris Zwilling, Marie Moore Channell, Laura Mattie, Heather Hodges, Lorraine Ramig, Mary Bellard, Mike Shebanek, Leda Sarι, Kaustubh Kalgaonkar, David Frerichs, Jeffrey P Bigham, Leah Findlater, Colin Lea, Sarah Herrlinger, Peter Korn, Shadi Abou-Zahra, Rus Heywood, Katrin Tomanek, Bob MacDonald","doi":"10.1044/2024_JSLHR-24-00122","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>The Speech Accessibility Project (SAP) intends to facilitate research and development in automatic speech recognition (ASR) and other machine learning tasks for people with speech disabilities. The purpose of this article is to introduce this project as a resource for researchers, including baseline analysis of the first released data package.</p><p><strong>Method: </strong>The project aims to facilitate ASR research by collecting, curating, and distributing transcribed U.S. English speech from people with speech and/or language disabilities. Participants record speech from their place of residence by connecting their personal computer, cell phone, and assistive devices, if needed, to the SAP web portal. All samples are manually transcribed, and 30 per participant are annotated using differential diagnostic pattern dimensions. For purposes of ASR experiments, the participants have been randomly assigned to a training set, a development set for controlled testing of a trained ASR, and a test set to evaluate ASR error rate.</p><p><strong>Results: </strong>The SAP 2023-10-05 Data Package contains the speech of 211 people with dysarthria as a correlate of Parkinson's disease, and the associated test set contains 42 additional speakers. A baseline ASR, with a word error rate of 3.4% for typical speakers, transcribes test speech with a word error rate of 36.3%. Fine-tuning reduces the word error rate to 23.7%.</p><p><strong>Conclusions: </strong>Preliminary findings suggest that a large corpus of dysarthric and dysphonic speech has the potential to significantly improve speech technology for people with disabilities. By providing these data to researchers, the SAP intends to significantly accelerate research into accessible speech technology.</p><p><strong>Supplemental material: </strong>https://doi.org/10.23641/asha.27078079.</p>","PeriodicalId":51254,"journal":{"name":"Journal of Speech Language and Hearing Research","volume":" ","pages":"4162-4175"},"PeriodicalIF":2.2000,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Speech Language and Hearing Research","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1044/2024_JSLHR-24-00122","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/9/26 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Purpose: The Speech Accessibility Project (SAP) intends to facilitate research and development in automatic speech recognition (ASR) and other machine learning tasks for people with speech disabilities. The purpose of this article is to introduce this project as a resource for researchers, including baseline analysis of the first released data package.
Method: The project aims to facilitate ASR research by collecting, curating, and distributing transcribed U.S. English speech from people with speech and/or language disabilities. Participants record speech from their place of residence by connecting their personal computer, cell phone, and assistive devices, if needed, to the SAP web portal. All samples are manually transcribed, and 30 per participant are annotated using differential diagnostic pattern dimensions. For purposes of ASR experiments, the participants have been randomly assigned to a training set, a development set for controlled testing of a trained ASR, and a test set to evaluate ASR error rate.
Results: The SAP 2023-10-05 Data Package contains the speech of 211 people with dysarthria as a correlate of Parkinson's disease, and the associated test set contains 42 additional speakers. A baseline ASR, with a word error rate of 3.4% for typical speakers, transcribes test speech with a word error rate of 36.3%. Fine-tuning reduces the word error rate to 23.7%.
Conclusions: Preliminary findings suggest that a large corpus of dysarthric and dysphonic speech has the potential to significantly improve speech technology for people with disabilities. By providing these data to researchers, the SAP intends to significantly accelerate research into accessible speech technology.
目的:语音无障碍项目(SAP)旨在促进自动语音识别(ASR)和其他机器学习任务的研究和开发,为语音残疾人士提供便利。本文旨在介绍该项目,将其作为研究人员的资源,包括对首次发布的数据包进行基线分析:该项目旨在通过收集、整理和发布语音和/或语言残障人士转录的美国英语语音来促进 ASR 研究。参与者将个人电脑、手机和辅助设备(如需要)连接到 SAP 门户网站,在居住地录制语音。所有样本均由人工转录,并使用差异诊断模式维度对每位参与者的 30 个样本进行注释。为了进行 ASR 实验,参与者被随机分配到一个训练集、一个用于对训练过的 ASR 进行控制测试的开发集和一个用于评估 ASR 错误率的测试集:SAP 2023-10-05 数据包包含 211 名与帕金森病相关的构音障碍患者的语音,相关测试集包含另外 42 名发言者的语音。基线 ASR 对典型说话者的词错误率为 3.4%,而转录测试语音的词错误率为 36.3%。微调后,词错误率降低到 23.7%:初步研究结果表明,庞大的发音障碍和发音困难语音语料库有可能极大地改进面向残疾人的语音技术。通过向研究人员提供这些数据,SAP 打算大大加快无障碍语音技术的研究。补充材料:https://doi.org/10.23641/asha.27078079。
期刊介绍:
Mission: JSLHR publishes peer-reviewed research and other scholarly articles on the normal and disordered processes in speech, language, hearing, and related areas such as cognition, oral-motor function, and swallowing. The journal is an international outlet for both basic research on communication processes and clinical research pertaining to screening, diagnosis, and management of communication disorders as well as the etiologies and characteristics of these disorders. JSLHR seeks to advance evidence-based practice by disseminating the results of new studies as well as providing a forum for critical reviews and meta-analyses of previously published work.
Scope: The broad field of communication sciences and disorders, including speech production and perception; anatomy and physiology of speech and voice; genetics, biomechanics, and other basic sciences pertaining to human communication; mastication and swallowing; speech disorders; voice disorders; development of speech, language, or hearing in children; normal language processes; language disorders; disorders of hearing and balance; psychoacoustics; and anatomy and physiology of hearing.