A proof-of-concept study for automatic speech recognition to transcribe AAC speakers' speech from high-technology AAC systems.

IF 2.5 · Zone 4, Medicine · Q1 REHABILITATION · Assistive Technology · Pub Date: 2024-07-03 · Epub Date: 2023-10-05 · DOI: 10.1080/10400435.2023.2260860

Szu-Han Kay Chen, Conner Saeli, Gang Hu

Assistive Technology, pp. 319-326. Journal Article. https://doi.org/10.1080/10400435.2023.2260860

Citations: 0

Abstract

Automatic speech recognition (ASR) is an emerging technology that has been used to recognize the non-typical speech of people with speech impairments and to enhance the language sample transcription process in communication sciences and disorders. However, the feasibility of using ASR to recognize speech samples from high-tech augmentative and alternative communication (AAC) systems has not been investigated. This proof-of-concept paper investigates the feasibility of using an AAC-ASR model to transcribe language samples generated by high-tech AAC systems and compares its recognition accuracy with that of two published ASR models: CMU Sphinx and Google Speech-to-text. An AAC-ASR model was developed to transcribe simulated AAC speaker language samples, and its word error rate (WER) was compared with those of CMU Sphinx and Google Speech-to-text. On the testing files, the AAC-ASR model achieved a lower WER (28.6%) than CMU Sphinx and Google Speech-to-text (70.7% and 86.2%, respectively). Our results demonstrate the feasibility of using the ASR model to automatically transcribe high-technology AAC-simulated language samples to support language sample analysis. Future steps will focus on developing the model with diverse AAC speech training datasets and understanding the speech patterns of individual AAC users to refine the AAC-ASR model.
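The abstract reports WER but does not say how it was computed. As a minimal sketch, assuming the standard definition (word-level substitutions, deletions, and insertions divided by the number of words in the reference transcript), the metric could be computed as follows. The function name `word_error_rate` and the sample sentences are hypothetical illustrations, not material from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = word-level edit distance / number of reference words.

    Assumption: simple lowercase + whitespace tokenization; the paper's
    actual text normalization is not described in the abstract.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one deleted word ("to") and one inserted word ("now").
    reference = "i want to go outside"
    hypothesis = "i want go outside now"
    print(f"WER = {word_error_rate(reference, hypothesis):.1%}")
```

Under this definition a lower value is better, and WER can exceed 100% when the hypothesis contains many inserted words, so the reported 28.6% versus 70.7% and 86.2% means the AAC-ASR model made proportionally fewer word-level errors on the testing files.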

Source journal

Assistive Technology, REHABILITATION

CiteScore: 4.00

Self-citation rate: 5.60%

Journal description: Assistive Technology is an applied, scientific publication in the multi-disciplinary field of technology for people with disabilities. The journal's purpose is to foster communication among individuals working in all aspects of the assistive technology arena, including researchers, developers, clinicians, educators, and consumers. The journal will consider papers from all assistive technology applications. Only original papers will be accepted. Technical notes describing preliminary techniques, procedures, or findings of original scientific research may also be submitted. Letters to the Editor are welcome. Books for review may be sent to authors or publisher.