The effects of automatic speech recognition quality on human transcription latency

Yashesh Gaur, Walter S. Lasecki, Florian Metze, Jeffrey P. Bigham
Proceedings of the 13th Web for All Conference. Published 2016-04-11. DOI: 10.1145/2899475.2899478 (https://doi.org/10.1145/2899475.2899478)
Citations: 17

Abstract

Transcription makes speech accessible to deaf and hard of hearing people. This conversion of speech to text is still done manually by humans, despite the high cost, because the quality of automatic speech recognition (ASR) is still too low in real-world settings. Manual conversion can require more than 5 times the original audio time, which also introduces significant latency. Giving transcriptionists ASR output as a starting point seems like a reasonable way to make humans more efficient and thereby reduce this cost, but the effectiveness of this approach clearly depends on the quality of the speech recognition output. At high error rates, fixing inaccurate speech recognition output may take longer than producing the transcription from scratch, and transcriptionists may not realize when the ASR output is too inaccurate to be useful. In this paper, we empirically explore how the latency of transcriptions created by participants recruited on Amazon Mechanical Turk varies with the accuracy of the speech recognition output. We present results from two studies, which indicate that starting with the ASR output is worse unless it is sufficiently accurate (Word Error Rate under 30%).
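The abstract states its threshold in terms of Word Error Rate (WER) without defining it. As a minimal sketch, assuming the standard definition (word-level substitutions, deletions, and insertions divided by the number of reference words), WER can be computed with a word-level edit distance. The function names and the whitespace tokenization below are illustrative assumptions, not the authors' scoring pipeline; the 0.30 default only mirrors the "under 30%" figure quoted in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are split on whitespace, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def below_wer_threshold(wer: float, threshold: float = 0.30) -> bool:
    """Hypothetical helper: True when WER is strictly under the threshold."""
    return wer < threshold


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER = {wer:.3f}, below threshold: {below_wer_threshold(wer)}")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference, so it is not strictly a percentage of wrong words.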