A Case Study in Comparative Speech-to-Text Libraries for Use in Transcript Generation for Online Education Recordings

Pablo Angel Alvarez Fernandez, Jeremy Hajek
Proceedings of the 21st Annual Conference on Information Technology Education. Published 2020-10-07.
DOI: 10.1145/3368308.3415380 (https://doi.org/10.1145/3368308.3415380)
Citation count: 2

Abstract

With the proliferation of Cloud-based Speech-to-Text services, it can be difficult to decide where to start and how to make use of these technologies. The options include the major Cloud providers as well as several Open Source Speech-to-Text projects. We set out to investigate a sample of the available libraries and their attributes as they relate to the recordings that are a by-product of Online Education. Because so many resources are available, the computing and technical barriers to applying speech recognition algorithms have decreased to the point of being a non-factor in the decision to use Speech-to-Text services. New barriers such as price, compute time, and access to the services' source code (software freedom) can now be factored into the decision of which platform to use. This case study is a first step toward a test suite and guide for comparing Speech-to-Text libraries and their out-of-the-box accuracy. Our initial test suite employed two models: 1) a Cloud model built on AWS S3 and AWS Transcribe, and 2) an on-premises Open Source model that relies on Mozilla's DeepSpeech [1]. We present our findings and recommendations based on the criteria discovered. To deliver this test suite, we also researched the latest web development technologies, with an emphasis on security. This was done to produce a reliable and secure development process and to provide open access to this proof of concept for further testing and development.
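
The abstract does not say which metric was used to measure out-of-the-box accuracy. As an illustration only, the sketch below assumes word error rate (WER), a common choice when comparing Speech-to-Text output against a human-made reference transcript. The function name `word_error_rate` and the sample sentences are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate of `hypothesis` measured against `reference`.

    WER = (substitutions + deletions + insertions) / number of reference words,
    computed here as a word-level Levenshtein distance. This is an assumed
    metric; the paper's abstract does not name the one the authors used.
    """
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]  # matching i reference words against nothing costs i deletions
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    # Hypothetical sample text, not data from the study.
    reference = "welcome to the lecture on cloud computing"
    engine_output = "welcome to the lecture on loud computing"
    print(f"WER: {word_error_rate(reference, engine_output):.3f}")
```

Running the same reference transcript against the output of each engine (for example, one transcript from the Cloud model and one from the on-premises model) would give one number per engine that can be compared directly, alongside the price and compute-time criteria the abstract mentions.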