Speech Generation for Indigenous Language Education

Computer Speech and Language · Impact Factor: 3.1 · Zone 3 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence) · Pub Date: 2024-09-28 · DOI: 10.1016/j.csl.2024.101723
Aidan Pine, Erica Cooper, David Guzmán, Eric Joanis, Anna Kazantseva, Ross Krekoski, Roland Kuhn, Samuel Larkin, Patrick Littell, Delaney Lothian, Akwiratékha’ Martin, Korin Richmond, Marc Tessier, Cassia Valentini-Botinhao, Dan Wells, Junichi Yamagishi
Article URL: https://www.sciencedirect.com/science/article/pii/S0885230824001062
Citations: 0

Abstract

As the quality of contemporary speech synthesis improves, so too does the interest from language communities in developing text-to-speech (TTS) systems for a variety of real-world applications. Much of the work on TTS has focused on high-resource languages, resulting in implicitly resource-intensive paths to building such systems. The goal of this paper is to provide signposts and points of reference for future low-resource speech synthesis efforts, with insights drawn from the Speech Generation for Indigenous Language Education (SGILE) project. Funded and coordinated by the National Research Council of Canada (NRC), this multi-year, multi-partner project has the goal of producing high-quality text-to-speech systems that support the teaching of Indigenous languages in a variety of educational contexts. We provide background information and motivation for the project, as well as details about our approach and project structure, including results from a multi-day requirements-gathering session. We discuss some of our key challenges, including building models with appropriate controls for educators, improving model data efficiency, and strategies for low-resource transfer learning and evaluation. Finally, we provide a detailed survey of existing speech synthesis software and introduce EveryVoice TTS, a toolkit designed specifically for low-resource speech synthesis.
Source journal
Computer Speech and Language (Engineering and Technology: Computer Science, Artificial Intelligence)
CiteScore: 11.30
Self-citation rate: 4.70%
Articles published: 80
Review time: 22.9 weeks
About the journal: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.