Speech Generation for Indigenous Language Education

Computer Speech and Language | IF 3.4 | Zone 3, Computer Science | JCR Q2, Computer Science, Artificial Intelligence | Pub Date: 2024-09-28 | DOI: 10.1016/j.csl.2024.101723

Aidan Pine, Erica Cooper, David Guzmán, Eric Joanis, Anna Kazantseva, Ross Krekoski, Roland Kuhn, Samuel Larkin, Patrick Littell, Delaney Lothian, Akwiratékha’ Martin, Korin Richmond, Marc Tessier, Cassia Valentini-Botinhao, Dan Wells, Junichi Yamagishi

Journal Article. Computer Speech and Language, volume 90, Article 101723. https://www.sciencedirect.com/science/article/pii/S0885230824001062

Abstract

As the quality of contemporary speech synthesis improves, so too does the interest from language communities in developing text-to-speech (TTS) systems for a variety of real-world applications. Much of the work on TTS has focused on high-resource languages, resulting in implicitly resource-intensive paths to building such systems. The goal of this paper is to provide signposts and points of reference for future low-resource speech synthesis efforts, with insights drawn from the Speech Generation for Indigenous Language Education (SGILE) project. Funded and coordinated by the National Research Council of Canada (NRC), this multi-year, multi-partner project has the goal of producing high-quality text-to-speech systems that support the teaching of Indigenous languages in a variety of educational contexts. We provide background information and motivation for the project, as well as details about our approach and project structure, including results from a multi-day requirements-gathering session. We discuss some of our key challenges, including building models with appropriate controls for educators, improving model data efficiency, and strategies for low-resource transfer learning and evaluation. Finally, we provide a detailed survey of existing speech synthesis software and introduce EveryVoice TTS, a toolkit designed specifically for low-resource speech synthesis.

Source journal

Computer Speech and Language (Engineering & Technology; Computer Science: Artificial Intelligence)

CiteScore: 11.30

Self-citation rate: 4.70%

Review time: 22.9 weeks

Journal description: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.
