Aidan Pine , Erica Cooper , David Guzmán , Eric Joanis , Anna Kazantseva , Ross Krekoski , Roland Kuhn , Samuel Larkin , Patrick Littell , Delaney Lothian , Akwiratékha’ Martin , Korin Richmond , Marc Tessier , Cassia Valentini-Botinhao , Dan Wells , Junichi Yamagishi
{"title":"Speech Generation for Indigenous Language Education","authors":"Aidan Pine , Erica Cooper , David Guzmán , Eric Joanis , Anna Kazantseva , Ross Krekoski , Roland Kuhn , Samuel Larkin , Patrick Littell , Delaney Lothian , Akwiratékha’ Martin , Korin Richmond , Marc Tessier , Cassia Valentini-Botinhao , Dan Wells , Junichi Yamagishi","doi":"10.1016/j.csl.2024.101723","DOIUrl":null,"url":null,"abstract":"<div><div>As the quality of contemporary speech synthesis improves, so too does the interest from language communities in developing text-to-speech (TTS) systems for a variety of real-world applications. Much of the work on TTS has focused on high-resource languages, resulting in implicitly resource-intensive paths to building such systems. The goal of this paper is to provide signposts and points of reference for future low-resource speech synthesis efforts, with insights drawn from the Speech Generation for Indigenous Language Education (SGILE) project. Funded and coordinated by the National Research Council of Canada (NRC), this multi-year, multi-partner project has the goal of producing high-quality text-to-speech systems that support the teaching of Indigenous languages in a variety of educational contexts. We provide background information and motivation for the project, as well as details about our approach and project structure, including results from a multi-day requirements-gathering session. We discuss some of our key challenges, including building models with appropriate controls for educators, improving model data efficiency, and strategies for low-resource transfer learning and evaluation. Finally, we provide a detailed survey of existing speech synthesis software and introduce EveryVoice TTS, a toolkit designed specifically for low-resource speech synthesis.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":3.1000,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Speech and Language","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0885230824001062","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
As the quality of contemporary speech synthesis improves, so too does the interest from language communities in developing text-to-speech (TTS) systems for a variety of real-world applications. Much of the work on TTS has focused on high-resource languages, resulting in implicitly resource-intensive paths to building such systems. The goal of this paper is to provide signposts and points of reference for future low-resource speech synthesis efforts, with insights drawn from the Speech Generation for Indigenous Language Education (SGILE) project. Funded and coordinated by the National Research Council of Canada (NRC), this multi-year, multi-partner project has the goal of producing high-quality text-to-speech systems that support the teaching of Indigenous languages in a variety of educational contexts. We provide background information and motivation for the project, as well as details about our approach and project structure, including results from a multi-day requirements-gathering session. We discuss some of our key challenges, including building models with appropriate controls for educators, improving model data efficiency, and strategies for low-resource transfer learning and evaluation. Finally, we provide a detailed survey of existing speech synthesis software and introduce EveryVoice TTS, a toolkit designed specifically for low-resource speech synthesis.
期刊介绍:
Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language.
The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.