A large-scale comparison of two voice synthesis techniques on intelligibility, naturalness, preferences, and attitudes toward voices banked by individuals with amyotrophic lateral sclerosis.
Jolene Hyppa-Martin, Jason Lilley, Mo Chen, Jaclyn Friese, Corinne Schmidt, H Timothy Bunnell
Augmentative and Alternative Communication, 2024, pp. 31-45. Journal article, first published online 4 October 2023. DOI: 10.1080/07434618.2023.2262032 (https://doi.org/10.1080/07434618.2023.2262032)
Abstract
Amyotrophic lateral sclerosis (ALS) commonly results in the inability to produce natural speech, making speech-generating devices (SGDs) important. Historically, synthetic voices generated by SGDs were neither unique nor age- or dialect-appropriate, which depersonalized SGD use. Voices generated by SGDs can now be customized via voice banking and should ideally sound uniquely like the individual's natural speech, be intelligible, and elicit positive reactions from communication partners. This large-scale study used a 2 × 2 mixed between- and within-participants design to examine the perceptions of 831 adult listeners regarding custom synthetic voices created for two individuals diagnosed with ALS, using two synthesis systems in common clinical use (waveform concatenation and statistical parametric synthesis). The study explored relationships among synthesis system, dysarthria severity, synthetic speech intelligibility, naturalness, and preferences, and also provided a preliminary examination of attitudes regarding the custom synthetic voices. Synthetic voices generated via statistical parametric synthesis using deep neural networks were more intelligible, more natural, and more preferred than voices produced via waveform concatenation, and were associated with more positive attitudes. The custom synthetic voice created from moderately dysarthric speech was more intelligible than the voice created from mildly dysarthric speech. Clinical implications and factors that may have contributed to the relative intelligibilities are discussed.
Journal description:
As the official journal of the International Society for Augmentative and Alternative Communication (ISAAC), Augmentative and Alternative Communication (AAC) publishes scientific articles related to the field of augmentative and alternative communication that report research concerning the assessment, treatment, rehabilitation, and education of people who use or have the potential to use AAC systems, or that discuss theory, technology, and systems development relevant to AAC. The broad range of topics included in the journal reflects the international development of this field. Manuscripts submitted to AAC should fall within one of the following categories and MUST COMPLY with the associated page maximums listed on page 3 of the Manuscript Preparation Guide.
Research articles (full peer review): These manuscripts report the results of original empirical research, including studies using qualitative and quantitative methodologies, with both group and single-case experimental research designs (e.g., Binger et al., 2008; Petroi et al., 2014).
Technical, research, and intervention notes (full peer review): These are brief manuscripts that address methodological, statistical, technical, or clinical issues or innovations that are of relevance to the AAC community and are designed to bring the research community’s attention to areas that have been minimally or poorly researched in the past (e.g., research note: Thunberg et al., 2016; intervention notes: Laubscher et al., 2019).