Better Replacement for TTS Naturalness Evaluation
S. Shirali-Shahreza, Gerald Penn
12th ISCA Speech Synthesis Workshop (SSW2023), 26 August 2023
DOI: 10.21437/ssw.2023-31
Abstract
Text-To-Speech (TTS) systems are commonly evaluated along two main dimensions: intelligibility and naturalness. While there are clear proxies for intelligibility, such as the Word Error Rate (WER) of listener transcriptions, naturalness is not nearly so well defined. In this paper, we present the results of our attempt to learn which aspects human listeners consider when they are asked to evaluate the "naturalness" of TTS systems. We conducted a user study similar to common TTS evaluations and, at the end, asked each subject to define the sense of naturalness that they had used. We then coded their answers and statistically analysed the distribution of codes to create a list of aspects that users consider part of naturalness. From this, we provide a list of suggested replacement questions to use instead of a single, oblique notion of naturalness.
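To make concrete why WER counts as a clear intelligibility proxy, the sketch below computes it in the standard way: the word-level edit distance (substitutions + deletions + insertions) between a reference sentence and a listener's transcription, divided by the number of reference words. This is a minimal illustration, not the authors' evaluation tooling; the function name `word_error_rate` is hypothetical, and splitting on whitespace with no case or punctuation normalization is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using word-level Levenshtein distance.

    Assumes plain whitespace tokenization; no case folding or
    punctuation stripping is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match if cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted, WER can exceed 1.0 when a transcription is much longer than the reference. No comparably mechanical score exists for naturalness, which is the gap the study addresses.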