Dong-Yan Huang, Lei Xie, Yvonne Siu Wa Lee, Jie Wu, Huaiping Ming, Xiaohai Tian, Shaofei Zhang, Chuang Ding, Mei Li, Nguyen Quy Hy, M. Dong, Haizhou Li
{"title":"An Automatic Voice Conversion Evaluation Strategy Based on Perceptual Background Noise Distortion and Speaker Similarity","authors":"Dong-Yan Huang, Lei Xie, Yvonne Siu Wa Lee, Jie Wu, Huaiping Ming, Xiaohai Tian, Shaofei Zhang, Chuang Ding, Mei Li, Nguyen Quy Hy, M. Dong, Haizhou Li","doi":"10.21437/SSW.2016-8","DOIUrl":null,"url":null,"abstract":"Voice conversion aims to modify the characteristics of one speaker to make it sound like spoken by another speaker without changing the language content. This task has attracted con-siderable attention and various approaches have been proposed since two decades ago. The evaluation of voice conversion approaches, usually through time-intensive subject listening tests, requires a huge amount of human labor. This paper proposes an automatic voice conversion evaluation strategy based on perceptual background noise distortion and speaker similarity. Ex-perimental results show that our automatic evaluation results match the subjective listening results quite well. We further use our strategy to select best converted samples from multiple voice conversion systems and our submission achieves promising results in the voice conversion challenge (VCC2016).","PeriodicalId":340820,"journal":{"name":"Speech Synthesis Workshop","volume":"94 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Speech Synthesis Workshop","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/SSW.2016-8","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8
Abstract
Voice conversion aims to modify the characteristics of one speaker to make it sound like spoken by another speaker without changing the language content. This task has attracted con-siderable attention and various approaches have been proposed since two decades ago. The evaluation of voice conversion approaches, usually through time-intensive subject listening tests, requires a huge amount of human labor. This paper proposes an automatic voice conversion evaluation strategy based on perceptual background noise distortion and speaker similarity. Ex-perimental results show that our automatic evaluation results match the subjective listening results quite well. We further use our strategy to select best converted samples from multiple voice conversion systems and our submission achieves promising results in the voice conversion challenge (VCC2016).