{"title":"Submission from SRCB for Voice Conversion Challenge 2020","authors":"Qiuyue Ma, Ruolan Liu, Xue Wen, Chunhui Lu, Xiao Chen","doi":"10.21437/vcc_bc.2020-18","DOIUrl":null,"url":null,"abstract":"This paper presents the intra-lingual and cross-lingual voice conversion system for Voice Conversion Challenge 2020(VCC 2020). Voice conversion (VC) modifies a source speaker’s speech so that the result sounds like a target speaker. This becomes particularly difficult when source and target speakers speak different languages. In this work we focus on building a voice conversion system achieving consistent improvements in accent and intelligibility evaluations. Our voice conversion system is constituted by a bilingual phoneme recognition based speech representation module, a neural network based speech generation module and a neural vocoder. More concretely, we extract general phonation from the source speakers' speeches of different languages, and improve the sound quality by optimizing the speech synthesis module and adding a noise suppression post-process module to the vocoder. This framework ensures high intelligible and high natural speech, which is very close to human quality (MOS=4.17 rank 2 in Task 1, MOS=4.13 rank 2 in Task 2).","PeriodicalId":355114,"journal":{"name":"Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/vcc_bc.2020-18","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7
Abstract
This paper presents the intra-lingual and cross-lingual voice conversion system for Voice Conversion Challenge 2020(VCC 2020). Voice conversion (VC) modifies a source speaker’s speech so that the result sounds like a target speaker. This becomes particularly difficult when source and target speakers speak different languages. In this work we focus on building a voice conversion system achieving consistent improvements in accent and intelligibility evaluations. Our voice conversion system is constituted by a bilingual phoneme recognition based speech representation module, a neural network based speech generation module and a neural vocoder. More concretely, we extract general phonation from the source speakers' speeches of different languages, and improve the sound quality by optimizing the speech synthesis module and adding a noise suppression post-process module to the vocoder. This framework ensures high intelligible and high natural speech, which is very close to human quality (MOS=4.17 rank 2 in Task 1, MOS=4.13 rank 2 in Task 2).