Tao Wang, J. Tao, Ruibo Fu, Zhengqi Wen, Chunyu Qiang
{"title":"The NLPR Speech Synthesis entry for Blizzard Challenge 2020","authors":"Tao Wang, J. Tao, Ruibo Fu, Zhengqi Wen, Chunyu Qiang","doi":"10.21437/vcc_bc.2020-12","DOIUrl":null,"url":null,"abstract":"The paper describes the NLPR speech synthesis system entry for Blizzard Challenge 2020. More than 9 hours of speech data from an news anchor and 3 hours of speech from one native Shanghainese speaker are adopted as training data for building system this year. Our speech synthesis system is built based on the multi-speaker end-to-end speech synthesis system. LPCNet based neural vocoder is adapted to improve the quality. Different from our previous system, some improvements about data pruning and speaker adaptation strategies were made to improve the stability of our system. In this paper, the whole system structure, data pruning method, and the duration control will be in-troduced and discussed. In addition, this competition includes two tasks of Mandarin and Shanghainese, and we will intro-duce the important parts of each topic respectively. Finally, the results of listening test are presented.","PeriodicalId":355114,"journal":{"name":"Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/vcc_bc.2020-12","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
The paper describes the NLPR speech synthesis system entry for Blizzard Challenge 2020. More than 9 hours of speech data from an news anchor and 3 hours of speech from one native Shanghainese speaker are adopted as training data for building system this year. Our speech synthesis system is built based on the multi-speaker end-to-end speech synthesis system. LPCNet based neural vocoder is adapted to improve the quality. Different from our previous system, some improvements about data pruning and speaker adaptation strategies were made to improve the stability of our system. In this paper, the whole system structure, data pruning method, and the duration control will be in-troduced and discussed. In addition, this competition includes two tasks of Mandarin and Shanghainese, and we will intro-duce the important parts of each topic respectively. Finally, the results of listening test are presented.