Improving Multi-Speaker Tacotron with Speaker Gating Mechanisms
Wei Zhao, Li Xu, Ting He
2020 39th Chinese Control Conference (CCC), 2020-07-01
DOI: 10.23919/CCC50068.2020.9188779 (https://doi.org/10.23919/CCC50068.2020.9188779)
Citations: 1
Abstract
In this paper, we present two speaker gating mechanisms for multi-speaker Tacotron, a popular end-to-end text-to-speech (TTS) neural system, to improve its ability to generate multiple voices. With these mechanisms, the model improves in both generalization and accuracy. As a starting point, we adopt the original multi-speaker Tacotron as the baseline model because of its excellent performance and straightforward structure. We then propose two different speaker gating mechanisms for this model, both built on gated linear units (GLUs). Extensive experiments on the VCTK dataset demonstrate the validity of our methods. We conclude that incorporating speaker identity information through the proposed speaker gating mechanisms is promising.
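The abstract does not spell out the two gating mechanisms, so the following is only a minimal sketch of how a GLU can condition a hidden state on speaker identity. It uses the standard GLU form, a ⊙ σ(b), and assumes that the gate branch sees the hidden state concatenated with a speaker embedding. The class name `SpeakerGLU`, the dimensions, and the weight initialization are all hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, vec):
    """Compute W @ vec + bias for a matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def init_linear(out_dim, in_dim, rng):
    """Small random weights and zero bias (illustrative initialization only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


class SpeakerGLU:
    """Hypothetical speaker gate, not the paper's exact design.

    Content branch: a = W_a h + b_a
    Gate branch:    b = W_b [h; s] + b_b   (s is the speaker embedding)
    Output:         a * sigmoid(b), applied elementwise
    """

    def __init__(self, hidden_dim, speaker_dim, seed=0):
        rng = random.Random(seed)
        self.content = init_linear(hidden_dim, hidden_dim, rng)
        self.gate = init_linear(hidden_dim, hidden_dim + speaker_dim, rng)

    def __call__(self, hidden, speaker):
        a = linear(*self.content, hidden)
        b = linear(*self.gate, hidden + speaker)  # list concatenation [h; s]
        return [ai * sigmoid(bi) for ai, bi in zip(a, b)]


# The same hidden state passed through the gate for two different speakers.
layer = SpeakerGLU(hidden_dim=4, speaker_dim=2)
h = [0.5, -1.0, 0.25, 2.0]
out_a = layer(h, [1.0, 0.0])
out_b = layer(h, [0.0, 1.0])
```

Because the sigmoid gate lies strictly between 0 and 1, each output element is a scaled-down copy of the content branch, and the scaling depends on the speaker. In this sketch, that dependence is what lets one shared network produce speaker-specific activations.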