{"title":"Incorporating dynamic features into minimum generation error training for HMM-based speech synthesis","authors":"Duy Khanh Ninh, M. Morise, Y. Yamashita","doi":"10.1109/ISCSLP.2012.6423486","DOIUrl":null,"url":null,"abstract":"This paper describes new methods of minimum generation error (MGE) training in HMM-based speech synthesis by introducing the error component of dynamic features into the generation error function. We propose two methods for setting the weight associated with the additional error component. In fixed weighting approach, this weight is kept constant over the course of speech. In adaptive weighting approach, it is adjusted according to the degree of dynamic of speech segments. Objective evaluation shows that the newly derived MGE criterion with adaptive weighting method obtains comparable performance on static feature and better performance on delta feature compared to the baseline MGE criterion. Subjective evaluation exhibits an improvement in the quality of synthesized speech with the proposed technique. The newly derived criterion improves the capability of the HMMs in capturing dynamic properties of speech without increasing the computational complexity of training process compared to the baseline criterion.","PeriodicalId":186099,"journal":{"name":"2012 8th International Symposium on Chinese Spoken Language Processing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 8th International Symposium on Chinese Spoken Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISCSLP.2012.6423486","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
This paper describes new methods of minimum generation error (MGE) training in HMM-based speech synthesis by introducing the error component of dynamic features into the generation error function. We propose two methods for setting the weight associated with the additional error component. In fixed weighting approach, this weight is kept constant over the course of speech. In adaptive weighting approach, it is adjusted according to the degree of dynamic of speech segments. Objective evaluation shows that the newly derived MGE criterion with adaptive weighting method obtains comparable performance on static feature and better performance on delta feature compared to the baseline MGE criterion. Subjective evaluation exhibits an improvement in the quality of synthesized speech with the proposed technique. The newly derived criterion improves the capability of the HMMs in capturing dynamic properties of speech without increasing the computational complexity of training process compared to the baseline criterion.