Effective sentence selection based on phone/model coverage maximization for speaker adaptation in HMM-based speech synthesis
Authors: C. Lin, Po Kai Huang, Chengyuan Lin, C. Kuo
Published in: 2012 8th International Symposium on Chinese Spoken Language Processing, 2012-12-01
DOI: 10.1109/ISCSLP.2012.6423469 (https://doi.org/10.1109/ISCSLP.2012.6423469)
Citations: 0
Abstract
Reducing the recording effort required for practical speaker-adaptive text-to-speech applications is highly desirable. In this paper, we present two sentence selection approaches based on a greedy algorithm: one based on phone coverage and the other on model coverage. The former considers the phonetic information in the speaker adaptation data, while the latter focuses on occurrences of Mel-cepstral and logF0 models in the decision trees of the average voice model. To verify the efficacy of the proposed methods, we compare their performance with that of random selection in objective and subjective evaluations. The results of both evaluations show that both methods outperform random selection.
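The abstract does not spell out the selection criterion, so the following is only a minimal sketch of a generic greedy coverage-maximization loop, not the authors' exact method. It assumes each candidate sentence has already been mapped to a set of "units": phone labels for phone coverage, or identifiers of decision-tree leaf models (Mel-cepstral and logF0) for model coverage. The function name `greedy_select`, the `budget` parameter, and the gain measure (number of newly covered units) are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def greedy_select(candidates: Dict[str, Set[str]], budget: int) -> List[str]:
    """Pick up to `budget` sentence ids, each time taking the sentence
    that adds the most not-yet-covered units (assumed gain measure)."""
    covered: Set[str] = set()
    selected: List[str] = []
    remaining = dict(candidates)  # shallow copy; the caller's dict is untouched
    while remaining and len(selected) < budget:
        # Iterate ids in sorted order so ties go to the smallest id.
        best = max(sorted(remaining), key=lambda sid: len(remaining[sid] - covered))
        if not remaining[best] - covered:
            break  # no remaining sentence adds new coverage
        selected.append(best)
        covered |= remaining.pop(best)
    return selected


# Toy data: sentence id -> set of units (phones or model ids) it contains.
sentences = {
    "s1": {"a", "b", "c"},
    "s2": {"c", "d"},
    "s3": {"a", "b"},
    "s4": {"d", "e", "f", "g"},
}
chosen = greedy_select(sentences, budget=2)
```

With this toy data, `s4` is picked first because it covers four new units, then `s1` because it adds three more. A real system might weight units by frequency or count repeated occurrences rather than treating coverage as binary; the abstract does not say which variant was used.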