Generating emotional speech from neutral speech
Ling Cen, P. Chan, M. Dong, Haizhou Li
2010 7th International Symposium on Chinese Spoken Language Processing, published 2010-11-01
DOI: 10.1109/ISCSLP.2010.5684862
Citations: 10
Abstract
Emotional speech synthesis is one of the key techniques for natural and realistic conversation between humans and machines. Generating emotional speech by converting neutral speech is desirable because it allows emotional speech to be produced from the many existing text-to-speech systems. The Gaussian mixture model (GMM) based method is capable of synthesizing the desired spectrum, while the rule-based (RB) algorithm is effective in implementing the targeted prosodic features. Since spectral and prosodic features are both key factors in conveying the emotion of speech, in this paper we propose synthesizing emotional speech with a two-stage transformation that combines the GMM and RB methods. We synthesize happy, angry and sad speech and compare the proposed method with both GMM linear transformation and RB transformation. A listening test showed that speech synthesized by the proposed method was perceived as best portraying the targeted emotion.
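The abstract does not give the exact conversion function or rules, so the following is only a minimal sketch of what a two-stage GMM + rule-based pipeline could look like, under stated assumptions. Stage 1 assumes the standard joint-density GMM regression F(x) = sum_i p_i(x) [mu_i^y + Sigma_i^yx (Sigma_i^xx)^-1 (x - mu_i^x)], simplified here to diagonal covariance blocks, with parameters assumed to be already trained (training, e.g. EM on time-aligned neutral/emotional frames, is not shown). Stage 2 assumes simple prosody rules (shift the F0 mean, scale the F0 range, scale energy and duration). All function names and the `RULES` values are hypothetical placeholders, not the authors' method or settings, and applying the spectral stage before the prosodic stage is also an assumption.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Frame = List[float]


@dataclass
class GMMComponent:
    """One component of a joint-density GMM over [x; y] (diagonal blocks assumed)."""
    weight: float  # mixture weight alpha_i
    mu_x: Frame    # mean of source (neutral) spectral features
    mu_y: Frame    # mean of target (emotional) spectral features
    var_xx: Frame  # diagonal of Sigma_i^xx
    cov_yx: Frame  # diagonal of Sigma_i^yx


def _log_gauss_diag(x: Sequence[float], mu: Sequence[float], var: Sequence[float]) -> float:
    # Log density of a diagonal-covariance Gaussian.
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mu, var))


def posteriors(x: Sequence[float], comps: Sequence[GMMComponent]) -> List[float]:
    # p_i(x) = alpha_i N(x; mu_i^x, Sigma_i^xx) / sum_j (...), computed in log space.
    logs = [math.log(c.weight) + _log_gauss_diag(x, c.mu_x, c.var_xx) for c in comps]
    top = max(logs)
    exps = [math.exp(lp - top) for lp in logs]
    total = sum(exps)
    return [e / total for e in exps]


def convert_frame(x: Sequence[float], comps: Sequence[GMMComponent]) -> Frame:
    # Stage 1 (spectral): posterior-weighted sum of per-component linear regressions.
    y = [0.0] * len(x)
    for p, c in zip(posteriors(x, comps), comps):
        for d in range(len(x)):
            y[d] += p * (c.mu_y[d] + c.cov_yx[d] / c.var_xx[d] * (x[d] - c.mu_x[d]))
    return y


# Stage 2 (prosodic): HYPOTHETICAL placeholder rules, not the values used in the paper.
RULES: Dict[str, Dict[str, float]] = {
    "happy": {"f0_mean": 1.2, "f0_range": 1.4, "duration": 0.9, "energy": 1.2},
    "angry": {"f0_mean": 1.1, "f0_range": 1.6, "duration": 0.85, "energy": 1.5},
    "sad": {"f0_mean": 0.9, "f0_range": 0.6, "duration": 1.2, "energy": 0.7},
}


def apply_prosody_rules(spectra: List[Frame], f0: List[float], energy: List[float],
                        emotion: str) -> Tuple[List[Frame], List[float], List[float]]:
    # Assumes spectra, f0 and energy are frame-aligned and of equal length.
    if not f0:
        return spectra, f0, energy
    r = RULES[emotion]
    voiced = [v for v in f0 if v > 0]
    mean = sum(voiced) / len(voiced) if voiced else 0.0
    # Scale the F0 mean and range; unvoiced frames (f0 == 0) stay unvoiced.
    new_f0 = [0.0 if v <= 0 else max(1.0, r["f0_mean"] * mean + r["f0_range"] * (v - mean))
              for v in f0]
    new_energy = [e * r["energy"] for e in energy]
    # Scale duration by nearest-neighbour resampling of the frame sequence.
    n_out = max(1, round(len(f0) * r["duration"]))
    idx = [min(len(f0) - 1, int(k / r["duration"])) for k in range(n_out)]
    return [spectra[i] for i in idx], [new_f0[i] for i in idx], [new_energy[i] for i in idx]


def neutral_to_emotional(spectra: List[Frame], f0: List[float], energy: List[float],
                         comps: Sequence[GMMComponent], emotion: str):
    # Two-stage transformation: GMM spectral conversion, then rule-based prosody.
    converted = [convert_frame(x, comps) for x in spectra]
    return apply_prosody_rules(converted, f0, energy, emotion)
```

The diagonal-block simplification keeps the sketch dependency-free; a fuller implementation would use full covariances and fit the joint model with a library such as scikit-learn's `GaussianMixture`. Running only `convert_frame` or only `apply_prosody_rules` roughly corresponds to the two single-method baselines the abstract compares against.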