Prosody modeling for syllable-based concatenative speech synthesis of Hindi and Tamil
Ashwin Bellur, K. Narayan, K. Raghava Krishnan, H. Murthy
2011 National Conference on Communications (NCC), published 2011-03-17
DOI: 10.1109/NCC.2011.5734737 (https://doi.org/10.1109/NCC.2011.5734737)
Citations: 39
Abstract
This paper describes ways to improve prosody modeling in syllable-based concatenative speech synthesis systems for two Indian languages, Hindi and Tamil, within the unit selection paradigm. The syllable is a larger unit than the diphone and contains most of the coarticulation information. Although syllable-based synthesis is quite intelligible compared with diphone-based systems, naturalness, especially in terms of prosody, requires additional processing. Since the synthesizer is built using a cluster unit framework, a hybrid approach that combines rule-based and statistical models is proposed to better model the prosody of syllable-like units. It is further observed that prediction of phrase boundaries is crucial, particularly because Indian languages are replete with polysyllabic words. CART-based phrase modeling for Hindi and Tamil is discussed. Perceptual experiments show a significant improvement in the mean opinion score (MOS) for both the Hindi and Tamil synthesizers.

Index Terms: speech synthesis, unit selection, cluster unit synthesis, phrase boundaries
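
As a rough illustration of what CART-based phrase boundary prediction involves, the sketch below grows a small classification tree with Gini impurity and uses it to label word junctures as break or no-break. It is not the authors' model: the three features (syllables in the word, words since the last break, a following-punctuation flag), the toy data, and all function names are assumptions made for this example, and the abstract does not state which features or tools were used.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a binary tree; leaves hold the majority class label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f, t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )

def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, t, low, high = tree
        tree = low if row[f] <= t else high
    return tree

# Hypothetical features per word juncture (assumed, not from the paper):
# (syllables in word, words since last break, followed by punctuation)
rows = [
    (2, 1, 0), (3, 2, 0), (2, 3, 0),   # no phrase break
    (4, 5, 0), (2, 6, 0), (5, 4, 0),   # break after a long stretch of words
    (1, 2, 1), (3, 1, 1),              # break at punctuation
]
labels = [0, 0, 0, 1, 1, 1, 1, 1]      # 1 = phrase break follows the word

tree = build_tree(rows, labels)
predictions = [predict(tree, r) for r in rows]
```

On this toy data the tree splits first on the number of words since the last break and then on the punctuation flag. A real system would train on a labeled speech corpus with a much richer feature set.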