Chen-Yu Chiang, Yu-Ping Hung, Sin-Horng Chen, Yih-Ru Wang
{"title":"A New Model-Based Prosody Coder for Mandarin Speech","authors":"Chen-Yu Chiang, Yu-Ping Hung, Sin-Horng Chen, Yih-Ru Wang","doi":"10.1109/IIH-MSP.2013.24","DOIUrl":null,"url":null,"abstract":"In this paper, a novel parametric prosody coding approach for Mandarin speech is proposed. It employs a hierarchical prosodic model (HPM) as a prosody generating model in the encoder to analyze the speech prosody of the input utterance to obtain a parametric representation of four prosodic-acoustic features of syllable pitch contour, syllable duration, syllable energy level, and syllable-juncture pause duration for encoding. In the decoder, the four prosodic-acoustic features are reconstructed by a synthesis operation using the decoded HPM parameters. The reconstructed prosodic features are lastly used in an HMM-based speech synthesizer to help to generate the reconstructed speech. Experimental results show that the reconstructed speech has good quality at low data rates of 114.9 bits/s for a speaker-dependent task. An informal listening test confirmed decoded speeches sounded very fluently.","PeriodicalId":105427,"journal":{"name":"2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IIH-MSP.2013.24","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
In this paper, a novel parametric prosody coding approach for Mandarin speech is proposed. It employs a hierarchical prosodic model (HPM) as a prosody generating model in the encoder to analyze the speech prosody of the input utterance to obtain a parametric representation of four prosodic-acoustic features of syllable pitch contour, syllable duration, syllable energy level, and syllable-juncture pause duration for encoding. In the decoder, the four prosodic-acoustic features are reconstructed by a synthesis operation using the decoded HPM parameters. The reconstructed prosodic features are lastly used in an HMM-based speech synthesizer to help to generate the reconstructed speech. Experimental results show that the reconstructed speech has good quality at low data rates of 114.9 bits/s for a speaker-dependent task. An informal listening test confirmed decoded speeches sounded very fluently.