Analysis and modeling pauses for synthesis of storytelling speech based on discourse modes
Authors: Parakrant Sarkar, K. S. Rao
Venue: 2015 Eighth International Conference on Contemporary Computing (IC3)
Published: 2015-08-20
DOI: https://doi.org/10.1109/IC3.2015.7346683
In Text-to-Speech (TTS) synthesis systems, pause prediction plays a vital role in synthesizing natural and expressive speech. In the storytelling style, pauses introduce suspense and climax by emphasizing the prominent or emotion-salient words in a story. The objective of this work is to analyze and model the pause pattern so as to capture story-semantic information, as a stepping stone towards developing a Story TTS system based on modes of discourse. We analyze the pauses in Hindi children's stories for each mode of discourse: narrative, descriptive and dialogue. The sentences are first grouped by mode, and the pause pattern is then analyzed within each mode. A three-stage data-driven method is proposed to predict the location and duration of pauses for each mode. Both objective and subjective tests are conducted to evaluate the performance of the proposed method. The subjective evaluation indicates that subjects appreciate the quality of the speech synthesized with the proposed model.
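
The grouping step described in the abstract (sentences are assigned to a discourse mode, then pauses are analyzed per mode) can be illustrated with a minimal sketch. This is not the authors' three-stage method, whose details the abstract does not give. The data layout, the function name `pause_stats_by_mode`, and all numeric values below are hypothetical and invented purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy annotations: one entry per sentence, holding its discourse
# mode and the durations (in ms) of the pauses observed in that sentence.
# These values are invented for illustration and are NOT from the paper.
SENTENCES = [
    ("narrative", [200, 350]),
    ("narrative", [250]),
    ("descriptive", [150, 150, 300]),
    ("dialogue", [400]),
    ("dialogue", []),
]


def pause_stats_by_mode(sentences):
    """Group sentences by discourse mode and summarize their pauses."""
    grouped = defaultdict(list)
    for mode, pauses in sentences:
        grouped[mode].append(pauses)

    stats = {}
    for mode, per_sentence in grouped.items():
        # Flatten the per-sentence pause lists into one list of durations.
        durations = [d for pauses in per_sentence for d in pauses]
        stats[mode] = {
            "sentences": len(per_sentence),
            "pauses_per_sentence": len(durations) / len(per_sentence),
            "mean_pause_ms": mean(durations) if durations else 0.0,
        }
    return stats


if __name__ == "__main__":
    for mode, summary in pause_stats_by_mode(SENTENCES).items():
        print(mode, summary)
```

Per-mode summaries of this kind would only be a starting point: a predictor of pause location and duration would then be trained separately for each mode, which is the part the abstract attributes to the proposed method.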