End-to-End Speech Synthesis for Bangla with Text Normalization

T. Pial, Shahreen Salim Aunti, Shabbir Ahmed, Hasnain Heickal
Published in: 2018 5th International Conference on Computational Science/ Intelligence and Applied Informatics (CSII), 2018-07-01
DOI: 10.1109/CSII.2018.00019 (https://doi.org/10.1109/CSII.2018.00019)

Abstract

Text-to-speech synthesis is a well-researched area, yet no system developed so far can claim to be as convincing as a human voice. In the context of speech synthesis, an end-to-end system is one that can synthesize speech from text using training data as minimal as transcribed audio, without language-specific knowledge or phoneme dictionaries. Such a system should nevertheless be able to integrate language-specific rules to improve its performance. In this paper, we propose an end-to-end speech synthesis system for Bangla (also known as Bengali) that uses a minimal front end and a neural network as its statistical parametric model. We also propose a Text Normalization Procedure (TNP) for Bangla and incorporate it into the end-to-end system. We conducted extensive experiments using different models. Feedback from the participants showed that they responded more positively to the system when TNP was incorporated. A Wilcoxon signed-rank test was conducted to validate the results; the probability that the observed results arose from experimental error rather than from TNP was calculated to be less than 5%.
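The abstract reports a Wilcoxon signed-rank test on participant feedback but gives no data or procedure details. The following is a minimal sketch of how such a paired test can be computed, not the authors' analysis: the function name `wilcoxon_signed_rank`, the rating scale, and the example scores are all hypothetical, and the sketch assumes each participant rated the system once without TNP and once with it. It drops zero differences, assigns average ranks to ties, and computes an exact two-sided p-value by enumerating sign assignments.

```python
from itertools import groupby


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, exact two-sided p-value) for paired samples x, y."""
    # Paired differences; zero differences carry no sign information and are dropped.
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    # Ranks are stored doubled so that half-integer average ranks stay integers.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n
    pos = 0
    for _, group in groupby(order, key=lambda i: abs(diffs[i])):
        idx = list(group)
        doubled_avg = (pos + 1) + (pos + len(idx))  # 2 * mean of ranks pos+1..pos+k
        for i in idx:
            ranks2[i] = doubled_avg
        pos += len(idx)

    total2 = sum(ranks2)
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    w_minus2 = total2 - w_plus2

    # Exact null distribution: under H0 each rank is positive with probability 1/2.
    # counts[s] = number of sign assignments whose positive (doubled) rank sum is s.
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]

    smaller = min(w_plus2, w_minus2)
    tail = sum(counts[: smaller + 1])
    p_value = min(1.0, 2 * tail / 2 ** n)
    return w_plus2 / 2, w_minus2 / 2, p_value


# Hypothetical 1-5 ratings from eight listeners (NOT data from the paper).
without_tnp = [3, 2, 4, 3, 2, 3, 4, 2]
with_tnp = [4, 4, 5, 3, 3, 5, 5, 4]

w_plus, w_minus, p = wilcoxon_signed_rank(without_tnp, with_tnp)
print(f"W+ = {w_plus}, W- = {w_minus}, p = {p:.4f}")
```

With these made-up ratings every non-zero difference favours the TNP condition, so the p-value falls below the 5% threshold mentioned in the abstract. The exact enumeration is suited to small listener panels; for larger samples a normal approximation is the usual choice.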