Development and evaluation of unit selection and HMM-based speech synthesis systems for Tamil

Ramani Boothalingam, V. Sherlin Solomi, A. R. Gladston, S. Christina, P. Vijayalakshmi, N. Thangavelu, H. Murthy
2013 National Conference on Communications (NCC). Published 2013-03-28. DOI: 10.1109/NCC.2013.6487984 (https://doi.org/10.1109/NCC.2013.6487984)
Citations: 23

Abstract

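An unrestricted text-to-speech system is expected to produce, for any given text in a language, a speech signal that is highly intelligible to a human listener. At present, unit selection-based synthesis (USS) and statistical parametric synthesis are the state-of-the-art techniques for this task. Earlier, in [3], a concatenative synthesizer was developed for Tamil using 12 hours of speech data, and the syllable was shown to be the better subword unit. The current work builds FestVox voices with the phoneme and the CV unit as subword units on a reduced amount of speech data (5 hours) and compares their quality. It further compares the performance of these synthesizers with that of the well-known HMM-based speech synthesizer. Of the two USS systems, the phoneme-based system outperforms the CV-based system, with an MOS of 2.96, even though it is bound to have more concatenation points, primarily because more examples of each phoneme are available in the given amount of speech data. An HMM-based speech synthesis system is then developed using the same 5 hours of data. Although the speaker identity is not completely preserved in the synthesized speech, there are no sonic glitches, and the quality obtained, with an MOS of 3.86, is considerably better than that of the phoneme- and CV-based systems. In addition, the footprint of the system is drastically reduced, from 1 GB for the USS system to 6 MB for the HMM-based system.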
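The figures quoted in the abstract can be put side by side with a little arithmetic. The sketch below is only illustrative: it assumes that MOS is the plain average of listener ratings on the conventional 1 to 5 opinion scale (the abstract does not describe the listening test), that 1 GB means 1024 MB, and it uses a made-up list of ratings. The function name `mean_opinion_score` and all variable names are hypothetical.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average a list of listener ratings, assumed to lie on a 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in the range 1..5")
    return mean(ratings)


# Hypothetical ratings for illustration only; these are NOT data from the paper.
example_ratings = [4, 3, 4, 5, 3]
example_mos = mean_opinion_score(example_ratings)

# Figures quoted in the abstract.
MOS_PHONEME_USS = 2.96   # phoneme-based unit selection system
MOS_HMM = 3.86           # HMM-based system
USS_FOOTPRINT_MB = 1024  # "1 GB", assuming binary units
HMM_FOOTPRINT_MB = 6     # "6 MB"

# Difference in MOS and ratio of footprints between the two systems.
mos_gain = round(MOS_HMM - MOS_PHONEME_USS, 2)
footprint_ratio = USS_FOOTPRINT_MB / HMM_FOOTPRINT_MB

print(f"example MOS from hypothetical ratings: {example_mos:.2f}")
print(f"MOS gain of HMM over phoneme-based USS: {mos_gain:.2f}")
print(f"footprint reduction: about {footprint_ratio:.0f}x")
```

Under these assumptions the HMM-based system gains 0.90 MOS points over the phoneme-based USS system while needing well under one hundredth of the storage.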