Voice conversion-based multilingual to polyglot speech synthesizer for Indian languages

B. Ramani, M. P. Actlin Jeeva, P. Vijayalakshmi, T. Nagarajan
2013 IEEE International Conference of IEEE Region 10 (TENCON 2013), October 2013. DOI: 10.1109/TENCON.2013.6719019
Cited by: 15

Abstract

A multilingual text-to-speech (TTS) system synthesizes intelligible speech in multiple languages from a given text. However, when the system is given mixed-language text, the synthesized output switches speakers at the language-switching points, which listeners find annoying. To overcome this switching effect, a polyglot speech synthesizer is developed, which generates speech in multiple languages with a single voice identity. This can be achieved either by inherent voice conversion during synthesis, or by using voice conversion to convert a multilingual speech corpus into a polyglot speech corpus and then performing synthesis. In this work, the polyglot speech corpus is obtained using a Gaussian mixture model (GMM)-based cross-lingual voice conversion technique, and a polyglot speech synthesizer for Indian languages is developed using hidden Markov model (HMM)-based synthesis. Speech data collected from native speakers of Telugu, Malayalam, and Hindi are converted to have the voice identity of a native Tamil speaker. Building an HMM-based synthesizer on the resulting polyglot corpus enables the system to synthesize speech for any given text in any of the supported languages, or for mixed-language text. The similarity of the synthesized speech to the source or target speaker is evaluated with an ABX listening test. The scores show that the percentage of similarity to the target Tamil speaker varies from 73% to 86%. Further, the performance of the system is analyzed for speaker switching.
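The abstract does not say which GMM-based conversion variant the authors used, nor how source and target frames are paired across languages. As an illustration only, here is a minimal sketch of one widely used formulation, joint-density GMM regression, where each converted frame is the conditional expectation F(x) = Σ_m p(m|x) [μ_y,m + (σ_yx,m / σ_xx,m)(x − μ_x,m)]. It is reduced to scalar features so it needs only the Python standard library; real systems apply the same mapping to multidimensional spectral features (e.g. mel-cepstra) with full covariance matrices. The names `JointComponent`, `posteriors`, `convert` and `convert_frames`, and all parameter values, are hypothetical and assumed to have been estimated beforehand (typically with EM on aligned source/target frames).

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class JointComponent:
    """One component of a joint source/target GMM over scalar features.

    Hypothetical illustration: parameters are assumed to come from EM
    training on aligned (source, target) feature pairs.
    """
    weight: float   # mixture weight w_m
    mu_x: float     # source mean
    mu_y: float     # target mean
    var_xx: float   # source variance (must be > 0)
    cov_yx: float   # target-source cross-covariance


def log_gaussian(x: float, mean: float, var: float) -> float:
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def posteriors(x: float, comps: List[JointComponent]) -> List[float]:
    """p(m | x) from each component's source marginal, using log-sum-exp."""
    logs = [math.log(c.weight) + log_gaussian(x, c.mu_x, c.var_xx) for c in comps]
    peak = max(logs)
    exps = [math.exp(v - peak) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


def convert(x: float, comps: List[JointComponent]) -> float:
    """Minimum mean-square-error estimate E[y | x] under the joint GMM."""
    post = posteriors(x, comps)
    return sum(
        p * (c.mu_y + c.cov_yx / c.var_xx * (x - c.mu_x))
        for p, c in zip(post, comps)
    )


def convert_frames(frames: List[float], comps: List[JointComponent]) -> List[float]:
    """Apply the mapping frame by frame across an utterance."""
    return [convert(x, comps) for x in frames]
```

With two equally weighted components centred at x = -3 and x = 3, inputs near either centre map close to that component's target mean, while x = 0 maps to the average of the two local linear regressions (2.0). Computing the posteriors in the log domain keeps the mapping defined even for frames far from every component, where the raw likelihoods would underflow to zero.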