Voice conversion-based multilingual to polyglot speech synthesizer for Indian languages

2013 IEEE International Conference of IEEE Region 10 (TENCON 2013) Pub Date : 2013-10-01 DOI:10.1109/TENCON.2013.6719019

B. Ramani, M. P. Actlin Jeeva, P. Vijayalakshmi, T. Nagarajan


Citations: 15

Abstract

A multilingual text-to-speech (TTS) system synthesizes a speech signal, intelligible to a human listener, in multiple languages for a given text. However, when the system is given mixed-language text, the synthesized output exhibits speaker switching at the language-switching points, which is annoying to listeners. To overcome this switching effect, a polyglot speech synthesizer is developed, which generates synthesized speech in multiple languages with a single voice identity. This can be achieved either by inherent voice conversion during synthesis or by using voice conversion to convert the multilingual speech corpus into a polyglot speech corpus and then performing synthesis. In this work, the polyglot speech corpus is obtained using a Gaussian mixture model (GMM)-based cross-lingual voice conversion technique, and a polyglot speech synthesizer for Indian languages is developed using a hidden Markov model (HMM)-based synthesis technique. Here, speech data collected from native speakers of the Indian languages Telugu, Malayalam, and Hindi are converted to have the voice identity of the native Tamil speaker. Building an HMM-based synthesizer from the resulting polyglot corpus enables the system to synthesize speech for any given text in any language or for mixed-language text. The performance of the polyglot speech synthesizer is evaluated with an ABX listening test, which measures the similarity of the synthesized speech to the source or target speaker. The scores obtained show that the percentage of similarity to the target Tamil speaker varies from 73% to 86%. Further, the performance of the system is analyzed for speaker switching.
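The abstract does not state which GMM mapping function is used. As an illustration only, the sketch below shows one common formulation of GMM-based voice conversion, joint-density GMM regression, reduced to a single scalar feature so that it runs with the Python standard library alone. The names `JointComponent` and `convert`, and all parameter values, are hypothetical and are not taken from the paper; a practical system would operate on multi-dimensional spectral feature vectors with parameters trained on time-aligned source and target frames.

```python
import math
from dataclasses import dataclass


@dataclass
class JointComponent:
    """One mixture component of a joint source/target GMM (scalar case, assumed)."""
    weight: float   # mixture weight
    mu_x: float     # mean of the source-speaker feature
    mu_y: float     # mean of the target-speaker feature
    var_x: float    # variance of the source-speaker feature
    cov_xy: float   # covariance between source and target features


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of a univariate Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convert(x: float, components: list[JointComponent]) -> float:
    """Map a source feature x to the expected target feature E[y | x]."""
    # Weighted likelihood of x under each component's source marginal.
    likelihoods = [c.weight * gaussian_pdf(x, c.mu_x, c.var_x) for c in components]
    total = sum(likelihoods)
    converted = 0.0
    for c, lik in zip(components, likelihoods):
        posterior = lik / total  # responsibility p(component | x)
        # Conditional mean of y given x within this component.
        converted += posterior * (c.mu_y + c.cov_xy / c.var_x * (x - c.mu_x))
    return converted


# Hypothetical two-component model, for illustration only.
model = [
    JointComponent(weight=0.5, mu_x=-1.0, mu_y=-2.0, var_x=1.0, cov_xy=0.0),
    JointComponent(weight=0.5, mu_x=1.0, mu_y=2.0, var_x=1.0, cov_xy=0.0),
]
print(convert(0.0, model))
```

In the setting described above, `x` would stand for a feature of a Telugu, Malayalam, or Hindi speaker and the returned value for the corresponding feature in the Tamil speaker's voice. The sketch omits training, time alignment, and any guard against `total` underflowing to zero for inputs far from every component.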


