Text Normalization for Bangla, Khmer, Nepali, Javanese, Sinhala and Sundanese Text-to-Speech Systems

Workshop on Spoken Language Technologies for Under-resourced Languages. Pub Date: 2018-08-29. DOI: 10.21437/SLTU.2018-31

Keshan Sanjaya Sodimana, Pasindu De Silva, R. Sproat, T. Wattanavekin, Alexander Gutkin, Knot Pipatsrisawat


Citations: 6

Abstract

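Text normalization is the process of converting non-standard words (NSWs), such as numbers and abbreviations, into standard words so that their pronunciations can be derived by typical means (usually lexicon lookups). Text normalization is thus an important component of any text-to-speech (TTS) system. Without text normalization, the resulting voice may sound unintelligent. In this paper, we describe an approach to developing rule-based text normalization. We also describe our open-source repository containing text normalization grammars and tests for Bangla, Javanese, Khmer, Nepali, Sinhala and Sundanese. Finally, we present a recipe for utilizing the grammars in a TTS system.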
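To make the idea of rule-based normalization concrete, the following is a minimal sketch in Python. It is not the grammars from the paper's repository, and it targets English only for readability. The names `ABBREVIATIONS`, `verbalize_number` and `normalize` are hypothetical, and the abbreviation table is an illustrative assumption. It shows the two steps the abstract mentions: detecting NSWs (numbers and abbreviations) and rewriting them as standard words that a lexicon lookup could then pronounce.

```python
import re

# Hypothetical abbreviation table (an assumption for illustration);
# a real system would maintain one per language.
ABBREVIATIONS = {"Dr.": "doctor", "km": "kilometers"}

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def verbalize_number(n: int) -> str:
    """Spell out an integer in the range 0..999 as English words."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    words = f"{ONES[hundreds]} hundred"
    return words if rest == 0 else f"{words} {verbalize_number(rest)}"


def normalize(text: str) -> str:
    """Rewrite each whitespace-separated token that is an NSW."""
    out = []
    for token in text.split():
        if token in ABBREVIATIONS:
            # Rule 1: known abbreviation -> its expansion.
            out.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"[0-9]{1,3}", token):
            # Rule 2: short digit string -> cardinal number words.
            out.append(verbalize_number(int(token)))
        else:
            # Anything else is treated as a standard word and kept.
            out.append(token)
    return " ".join(out)


print(normalize("Dr. Silva ran 42 km"))
```

Under these assumptions, the input `Dr. Silva ran 42 km` becomes `doctor Silva ran forty two kilometers`. A production system would also need to handle context (for example, whether `Dr.` means "doctor" or "drive") and language-specific number grammar, which this sketch does not attempt.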


