Modern Arabic speech corpus for Text to Speech synthesis

Zine Oumaima, A. Meziane
Published in: 2020 IEEE International Conference on Technology Management, Operations and Decisions (ICTMOD)
Publication date: 2020-11-24
DOI: 10.1109/ICTMOD49425.2020.9380606 (https://doi.org/10.1109/ICTMOD49425.2020.9380606)
Citation count: 2

Abstract

Large, open-access, single-speaker corpora that can be used effectively to build a text-to-speech (TTS) system are scarce, especially for Arabic, a low-resource language. This paper therefore presents a new open-access single-speaker corpus. It is by far the largest Arabic speech resource suitable for building text-to-speech systems. The released corpus consists of more than 16 hours of audio aligned with the corresponding phonetic transcription in Buckwalter format, together with fully diacritized orthographic transcripts totaling 81,000 words. The corpus design was shaped by several factors, among them coverage of the most frequent lemmas, since the lemma is a unit shared by most Arabic words. The corpus is freely available for download from http://oujda-nlp-team.net/en/corpora/speech-corpus-1-0/.
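
For readers unfamiliar with the Buckwalter format mentioned in the abstract, the following is a minimal sketch of the standard Buckwalter transliteration, which maps each Arabic letter and diacritic to a single ASCII symbol. It is an illustration only: the function name `to_buckwalter` is hypothetical, and the corpus's phonetic transcription may apply further conventions that are not described here.

```python
# Standard Buckwalter transliteration table: Arabic code point -> ASCII symbol.
# Illustrative sketch only; not taken from the corpus or its tooling.
BUCKWALTER = {
    "\u0621": "'", "\u0622": "|", "\u0623": ">", "\u0624": "&",
    "\u0625": "<", "\u0626": "}", "\u0627": "A", "\u0628": "b",
    "\u0629": "p", "\u062A": "t", "\u062B": "v", "\u062C": "j",
    "\u062D": "H", "\u062E": "x", "\u062F": "d", "\u0630": "*",
    "\u0631": "r", "\u0632": "z", "\u0633": "s", "\u0634": "$",
    "\u0635": "S", "\u0636": "D", "\u0637": "T", "\u0638": "Z",
    "\u0639": "E", "\u063A": "g", "\u0640": "_", "\u0641": "f",
    "\u0642": "q", "\u0643": "k", "\u0644": "l", "\u0645": "m",
    "\u0646": "n", "\u0647": "h", "\u0648": "w", "\u0649": "Y",
    "\u064A": "y",
    # Diacritics: tanween, short vowels, shadda, sukun, dagger alif, alif wasla.
    "\u064B": "F", "\u064C": "N", "\u064D": "K", "\u064E": "a",
    "\u064F": "u", "\u0650": "i", "\u0651": "~", "\u0652": "o",
    "\u0670": "`", "\u0671": "{",
}

_TABLE = str.maketrans(BUCKWALTER)


def to_buckwalter(text: str) -> str:
    """Transliterate diacritized Arabic text; unmapped characters pass through."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    # The diacritized word for "a book": kaf, kasra, ta, fatha, alif, ba, dammatan.
    word = "\u0643\u0650\u062A\u064E\u0627\u0628\u064C"
    print(to_buckwalter(word))
```

Because the mapping is one-to-one, it can be inverted to recover the diacritized Arabic text from a Buckwalter string.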