An end-to-end Tacotron model versus pre trained Tacotron model for Arabic text-to-speech synthesis

IF 2.2 | Zone 4 (Engineering Technology) | Q3 ENGINEERING, MULTIDISCIPLINARY | Journal of Engineering Research | Pub Date: 2025-03-01 | DOI: 10.1016/j.jer.2023.08.016
A.M. Mutawa
Journal of Engineering Research, Volume 13, Issue 1, Pages 384-389. Publication type: Journal Article. Open access: no.
https://www.sciencedirect.com/science/article/pii/S2307187723001943
Citation count: 0

Abstract

Text-to-Speech (TTS) systems convert ordinary text into spoken language, which is important for accessibility and user interaction. Many of these systems generate speech from phonetic or phonemic transcriptions; an alternative is concatenative synthesis, which joins pre-recorded units from a database. The units vary in size from diphones to whole phrases. Although this method offers broad coverage, its output can lack clarity, and high-quality output in some situations requires storing whole words or phrases. Synthesizers can also produce voices by modeling human speech and the behavior of the vocal tract. Arabic is a difficult language for TTS development because of its complex morphology, semantic nuances, and many dialects. These dialects often differ substantially from standard Arabic and do not follow formal spelling rules, so unedited Arabic text often contains spelling and grammatical errors. In this study, we present and evaluate an end-to-end Tacotron model built specifically for Arabic TTS synthesis. The model exploits the rich acoustic information in audio files, such as frequency and pitch, to produce natural-sounding speech that closely resembles human speech. We also compare the performance of this model with that of a pre-trained Tacotron model applied to Arabic text. The comparison provides insight into how well Arabic TTS systems perform and where they could be improved.
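The unit-concatenation approach mentioned in the abstract can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the function names `select_units` and `crossfade_concat`, the greedy longest-match selection, the linear crossfade, and the toy inventory are all assumptions made for illustration. It shows how a system might prefer the largest stored unit (a phrase over a single token) and smooth the joins between units.

```python
from typing import Dict, List, Sequence, Tuple

Waveform = List[float]
UnitKey = Tuple[str, ...]


def select_units(tokens: Sequence[str],
                 inventory: Dict[UnitKey, Waveform]) -> List[UnitKey]:
    """Greedy longest-match: pick the largest stored unit covering the next tokens."""
    longest = max(len(key) for key in inventory)
    chosen: List[UnitKey] = []
    i = 0
    while i < len(tokens):
        for size in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + size])
            if key in inventory:
                chosen.append(key)
                i += size
                break
        else:
            # No unit of any size starts at this token.
            raise KeyError(f"no stored unit covers {tokens[i]!r}")
    return chosen


def crossfade_concat(units: Sequence[Waveform], overlap: int) -> Waveform:
    """Join waveforms, linearly crossfading up to `overlap` samples per boundary."""
    out: Waveform = list(units[0]) if units else []
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for k in range(n):
            w = (k + 1) / (n + 1)  # fade-in weight of the incoming unit
            out[start + k] = (1 - w) * out[start + k] + w * unit[k]
        out.extend(unit[n:])
    return out


# Toy inventory (hypothetical): constant-valued "recordings" keyed by token tuples.
inventory = {
    ("a",): [1.0] * 4,
    ("b",): [0.0] * 4,
    ("a", "b"): [0.5] * 6,  # a stored two-token phrase
}
keys = select_units(["a", "b", "a"], inventory)
speech = crossfade_concat([inventory[k] for k in keys], overlap=2)
```

In this toy run the phrase unit `("a", "b")` is chosen over the two single-token units, which mirrors the trade-off the abstract describes: larger units need fewer joins but require a larger database.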
Source journal
Journal of Engineering Research (ENGINEERING, MULTIDISCIPLINARY)
CiteScore: 1.60
Self-citation rate: 10.00%
Articles published: 181
Review time: 20 weeks
Journal description: Journal of Engineering Research (JER) is an international, peer-reviewed journal that publishes full-length original research papers, reviews, and case studies in all areas of engineering, such as Civil, Mechanical, Industrial, Electrical, Computer, Chemical, Petroleum, Aerospace, Architectural, Biomedical, Coastal, Environmental, Marine & Ocean, Metallurgical & Materials, Software, Surveying, Systems, and Manufacturing Engineering. In particular, JER focuses on innovative approaches and methods that help solve environmental and manufacturing problems found primarily in the Arabian Gulf region and the Middle East. Kuwait University formerly published the "Kuwait Journal of Science and Engineering" (KJSE, ISSN: 1024-8684), which had carried science and engineering articles since 1974. In 2011, KJSE was split into two independent journals: "Journal of Engineering Research" (JER) and "Kuwait Journal of Science" (KJS).