Title: An end-to-end Tacotron model versus pre trained Tacotron model for Arabic text-to-speech synthesis
Author: A.M. Mutawa
Journal: Journal of Engineering Research, Volume 13, Issue 1, Pages 384-389
Publication date: 2025-03-01
Publication type: Journal Article
DOI: 10.1016/j.jer.2023.08.016
URL: https://www.sciencedirect.com/science/article/pii/S2307187723001943
Impact factor: 2.2
JCR: Q3 (ENGINEERING, MULTIDISCIPLINARY)
Open access: No
Citations: 0
Abstract
Text-to-Speech (TTS) systems convert ordinary text into spoken language, which is important for accessibility and user interaction. Many of these systems generate speech from phonetic or phonemic transcriptions; another approach concatenates pre-recorded units from a database. The units range in size from diphones to whole phrases. Although this method covers a wide range of cases, its output sometimes lacks clarity, especially where high-quality output requires storing whole words or phrases. Synthesizers can also model human speech production and the vocal tract to generate voices. Arabic is a difficult language for TTS development because of its complex morphology, semantic nuances, and many dialects. These dialects often differ substantially from standard Arabic and do not follow formal spelling rules. As a result, unedited traditional Arabic text often contains spelling and grammatical errors. In this study, we present and evaluate an end-to-end Tacotron model built specifically for Arabic TTS synthesis. The model exploits the rich acoustic information in audio files, such as frequency and pitch, to produce natural-sounding speech that closely resembles human speech. We also compare its performance with that of a pre-trained Tacotron model applied to Arabic text. The comparison provides insight into how well Arabic TTS systems work and where they could be improved.
About the journal:
Journal of Engineering Research (JER) is an international, peer-reviewed journal that publishes full-length original research papers, reviews, and case studies in all areas of engineering, including Civil, Mechanical, Industrial, Electrical, Computer, Chemical, Petroleum, Aerospace, Architectural, Biomedical, Coastal, Environmental, Marine & Ocean, Metallurgical & Materials, Software, Surveying, Systems, and Manufacturing Engineering. In particular, JER focuses on innovative approaches and methods that help solve environmental and manufacturing problems, primarily those of the Arabian Gulf region and the Middle East. Kuwait University previously published the "Kuwait Journal of Science and Engineering" (KJSE; ISSN: 1024-8684), which had carried science and engineering articles since 1974. In 2011, KJSE was split into two independent journals: "Journal of Engineering Research" (JER) and "Kuwait Journal of Science" (KJS).