Translation from Tunisian Dialect to Modern Standard Arabic: Exploring Finite-State Transducers and Sequence-to-Sequence Transformer Approaches

IF 17.7 · Zone 1 (Chemistry) · Q1 CHEMISTRY, MULTIDISCIPLINARY · Accounts of Chemical Research · Pub Date: 2024-07-24 · DOI: 10.1145/3681788
Roua Torjmen, K. Haddar
Citations: 0

Abstract

Translation from a mother tongue, such as the Tunisian dialect, into Modern Standard Arabic is an important area of natural language processing because of its wide range of applications and associated benefits. Researchers have recently shown increased interest in the Tunisian dialect, driven primarily by the massive volume of content that Tunisians have generated spontaneously on social media following the revolution. This paper presents two distinct translators for converting the Tunisian dialect into Modern Standard Arabic. The first takes a rule-based approach, using a collection of finite-state transducers and a bilingual dictionary derived from the study corpus. The second relies on deep learning, specifically a sequence-to-sequence Transformer model and a parallel corpus. To evaluate and compare the two translators, we conducted tests on a parallel corpus of 8,599 words. Both translators achieve noteworthy results: the translator based on finite-state transducers achieved a BLEU score of 56.65, while the Transformer-based translator achieved a higher score of 66.07.
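The abstract reports BLEU scores but does not say which BLEU variant, tokenization, or tool was used. As a rough illustration of how such a score is computed, the following is a minimal sketch of corpus-level BLEU, assuming one reference per sentence, whitespace tokenization, n-grams up to order 4, and no smoothing. The function names and the English toy sentences are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Illustrative only: unsmoothed, so the score is 0 whenever any n-gram
    order has no match at all.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipping: each hypothesis n-gram is credited at most as many
            # times as it occurs in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Toy English tokens standing in for system output and reference.
    hyp = "the cat sat on the mat".split()
    ref = "the cat sat on a mat".split()
    print(round(corpus_bleu([hyp], [ref]), 2))
```

Scores from different BLEU implementations are only comparable when tokenization and smoothing match, so this sketch should not be expected to reproduce the reported 56.65 and 66.07.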