Construction of Amharic-Arabic Parallel Text Corpus for Neural Machine Translation

Ibrahim Gashaw, H. Shashirekha
International Journal of Artificial Intelligence & Applications, vol. 11, no. 1, pp. 79-91, 30 January 2020. DOI: 10.5121/ijaia.2020.11107
Citations: 1

Abstract

Many automatic translation works have addressed major European language pairs by taking advantage of large-scale parallel corpora, but very little research has been conducted on the Amharic-Arabic language pair due to the scarcity of its parallel data; in particular, no benchmark parallel Amharic-Arabic text corpus is available for the Machine Translation task. Therefore, a small parallel Quranic text corpus is constructed by modifying the existing monolingual Arabic text and its equivalent Amharic-language translation, both available on Tanzile. Experiments are carried out on two Neural Machine Translation (NMT) models, based on Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), using an attention-based encoder-decoder architecture adapted from the open-source OpenNMT system. The LSTM- and GRU-based NMT models and the Google Translation system are compared, and the LSTM-based OpenNMT is found to outperform the GRU-based OpenNMT and the Google Translation system, with BLEU scores of 12%, 11%, and 6%, respectively.
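The two practical steps the abstract describes — pairing verse-aligned Tanzil text files into a parallel corpus, and comparing systems by corpus-level BLEU — can be sketched in plain Python. This is a minimal illustration, not the paper's actual pipeline: the function names, the toy tokenized sentences, and the single-reference BLEU (modified n-gram precision with a brevity penalty) are assumptions made for the example.

```python
import math
from collections import Counter

def align_verses(arabic_lines, amharic_lines):
    """Pair verse-aligned lines into (Arabic, Amharic) tuples.

    Tanzil distributes translations one verse per line, so zipping the
    two files line-by-line yields a sentence-aligned parallel corpus.
    """
    return [(a.strip(), b.strip())
            for a, b in zip(arabic_lines, amharic_lines)
            if a.strip() and b.strip()]

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU (as a percentage), one reference per hypothesis.

    Accumulates clipped n-gram matches over the whole corpus, takes the
    geometric mean of the n-gram precisions, and applies the brevity
    penalty for hypotheses shorter than their references.
    """
    match = [0] * max_n
    total = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            ref_counts = Counter(ngrams(ref, n))
            hyp_counts = Counter(ngrams(hyp, n))
            # Clipped matches: a hypothesis n-gram counts at most as
            # often as it appears in the reference.
            match[n - 1] += sum(min(c, ref_counts[g])
                                for g, c in hyp_counts.items())
            total[n - 1] += max(0, len(hyp) - n + 1)
    if min(match) == 0:
        return 0.0  # some n-gram order has no matches at all
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / max_n
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_prec)

# Toy example (illustrative tokens, not from the paper's test set):
refs = [["the", "book", "is", "on", "the", "table"]]
hyps = [["the", "book", "is", "on", "the", "table"]]
print(corpus_bleu(refs, hyps))  # a perfect match scores 100.0
```

In the paper's setting, `references` would hold the tokenized target-side test verses and `hypotheses` the output of each system (LSTM-based OpenNMT, GRU-based OpenNMT, Google Translation), giving one BLEU figure per system for the kind of comparison reported above.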