Construction of Amharic-Arabic Parallel Text Corpus for Neural Machine Translation
Ibrahim Gashaw, H. Shashirekha
International Journal of Artificial Intelligence & Applications, Vol. 11, No. 1, pp. 79-91 (published 2020-01-30)
DOI: 10.5121/ijaia.2020.11107
Citations: 1
Abstract
Much machine translation research has addressed major European language pairs, taking advantage of large-scale parallel corpora, but very little has been done on the Amharic-Arabic language pair because parallel data for it are scarce. Moreover, no benchmark parallel Amharic-Arabic text corpus is available for the machine translation task. Therefore, a small parallel Quranic text corpus is constructed by modifying the existing monolingual Arabic text and its equivalent Amharic translation, both available on Tanzile. Experiments are carried out on two Neural Machine Translation (NMT) models, one based on Long Short-Term Memory (LSTM) and one based on Gated Recurrent Units (GRU), each using an attention-based encoder-decoder architecture adapted from the open-source OpenNMT system. The LSTM- and GRU-based NMT models are compared with the Google Translation system: LSTM-based OpenNMT outperforms both GRU-based OpenNMT and the Google Translation system, with BLEU scores of 12%, 11%, and 6%, respectively.
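The abstract does not spell out how the Arabic verses and their Amharic translations were paired into a parallel corpus. Because both texts follow the Quran's verse numbering, one natural approach is to join them on a (sura, aya) key. The sketch below is a minimal illustration under stated assumptions, not the authors' procedure: it assumes each input line has the pipe-separated form `sura|aya|text` and that lines starting with `#` are comments; check the actual download format before relying on it. The names `parse_verses`, `align_parallel`, and `write_parallel` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# (sura number, aya number) identifies the same verse in both languages.
VerseKey = Tuple[int, int]


def parse_verses(lines: Iterable[str]) -> Dict[VerseKey, str]:
    """Parse lines assumed to have the form 'sura|aya|text' into a verse-keyed dict.

    Blank lines and lines starting with '#' (assumed to be comments or
    license notes) are skipped.
    """
    verses: Dict[VerseKey, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # maxsplit=2 keeps any '|' that appears inside the verse text intact.
        sura, aya, text = line.split("|", 2)
        verses[(int(sura), int(aya))] = text.strip()
    return verses


def align_parallel(first_lines: Iterable[str],
                   second_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair verses from two files that share a (sura, aya) key, in verse order.

    Verses present in only one file are dropped rather than guessed.
    """
    first = parse_verses(first_lines)
    second = parse_verses(second_lines)
    shared = sorted(first.keys() & second.keys())
    return [(first[key], second[key]) for key in shared]


def write_parallel(pairs: List[Tuple[str, str]], first_path: str, second_path: str) -> None:
    """Write line-aligned files: line i of each file forms one translation pair."""
    with open(first_path, "w", encoding="utf-8") as out1, \
         open(second_path, "w", encoding="utf-8") as out2:
        for text1, text2 in pairs:
            out1.write(text1 + "\n")
            out2.write(text2 + "\n")
```

Writing the pairs as two line-aligned plain-text files (one segment per line, with the same line number in each file forming a translation pair) gives the source/target layout that OpenNMT-style toolkits consume. Joining on the verse key, rather than on line position, keeps the pairing correct even if one file has extra or missing lines.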
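Both models in the abstract use an attention-based encoder-decoder. At each decoding step, the decoder state is scored against every encoder state, the scores are normalized with a softmax, and the resulting weights average the encoder states into a context vector. The abstract does not say which scoring function was used, so the sketch below assumes the simple dot-product score (Luong-style global attention) and uses plain Python lists in place of tensors; `dot_attention` is a hypothetical name, not an OpenNMT API.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(decoder_state: Sequence[float],
                  encoder_states: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """One attention step with dot-product scoring (an assumption; see lead-in).

    Returns (weights, context): one weight per source position, and the
    attention-weighted average of the encoder states.
    """
    # Score each source position by its dot product with the decoder state.
    scores = [sum(d * h for d, h in zip(decoder_state, enc)) for enc in encoder_states]
    weights = softmax(scores)
    # Context vector: weighted sum of encoder states, one dimension at a time.
    dim = len(decoder_state)
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context
```

For example, `dot_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` puts more weight on the first source position, whose state points in the same direction as the decoder state. Because attention only consumes hidden-state vectors, the same step applies whether those states come from LSTM or GRU cells, which is what lets the two models in the abstract share one architecture.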
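The abstract reports BLEU scores of 12%, 11%, and 6%; BLEU is conventionally reported on a 0-100 scale. It combines clipped n-gram precisions (usually n = 1 to 4) by a geometric mean and multiplies by a brevity penalty for hypotheses shorter than the references. The sketch below is a textbook corpus-level BLEU with one reference per segment, uniform weights, and no smoothing. The abstract does not say which BLEU implementation or tokenization was used, and both shift scores, so this sketch is not expected to reproduce the reported numbers; `corpus_bleu` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]], references: List[List[str]],
                max_n: int = 4) -> float:
    """Corpus BLEU on a 0-100 scale: one reference per hypothesis, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each n-gram's count by how often it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    # Without smoothing, any zero n-gram precision makes BLEU zero.
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize a corpus whose hypotheses are shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_precision)
```

As an example, a hypothesis `"the cat sat on"` against the reference `"the cat sat on the mat"` has perfect n-gram precision but is penalized for length, giving `100 * exp(-0.5)`, about 60.65. Segments here are tokenized with a whitespace split; a tokenizer suited to Amharic and Arabic script would change the n-gram counts and therefore the score.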