Neural Machine Translation for Kashmiri to English and Hindi using Pre-trained Embeddings

Shailashree K. Sheshadri, Deepa Gupta, M. Costa-jussà
Published in: 2022 OITS International Conference on Information Technology (OCIT), 2022-12-01
DOI: 10.1109/OCIT56763.2022.00053 (https://doi.org/10.1109/OCIT56763.2022.00053)
Citations: 1

Abstract

Neural Machine Translation (NMT) is an advanced approach to Machine Translation (MT) that has recently gained popularity. Building a sound translation system requires a large parallel corpus, but most of the world's languages lack one. Many state-of-the-art (SoTA) NMT systems for low-resource languages have been developed using transfer learning, knowledge transfer, and zero-shot learning. Most Indic languages fall into the low-resource or zero-resource category because rich parallel and monolingual corpora are unavailable. Although many Indian border languages have social and economic significance, they lack resources and automated machine translation systems. Kashmiri, one such Indian border language, belongs to the zero-resource category, with limited corpora and no significant translation system. This paper uses pre-trained word embeddings to create the first NMT system built specifically for Kashmiri-English and Kashmiri-Hindi translation. The system is developed with mBPE pre-trained word embeddings for Kashmiri. The model with pre-trained word embeddings improves on vanilla NMT by +2.58 BLEU.
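
The central idea of the abstract, starting the translation model from pre-trained subword vectors rather than random values, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy vocabulary, the `pretrained` dictionary, the 4-dimensional vectors, and the function name `build_embedding_matrix` are all hypothetical, and the abstract does not say which toolkit or embedding size was used.

```python
import random


def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Return (matrix, hits): one vector per vocabulary token.

    Tokens present in `pretrained` copy their pre-trained vector.
    All other tokens get small random values, which is how every
    token would be initialized in a vanilla (non-pre-trained) model.
    """
    rng = random.Random(seed)
    matrix = []
    hits = 0
    for token in vocab:
        vector = pretrained.get(token)
        if vector is not None and len(vector) == dim:
            matrix.append(list(vector))  # copy the pre-trained vector
            hits += 1
        else:
            # fall back to random initialization for uncovered tokens
            matrix.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return matrix, hits


# Hypothetical toy data: placeholder subword tokens and made-up vectors.
vocab = ["<unk>", "tok_a", "tok_b", "tok_c"]
pretrained = {
    "tok_a": [0.1, 0.2, 0.3, 0.4],
    "tok_b": [0.5, 0.6, 0.7, 0.8],
}

matrix, hits = build_embedding_matrix(vocab, pretrained, dim=4)
print(f"{hits}/{len(vocab)} tokens initialized from pre-trained vectors")
```

In an actual NMT toolkit, a matrix like this would be loaded into the encoder's embedding layer before training. Whether that layer is then frozen or fine-tuned is a design choice the abstract does not specify.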