CMRUTU: Code Mixed Roman Urdu (Roman Urdu and English) to Urdu Translator

Muhammad Wisal, A. Mustafa, Umair Arshad
Published in: 2022 24th International Multitopic Conference (INMIC)
Publication date: 2022-10-21
DOI: 10.1109/INMIC56986.2022.9972972 (https://doi.org/10.1109/INMIC56986.2022.9972972)
Citations: 0

Abstract
Urdu is the official language of Pakistan and a familiar language across South Asian countries. It is spoken as a first language by nearly 70 million people and as a second language by more than 100 million, mainly in Pakistan and India. Most textual communication is not written in pure Roman Urdu; English words appear within Roman Urdu sentences. Because the purpose of any language is to communicate, a translator that can render these code-mixed sentences into Urdu is needed. It can be difficult for a machine to follow the shift between languages within a sentence. Previous research has addressed Urdu transliteration and rule-based translation, but translating mixed Roman Urdu into Urdu at this level of accuracy is novel. In this research, we introduce a Mixed Language (Roman Urdu and English) to Urdu translator. A pre-trained deep learning model, "g2p_multilingual_byT5_small", is fine-tuned on a newly created corpus of mixed Roman Urdu sentences and their translations in pure Urdu. With a BLEU score of 66.73, the model can translate text messages, paragraphs, or any descriptions from Roman Urdu to Urdu. The research was carried out in Python, with model training on Google Colab.
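The abstract reports a BLEU score but does not say which BLEU implementation, tokenization, or smoothing was used. As a rough illustration of what the metric measures, here is a minimal sketch of corpus-level BLEU. It assumes whitespace tokenization, one reference per sentence, n-grams up to order 4, and no smoothing. The function names `ngrams` and `corpus_bleu` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Assumptions (not from the paper): whitespace tokenization, no smoothing.
    """
    matched = [0] * max_n  # clipped n-gram matches, per n-gram order
    total = [0] * max_n    # n-grams produced by the system, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any n-gram order with zero matches gives BLEU = 0.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # The brevity penalty applies when the output is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Illustrative sentences only; they are not drawn from the paper's corpus.
    reference = ["میں کل دفتر جاؤں گا"]
    hypothesis = ["میں کل دفتر جاؤں گا"]
    print(corpus_bleu(hypothesis, reference))
```

Scores from a sketch like this are only comparable to a published figure if the tokenization and smoothing settings match, which the abstract does not specify.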