Transforming Language Translation: A Deep Learning Approach to Urdu–English Translation

Journal of Ambient Intelligence and Humanized Computing (Zone 3, Computer Science; Q1, Computer Science) | Pub Date: 2024-08-22 | DOI: 10.1007/s12652-024-04839-2

Iqra Safder, Muhammad Abu Bakar, Farooq Zaman, Hajra Waheed, Naif Radi Aljohani, Raheel Nawaz, Saeed Ul Hassan



Abstract

Machine translation has revolutionized the field of language translation over the last decade. The field was initially dominated by statistical models, but the rise of deep learning has put neural networks, particularly Transformer models, in the lead. These models have demonstrated exceptional performance in natural language processing tasks, surpassing traditional sequence-to-sequence models such as RNN, GRU, and LSTM. Because Transformers handle long-range dependencies better and require less training time, the NLP community has shifted towards using them for sequence-to-sequence tasks. In this work, we leverage a sequence-to-sequence Transformer model to translate Urdu (a low-resource language) into English. Our model is based on a Transformer variant with changes such as activation dropout, attention dropout, and final layer normalization. We used four datasets (UMC005, Tanzil, The Wire, and PIB) from two categories (religious and news) to train our model. The results show that the model's performance and translation quality vary depending on the dataset used for fine-tuning. Our model outperforms the baseline models, with scores of 23.9 BLEU, 0.46 chrF, 0.44 METEOR, and 60.75 TER. The improved performance is attributable to meticulous parameter tuning, encompassing modifications to the architecture and optimization techniques. Comprehensive details of the model configurations and optimizations are provided to clarify what distinguishes our approach and how it surpasses prior work. We provide source code via GitHub for future studies.
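The reported metrics use different scales and directions: BLEU, chrF, and METEOR reward overlap with the reference (higher is better), while TER is an edit rate (lower is better). The following is a minimal Python sketch of two of them, not the authors' evaluation code. The function names `chrf_like` and `ter_like` are hypothetical, the n-gram order of 6 and beta of 2 are assumed defaults, and the TER variant is a simplification that omits the block-shift operation of full TER.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf_like(hypothesis: str, reference: str,
              max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style score in [0, 1]; higher is better.

    Assumption: max_n=6 and beta=2 are illustrative defaults, not
    settings taken from the paper.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: recall is weighted beta times as much as precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


def ter_like(hypothesis: str, reference: str) -> float:
    """Shift-free approximation of TER as a percentage; lower is better.

    Assumption: full TER also allows block shifts; this sketch uses
    plain word-level edit distance divided by reference length.
    """
    hyp, ref = hypothesis.split(), reference.split()
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return 100.0 * prev[-1] / max(len(ref), 1)
```

For example, `ter_like("the cat sat", "the cat sat down")` returns 25.0, since one insertion is needed against a four-word reference. For reported results, an established toolkit is preferable to a hand-rolled scorer like this one.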



Source journal

Journal of Ambient Intelligence and Humanized Computing (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore: 9.60

Self-citation rate: 0.00%

Articles published: 854

Journal description: The purpose of JAIHC is to provide a high-profile, leading-edge forum for academics, industrial professionals, educators, and policy makers in the field to contribute and disseminate the most innovative research and developments in all aspects of ambient intelligence and humanized computing, such as intelligent/smart objects, environments/spaces, and systems. The journal discusses various technical, safety, personal, social, physical, political, artistic, and economic issues. The research topics covered by the journal include (but are not limited to): pervasive/ubiquitous computing and applications; cognitive wireless sensor networks; embedded systems and software; mobile computing and wireless communications; next-generation multimedia systems; security, privacy, and trust; service and semantic computing; advanced networking architectures; dependable, reliable, and autonomic computing; embedded smart agents; context awareness, social sensing, and inference; multimodal interaction design; ergonomics and product prototyping; intelligent and self-organizing transportation networks and services; healthcare systems; virtual humans and virtual worlds; wearable sensors and actuators.