A machine learning approach for Urdu text sentiment analysis

Mehran University Research Journal of Engineering and Technology. IF 0.7, Q3 (Engineering, Multidisciplinary). Pub Date: 2023-04-03. DOI: 10.22581/muet1982.2302.09

Muhammad Akhtar, Saif-Ur Rehman


Citations: 0

Abstract

The rise of social networking sites and blogs has made product reviews, ratings, and other forms of online expression increasingly common. This rapidly growing body of data has made sentiment analysis an active area of study for computational linguists. For English, the topic has been studied for roughly a decade. Other important languages, such as Urdu, have been largely ignored by the research community. Urdu is among the most morphologically complex languages in the world, and characteristics such as its unusual morphology and unrestricted word order make Urdu language processing difficult. This research presents a new framework for classifying sentiment in Urdu text. Its main contributions are to show the importance of this multidimensional research problem and to address its technical components, such as the parsing algorithm, corpus, and lexicon. The proposed approach covers data gathering, pre-processing, feature extraction, feature vector formation, and, finally, sentiment classification. The results and discussion section compares the proposed work with a standard baseline method in terms of precision, recall, F-measure, and accuracy on three different types of datasets. In the overall comparison of models, the proposed work shows encouraging results in accuracy and the other metrics. That section also outlines notable trends and possible directions for the current work.
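The pipeline stages named in the abstract (pre-processing, feature extraction, sentiment classification, and evaluation by precision, recall, F-measure, and accuracy) can be illustrated with a minimal sketch. The abstract does not say which features or classifier the authors use, so the whitespace tokenizer, bag-of-words counts, and multinomial Naive Bayes model below are assumptions chosen only for illustration, and the four training sentences are hypothetical toy examples, not data from the paper.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Pre-processing: replace common Urdu/Latin punctuation, split on whitespace."""
    return re.sub(r"[۔،؟!?.,:;()]", " ", text).split()


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts (an assumed classifier)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word frequency
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for word in tokenize(doc):
                if word in self.vocab:  # skip words never seen in training
                    # Laplace (add-one) smoothed log likelihood
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def evaluate(y_true, y_pred, positive="pos"):
    """Precision, recall, F-measure for the positive class, plus accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical toy corpus (not from the paper).
train_docs = [
    "یہ فلم بہت اچھی ہے",   # "this film is very good"
    "کھانا مزیدار تھا",      # "the food was delicious"
    "یہ فلم بہت بری ہے",    # "this film is very bad"
    "سروس خراب تھی",        # "the service was bad"
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayes().fit(train_docs, train_labels)
predictions = [model.predict(d) for d in train_docs]
print(evaluate(train_labels, predictions))
```

Whitespace tokenization ignores the morphological complexity the abstract emphasizes; a realistic system would likely need Urdu-specific normalization and stemming, and would report metrics on held-out data rather than on the training sentences used here.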




Source journal

Mehran University Research Journal of Engineering and Technology (Engineering, Multidisciplinary)

Self-citation rate

0.00%

Articles published

Review time

40 weeks