Urdu Sentiment Analysis

Applied Computer Systems · IF 0.5 · JCR Q4 (Computer Science, Theory & Methods) · Pub Date: 2022-06-01 · DOI: 10.2478/acss-2022-0004
Iffraah Rehman, Tariq Rahim Soomro
Pages: 30-42 · https://doi.org/10.2478/acss-2022-0004

Abstract

Social media use grows daily as data become increasingly digital. Each post and comment can offer valuable insight into a topic, issue, product, or brand. Sentiment analysis is the process of uncovering the opinion a person holds about an entity. It can be carried out through two main approaches: lexicon-based methods or machine learning algorithms. Although sentiment analysis has been studied extensively across many domains and languages, little research has addressed Urdu, the national language of Pakistan. Twitter users who write in Urdu post tweets in two textual formats: Urdu script (Nastaleeq) or Roman Urdu. This paper performs sentiment analysis on Urdu tweets in both formats, extracted from Twitter using the Tweepy API. The study adopts a machine learning approach and uses WEKA as the tool. The best algorithm was identified using evaluation metrics comprising the number of correctly and incorrectly classified instances, accuracy, precision, and recall. SMO was found to be the most suitable algorithm for sentiment analysis of Urdu (Nastaleeq) tweets, while Random Forest was identified as the best algorithm for Roman Urdu tweets.
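The evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code and not WEKA output: the function name `evaluate`, the labels `"pos"` and `"neg"`, and the toy data are all hypothetical, and the sketch assumes a binary task scored for a single positive class. It only shows how correctly and incorrectly classified instances, accuracy, precision, and recall relate to one another.

```python
from collections import Counter


def evaluate(gold, predicted, positive_label):
    """Compute the abstract's metrics for one class (hypothetical helper)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Tally the four confusion-matrix cells for the chosen positive class.
    counts = Counter()
    for g, p in zip(gold, predicted):
        if p == positive_label and g == positive_label:
            counts["tp"] += 1
        elif p == positive_label:
            counts["fp"] += 1
        elif g == positive_label:
            counts["fn"] += 1
        else:
            counts["tn"] += 1

    total = len(gold)
    correct = counts["tp"] + counts["tn"]
    predicted_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]

    return {
        "correct": correct,
        "incorrect": total - correct,
        "accuracy": correct / total if total else 0.0,
        # Precision: of the instances predicted positive, how many were right.
        "precision": counts["tp"] / predicted_pos if predicted_pos else 0.0,
        # Recall: of the truly positive instances, how many were found.
        "recall": counts["tp"] / actual_pos if actual_pos else 0.0,
    }


# Toy labels, invented purely for illustration (not data from the paper).
gold = ["pos", "neg", "pos", "neg", "pos"]
predicted = ["pos", "pos", "neg", "neg", "pos"]
metrics = evaluate(gold, predicted, positive_label="pos")
print(metrics)
```

On the toy labels, three of five instances are classified correctly, so accuracy is 0.6, while precision and recall for the positive class are both 2/3. Comparing classifiers on the same metrics, as the study does, amounts to running each model's predictions through a calculation of this kind.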