A machine learning approach for Urdu text sentiment analysis

Mehran University Research Journal of Engineering and Technology (IF 0.6; JCR Q3, Engineering, Multidisciplinary). Published: 2023-04-03. DOI: 10.22581/muet1982.2302.09
Muhammad Akhtar, Saif-Ur Rehman
Citations: 0

Abstract

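Product reviews, ratings, and other forms of online expression have grown in popularity with the rise of social networking sites and blogs. This rapidly expanding body of data has made sentiment analysis a new area of study for computational linguists. For roughly a decade the topic has been studied mainly for English, while the research community has largely overlooked other important languages such as Urdu. Morphologically, Urdu is one of the most complex languages in the world, and characteristics such as its unusual morphology and free word order make Urdu language processing a difficult problem. This research proposes a new framework for classifying sentiment in Urdu text. Its main contributions are to show the importance of this multidimensional research problem and to address its technical components, such as the parsing algorithm, corpus, and lexicon. The proposed approach to Urdu text sentiment analysis consists of data gathering, pre-processing, feature extraction, feature vector formation, and finally sentiment classification. The results and discussion section compares the proposed work with a standard baseline method in terms of precision, recall, F-measure, and accuracy on three different types of datasets. In the overall comparison of the models, the proposed work achieves encouraging accuracy and other metric scores. Finally, the section outlines current trends and possible future directions for this work.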
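The abstract names the pipeline stages (pre-processing, feature extraction, feature vector formation, sentiment classification) and the evaluation metrics, but not the specific algorithms used. The sketch below is a minimal, hypothetical illustration of such a pipeline, assuming a bag-of-words representation and a multinomial Naive Bayes classifier. The character normalization map, the stopword list, the function names and the toy Urdu sentences are all illustrative assumptions, not the authors' method or data.

```python
import math
import re
from collections import Counter

# Illustrative normalization map (an assumption): Arabic code points that are
# often typed in place of their Urdu counterparts.
NORMALIZE = {"\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
             "\u0643": "\u06A9"}  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")  # tashkeel marks and superscript alef
PUNCT = re.compile("[\u06D4\u060C\u061F!.,?]")    # Urdu full stop, comma, question mark, ASCII ones


def normalize(text):
    """Map variant letters to Urdu forms and strip diacritics."""
    text = "".join(NORMALIZE.get(ch, ch) for ch in text)
    return DIACRITICS.sub("", text)


# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {normalize(w) for w in ["یہ", "ہے", "تھا", "تھی", "کا", "کی", "کے", "اور"]}


def preprocess(text):
    """Pre-processing: normalize, turn punctuation into spaces, tokenize, drop stopwords."""
    text = PUNCT.sub(" ", normalize(text))
    return [tok for tok in text.split() if tok not in STOPWORDS]


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for tokens, c in zip(docs, labels):
            self.counts[c].update(tokens)  # feature vector: token counts per class
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        def log_posterior(c):
            denom = self.totals[c] + len(self.vocab)
            # Tokens never seen in training are ignored.
            return self.log_prior[c] + sum(
                math.log((self.counts[c][t] + 1) / denom) for t in tokens if t in self.vocab)
        return max(self.classes, key=log_posterior)


def evaluate(gold, pred, positive="pos"):
    """Precision, recall and F-measure for the positive class, plus overall accuracy."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical toy data (not the paper's datasets).
train = [("یہ فون بہت اچھا ہے۔", "pos"), ("کھانا مزیدار تھا۔", "pos"),
         ("سروس بہت خراب تھی۔", "neg"), ("یہ فلم بری ہے۔", "neg")]
test = [("کھانا اچھا تھا۔", "pos"), ("فلم بہت خراب تھی۔", "neg")]

model = NaiveBayes().fit([preprocess(t) for t, _ in train], [y for _, y in train])
pred = [model.predict(preprocess(t)) for t, _ in test]
print(pred, evaluate([y for _, y in test], pred))
```

A real system would replace the toy sentences with a labeled corpus and a held-out test split. Naive Bayes is used here only because it is compact and a common sentiment-classification baseline; the source does not state which classifier or baseline the authors used.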
Journal metrics: self-citation rate 0.00%; articles published: 76; review time: 40 weeks.