A hybrid composite features based sentence level sentiment analyzer

Mohammed Maree, Mujahed Eleyat, Shatha Rabayah, M. Belkhatir
IAES International Journal of Artificial Intelligence, published 2023-03-01
DOI: 10.11591/ijai.v12.i1.pp284-294 (https://doi.org/10.11591/ijai.v12.i1.pp284-294)

Abstract

Current lexicon-based and machine learning-based sentiment analysis approaches still suffer from a two-fold limitation. First, manual lexicon construction and machine training are time-consuming and error-prone. Second, accurate prediction requires that the sentences being classified and the corresponding training text fall under the same domain. In this article, we experimentally evaluate four sentiment classifiers, namely Support Vector Machine (SVM), Naive Bayes, Logistic Regression and Random Forest. We quantify the quality of each of these models using three real-world datasets that comprise 50,000 movie reviews, 10,662 sentences, and 300 generic movie reviews. Specifically, we study the impact of a variety of natural language processing (NLP) pipelines on the quality of the predicted sentiment orientations. Additionally, we measure the impact of incorporating lexical semantic knowledge captured by WordNet to expand the original words in sentences. Findings demonstrate that the choice of NLP pipeline and the use of semantic relationships both affect the quality of the sentiment analyzers. In particular, results indicate that coupling lemmatization with knowledge-based n-gram features produces higher accuracy. With this coupling, the accuracy of the SVM classifier improved to 90.43%, while it was 86.83%, 90.11%, and 86.20%, respectively, using the three other classifiers.
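The coupling described in the abstract (lemmatization, n-gram features, and synonym-based word expansion feeding a classifier) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names TOY_LEMMAS, TOY_SYNONYMS, extract_features and the four training sentences are hypothetical; the two small dictionaries are hand-written stand-ins for a real lemmatizer and for WordNet, and the tiny multinomial Naive Bayes model with add-one smoothing stands in for one of the four classifiers the paper evaluates.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins: a real pipeline would use a proper lemmatizer
# and look synonyms up in WordNet instead of these hand-written tables.
TOY_LEMMAS = {"movies": "movie", "loved": "love", "hated": "hate", "acting": "act"}
TOY_SYNONYMS = {"love": ["enjoy"], "hate": ["detest"],
                "great": ["excellent"], "bad": ["poor"]}


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def lemmatize(tokens):
    """Map each token to its lemma when the toy table knows it."""
    return [TOY_LEMMAS.get(t, t) for t in tokens]


def ngrams(tokens, n_max=2):
    """Return all n-grams of length 1..n_max as space-joined strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def extract_features(sentence, n_max=2):
    """Lemmatize, build n-grams, then append synonyms of each lemma."""
    lemmas = lemmatize(tokenize(sentence))
    feats = ngrams(lemmas, n_max)
    for lemma in lemmas:
        feats.extend(TOY_SYNONYMS.get(lemma, []))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = {c: Counter() for c in self.class_counts}
        for sentence, label in zip(sentences, labels):
            self.feature_counts[label].update(extract_features(sentence))
        self.vocab = set().union(*self.feature_counts.values())
        return self

    def predict(self, sentence):
        total = sum(self.class_counts.values())
        scores = {}
        for c, count in self.class_counts.items():
            denom = sum(self.feature_counts[c].values()) + len(self.vocab)
            score = math.log(count / total)  # log prior
            for f in extract_features(sentence):
                score += math.log((self.feature_counts[c][f] + 1) / denom)
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical toy training set, for illustration only.
train_sentences = ["I loved this movie", "Great acting and a great story",
                   "I hated this movie", "Bad acting and a bad story"]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_sentences, train_labels)
```

In this sketch, model.predict("A poor movie, I detest it") yields "neg" even though neither "poor" nor "detest" occurs in the training sentences: the expansion step added them as synonyms of "bad" and "hate" at training time. With a real lexical resource the expansion would be far broader and noisier, so its effect on accuracy has to be measured empirically, as the paper does.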