A hybrid composite features based sentence level sentiment analyzer

IAES International Journal of Artificial Intelligence (Q2, Decision Sciences). Pub Date: 2023-03-01. DOI: 10.11591/ijai.v12.i1.pp284-294

Mohammed Maree, Mujahed Eleyat, Shatha Rabayah, M. Belkhatir



Abstract

Current lexicon-based and machine learning-based sentiment analysis approaches still suffer from a two-fold limitation. First, manual lexicon construction and model training are time-consuming and error-prone. Second, accurate prediction requires that the sentences being classified and their corresponding training text fall under the same domain. In this article, we experimentally evaluate four sentiment classifiers, namely Support Vector Machines (SVM), Naive Bayes, Logistic Regression and Random Forest. We quantify the quality of each of these models using three real-world datasets that comprise 50,000 movie reviews, 10,662 sentences, and 300 generic movie reviews. Specifically, we study the impact of a variety of natural language processing (NLP) pipelines on the quality of the predicted sentiment orientations. Additionally, we measure the impact of using lexical semantic knowledge captured by WordNet to expand the original words in sentences. Findings demonstrate that the choice of NLP pipeline and semantic relationships affects the quality of the sentiment analyzers. In particular, results indicate that coupling lemmatization with knowledge-based n-gram features produces higher accuracy. With this coupling, the accuracy of the SVM classifier improved to 90.43%, while the three other classifiers reached 86.83%, 90.11%, and 86.20%, respectively.
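The coupling described in the abstract (lemmatization, n-gram features, and knowledge-based word expansion feeding a classifier) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: `TOY_LEMMAS` and `TOY_SYNONYMS` are hypothetical stand-ins for a real lemmatizer and for WordNet, the four training sentences are invented, and Naive Bayes (one of the four evaluated classifiers) is used only because it is short to write by hand.

```python
import math
from collections import Counter

# Hypothetical stand-ins: a real pipeline would use a proper lemmatizer
# and WordNet synsets instead of these tiny hand-written dictionaries.
TOY_LEMMAS = {"loved": "love", "movies": "movie", "was": "be", "boring": "bore"}
TOY_SYNONYMS = {"great": ["excellent"], "bad": ["poor"]}


def featurize(sentence, n_max=2):
    """Lemmatize, build 1..n_max-grams, then append synonym unigrams."""
    lemmas = [TOY_LEMMAS.get(t, t) for t in sentence.lower().split()]
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(lemmas) - n + 1):
            feats.append(" ".join(lemmas[i:i + n]))
    # Knowledge-based expansion: add synonyms of each lemma as extra features.
    for lemma in lemmas:
        feats.extend(TOY_SYNONYMS.get(lemma, []))
    return Counter(feats)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences, labels):
        self.priors = Counter(labels)
        self.counts = {}
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            feats = featurize(sentence)
            self.counts.setdefault(label, Counter()).update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, sentence):
        feats = featurize(sentence)
        total_docs = sum(self.priors.values())
        scores = {}
        for label, counter in self.counts.items():
            denom = sum(counter.values()) + len(self.vocab)
            score = math.log(self.priors[label] / total_docs)
            for feat, k in feats.items():
                score += k * math.log((counter[feat] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented toy data, for illustration only.
train = ["i loved this great movie", "a great film",
         "this was bad and boring", "a bad plot"]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train, labels)
print(model.predict("an excellent movie"))
```

In this toy setup the word "excellent" never appears in the training sentences, yet it still carries signal at prediction time because "great" was expanded with it during featurization. That is the kind of effect the WordNet-based expansion is meant to capture; the accuracy figures quoted above come from the paper's own experiments, not from this sketch.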


