A Scheme of Pairwise Feature Combinations to Improve Sentiment Classification Using Book Review Dataset

International Journal of Innovative Computing Information and Control (IF 1.3, JCR Q4, Computer Science, Artificial Intelligence). Pub Date: 2021-11-16. DOI: 10.11113/ijic.v12n1.344
S. Huspi, Haisal Dauda Abubakar, M. Umar

Abstract

Sentiment analysis is a Natural Language Processing (NLP) domain concerned with identifying or extracting user sentiments or opinions from written language. Although approaches vary, Machine Learning (ML) methods are gradually becoming the preferred choice because they can automatically draw useful insight from data regardless of its complexity. However, most ML algorithms can learn from text only after it has been encoded into numerical vectors. Popular encoding approaches include the word-level representation TF-IDF, distributed word representations (word2vec), and distributed document representations (doc2vec). Each of these methods has demonstrated remarkable success in representing text, but we found that no single method excels at all tasks. Motivated by this challenge, an improved scheme of pairwise fusion of these representations is proposed for sentiment classification of book reviews. In experiments with Artificial Neural Network (ANN) and Logistic Regression (LR) classifiers, the proposed scheme improved performance over single-method vectorization. The TF-IDF-word2vec combination performed best, with a mean accuracy of 91.0% (ANN) and 92.5% (LR), an improvement of 0.7% and 0.2% respectively over TF-IDF, the best single-vector method. Thus, the proposed method can be used as a compact alternative to the popular bag-of-n-grams models, as it captures contextual information of the encoded document with less sparse data.
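
The abstract does not say how the two representations in a pair are fused, so the following is only a minimal sketch under stated assumptions: fusion is taken to be plain concatenation of a document's TF-IDF vector with the mean of its word vectors. The helper names (`tfidf_vectors`, `mean_embedding`, `fuse`), the two toy reviews, and the hand-written 2-dimensional `toy_embeddings` table (standing in for a trained word2vec model) are all hypothetical, and no classifier is shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, rows) of smoothed TF-IDF vectors for tokenized docs."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF, which avoids division by zero.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        length = max(len(doc), 1)  # guard against empty documents
        rows.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, rows


def mean_embedding(doc, embeddings, dim):
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [embeddings[t] for t in doc if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def fuse(sparse_row, dense_row):
    """Fuse one pair of representations by concatenation (an assumption)."""
    return list(sparse_row) + list(dense_row)


# Toy tokenized reviews (illustrative only, not from the paper's dataset).
docs = [
    "a gripping and moving story".split(),
    "dull plot and flat characters".split(),
]

# Hypothetical 2-d word vectors standing in for a trained word2vec model.
toy_embeddings = {
    "gripping": [0.9, 0.1],
    "moving": [0.8, 0.2],
    "dull": [-0.7, 0.3],
    "flat": [-0.6, 0.1],
}

vocab, tfidf_rows = tfidf_vectors(docs)
fused = [
    fuse(row, mean_embedding(doc, toy_embeddings, 2))
    for doc, row in zip(docs, tfidf_rows)
]
```

Each row of `fused` has one entry per vocabulary term followed by the dense embedding dimensions, and could be passed to any classifier that accepts fixed-length numeric vectors, such as the LR or ANN models named in the abstract.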
Source journal
CiteScore: 3.20
Self-citation rate: 20.00%
Articles published: 0
Review time: 4.3 months
Journal description: The primary aim of the International Journal of Innovative Computing, Information and Control (IJICIC) is to publish high-quality papers on new developments and trends, novel techniques and approaches, and innovative methodologies and technologies in the theory and applications of intelligent systems, information and control. The IJICIC is a peer-reviewed English-language journal and is published bimonthly.