An Experimental Study on Hybrid Feature Selection Techniques for Sentiment Classification

N. Dina, Sri Devi Ravana, N. Idris
{"title":"An Experimental Study on Hybrid Feature Selection Techniques for Sentiment Classification","authors":"N. Dina, Sri Devi Ravana, N. Idris","doi":"10.1109/SKIMA57145.2022.10029452","DOIUrl":null,"url":null,"abstract":"Text sentiment classification aims to extract useful information from unstructured text data and classify its sentiment into positive and negative categories. Irrelevant features and high-dimensional feature space from text data are common issues in sentiment classification because they degrade the classification performance. To address these issues, this study applies hybrid feature selection using Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) to three text datasets: IMDB, Yelp, and Amazon. The TF-IDF is employed to select sentiment features, which are further refined by SVM-RFE. Finally, SVM is applied to determine whether the sentiment is positive or negative. This study outperforms the existing techniques in two datasets: 88% accuracy in the IMDB dataset and 84.5% in the Yelp dataset. Meanwhile, the accuracy in the Amazon dataset is lower than the existing studies, at 81.5%. These results indicate inconsistency of the technique, and it opens the opportunity for further research on the other hybrid feature selection techniques for sentiment classification to improve the accuracy in all datasets. Also, the results show that the technique improved classification performance and reduced feature space by 63%.","PeriodicalId":277436,"journal":{"name":"2022 14th International Conference on Software, Knowledge, Information Management and Applications (SKIMA)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 14th International Conference on Software, Knowledge, Information Management and Applications (SKIMA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SKIMA57145.2022.10029452","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Text sentiment classification aims to extract useful information from unstructured text data and classify its sentiment into positive and negative categories. Irrelevant features and high-dimensional feature space from text data are common issues in sentiment classification because they degrade the classification performance. To address these issues, this study applies hybrid feature selection using Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) to three text datasets: IMDB, Yelp, and Amazon. The TF-IDF is employed to select sentiment features, which are further refined by SVM-RFE. Finally, SVM is applied to determine whether the sentiment is positive or negative. This study outperforms the existing techniques in two datasets: 88% accuracy in the IMDB dataset and 84.5% in the Yelp dataset. Meanwhile, the accuracy in the Amazon dataset is lower than the existing studies, at 81.5%. These results indicate inconsistency of the technique, and it opens the opportunity for further research on the other hybrid feature selection techniques for sentiment classification to improve the accuracy in all datasets. Also, the results show that the technique improved classification performance and reduced feature space by 63%.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
情感分类中混合特征选择技术的实验研究
文本情感分类旨在从非结构化文本数据中提取有用信息,并将其情感分为积极和消极两类。文本数据中的不相关特征和高维特征空间是情感分类中常见的问题,因为它们会降低分类性能。为了解决这些问题,本研究将使用术语频率-逆文档频率(TF-IDF)和支持向量机-递归特征消除(SVM-RFE)的混合特征选择应用于三个文本数据集:IMDB, Yelp和Amazon。使用TF-IDF选择情感特征,并通过SVM-RFE进一步细化。最后,利用支持向量机判断情感是积极的还是消极的。该研究在两个数据集上优于现有技术:IMDB数据集的准确率为88%,Yelp数据集的准确率为84.5%。同时,亚马逊数据集的准确率低于现有研究,为81.5%。这些结果表明了该技术的不一致性,并为进一步研究其他用于情感分类的混合特征选择技术提供了机会,以提高所有数据集的准确性。此外,结果表明,该技术提高了分类性能,减少了63%的特征空间。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A Clustering Based Priority Driven Sampling Technique for Imbalance Data Classification Incorporating Extended Reality Technology for Delivering Computer Aided Design and Visualisation Modules Generation of High-Quality Relevant Judgments through Document Similarity and Document Pooling for the Evaluation of Information Retrieval Systems A Framework of Ensemble CNN Models for Real-Time Sign Language Translation Multidimensional Disturbance Propagation Model for a Network of Bus Lines
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1