An Experimental Study on Hybrid Feature Selection Techniques for Sentiment Classification

2022 14th International Conference on Software, Knowledge, Information Management and Applications (SKIMA), published 2022-12-02. DOI: 10.1109/SKIMA57145.2022.10029452

N. Dina, Sri Devi Ravana, N. Idris


Citations: 0

Abstract

Text sentiment classification aims to extract useful information from unstructured text and classify its sentiment as positive or negative. Irrelevant features and a high-dimensional feature space are common problems in text sentiment classification because they degrade classification performance. To address these problems, this study applies a hybrid feature selection technique that combines Term Frequency-Inverse Document Frequency (TF-IDF) with Support Vector Machine-Recursive Feature Elimination (SVM-RFE) to three text datasets: IMDB, Yelp, and Amazon. TF-IDF is used to select sentiment features, which SVM-RFE then refines further. Finally, an SVM classifies each text as positive or negative. The technique outperforms existing techniques on two datasets, reaching 88% accuracy on IMDB and 84.5% on Yelp. On the Amazon dataset, however, its accuracy of 81.5% is lower than that reported in existing studies. These results show that the technique does not perform consistently across datasets, which opens the way for further research on other hybrid feature selection techniques for sentiment classification that improve accuracy on all datasets. The results also show that the technique improved classification performance while reducing the feature space by 63%.
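The abstract describes a three-stage pipeline: a TF-IDF filter, SVM-RFE refinement, and a final SVM classifier. The sketch below is a minimal, standard-library-only Python illustration of that flow under stated assumptions, not the authors' implementation. The abstract does not say how TF-IDF scores are turned into a feature subset, how many features are kept, or which SVM solver is used, so these choices are assumptions: a whitespace tokenizer, a smoothed IDF, ranking terms by summed TF-IDF and keeping the top `top_k`, and a Pegasos-style stochastic subgradient solver for a linear SVM without an intercept. The SVM-RFE step follows the usual recipe of retraining the linear SVM and dropping the feature with the smallest squared weight until `n_keep` features remain. All function names (`fit`, `predict`, `svm_rfe`, and so on), the default `lam` and `epochs`, and the toy reviews are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch of a TF-IDF -> SVM-RFE -> SVM sentiment pipeline.
# Assumed choices: whitespace tokenizer, smoothed IDF, summed-TF-IDF term
# ranking, and a Pegasos-style linear SVM with no intercept term.

def tokenize(text):
    """Lowercase and split on whitespace (deliberately simple, hypothetical)."""
    return text.lower().split()

def fit_idf(tokenized):
    """Smoothed IDF estimated on the training set: ln((1 + n) / (1 + df)) + 1."""
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def vectorize(tokens, vocab, idf):
    """TF-IDF vector over `vocab`; TF is term count divided by document length."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[t] / total * idf.get(t, 0.0) for t in vocab]

def tfidf_select(tokenized, idf, top_k):
    """Stage 1 (filter): keep the top_k terms by TF-IDF summed over the training set."""
    scores = Counter()
    for toks in tokenized:
        total = len(toks) or 1
        for t, c in Counter(toks).items():
            scores[t] += c / total * idf[t]
    ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken alphabetically
    return ranked[:top_k]

def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Primal linear SVM (hinge loss + L2) via Pegasos-style SGD, in a fixed sample order."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrinkage from the L2 term
            if margin < 1.0:  # hinge loss is active: take the subgradient step
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def svm_rfe(X, y, features, n_keep, lam=0.01, epochs=200):
    """Stage 2 (wrapper): retrain, drop the feature with the smallest w_j**2, repeat."""
    keep = list(range(len(features)))
    while len(keep) > n_keep:
        w = train_linear_svm([[row[j] for j in keep] for row in X], y, lam, epochs)
        weakest = min(range(len(keep)), key=lambda i: w[i] ** 2)
        del keep[weakest]
    return [features[j] for j in keep]

def fit(docs, labels, top_k, n_keep, lam=0.01, epochs=200):
    tokenized = [tokenize(d) for d in docs]
    idf = fit_idf(tokenized)
    candidates = tfidf_select(tokenized, idf, top_k)
    X = [vectorize(toks, candidates, idf) for toks in tokenized]
    selected = svm_rfe(X, labels, candidates, n_keep, lam, epochs)
    X_sel = [vectorize(toks, selected, idf) for toks in tokenized]
    w = train_linear_svm(X_sel, labels, lam, epochs)  # Stage 3: final SVM classifier
    return {"idf": idf, "candidates": candidates, "selected": selected, "w": w}

def predict(model, doc):
    """Return +1 (positive) or -1 (negative); a score of exactly 0 falls back to -1."""
    x = vectorize(tokenize(doc), model["selected"], model["idf"])
    score = sum(wj * xj for wj, xj in zip(model["w"], x))
    return 1 if score > 0 else -1

if __name__ == "__main__":
    docs = ["good plot", "bad script", "good cast", "bad pacing",
            "good music", "bad sound", "good ending", "bad twist"]
    labels = [1, -1, 1, -1, 1, -1, 1, -1]
    model = fit(docs, labels, top_k=4, n_keep=2)
    reduction = 1 - len(model["selected"]) / len(model["idf"])
    print("TF-IDF candidates:", model["candidates"])
    print("SVM-RFE selection:", model["selected"])
    print(f"feature space reduced by {reduction:.0%}")
    print(predict(model, "good ending"), predict(model, "a truly bad twist"))
```

On the eight toy reviews, the filter keeps `top_k=4` candidate terms and SVM-RFE prunes them to `n_keep=2`. The printed reduction ratio (kept terms versus the original vocabulary) only shows how a figure like the 63% above can be computed and says nothing about the paper's datasets. A real experiment would more likely rely on a library such as scikit-learn (`TfidfVectorizer`, `RFE`, `LinearSVC`) than on a hand-written solver.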



