An Experimental Study on Hybrid Feature Selection Techniques for Sentiment Classification
N. Dina, Sri Devi Ravana, N. Idris
2022 14th International Conference on Software, Knowledge, Information Management and Applications (SKIMA), published 2022-12-02
DOI: 10.1109/SKIMA57145.2022.10029452
Abstract
Text sentiment classification extracts useful information from unstructured text and classifies its sentiment as positive or negative. Irrelevant features and a high-dimensional feature space are common problems in sentiment classification because they degrade classification performance. To address them, this study applies hybrid feature selection that combines Term Frequency-Inverse Document Frequency (TF-IDF) with Support Vector Machine-Recursive Feature Elimination (SVM-RFE) to three text datasets: IMDB, Yelp, and Amazon. TF-IDF first selects sentiment features, which SVM-RFE then refines. Finally, an SVM classifies each text as positive or negative. The approach outperforms existing techniques on two datasets, reaching 88% accuracy on IMDB and 84.5% on Yelp. On the Amazon dataset, its accuracy of 81.5% is lower than that of existing studies. These results indicate that the technique performs inconsistently across datasets, which opens the opportunity for further research on other hybrid feature selection techniques that could improve accuracy on all datasets. The results also show that the technique improved classification performance and reduced the feature space by 63%.