Improving of Imbalanced Data in Multiclass Classification for Sentiment Analysis using Supervised Term Weighting
J. Polpinij, K. Namee
2021 Research, Invention, and Innovation Congress: Innovation Electricals and Electronics (RI2C), 2021-09-01
DOI: 10.1109/RI2C51727.2021.9559797 (https://doi.org/10.1109/RI2C51727.2021.9559797)
Abstract
Sentiment classification (SC) is an active field of research that involves computing the opinions, sentiments, and subjectivity of a text. Imbalanced classification has recently been shown to be a challenge for the SC research community. Most existing studies assume a balance between negative and positive samples, which may not hold in reality. This work describes a method for mitigating the problem of imbalanced sentiment classification using supervised term weighting schemes and shows how these schemes can improve the performance of sentiment classification on imbalanced data, especially in multi-class classification. To identify the most appropriate term weighting scheme, five schemes are compared: tf-idf, tf-idf-icf, tf-rf, tf-igm, and sqrt_tf-igm. In addition, four supervised machine learning algorithms are compared to find an appropriate classifier: k-Nearest Neighbor (k-NN), Multinomial Naïve Bayes (MNB), Support Vector Machine (SVM) with a linear kernel, and SVM with an RBF kernel. Evaluated by F1, sqrt_tf-igm was superior to all other weighting schemes. Overall, sqrt_tf-igm returned better results than tf-idf, tf-idf-icf, and tf-rf, with an F1 improvement of 10.94%, and was slightly better than tf-igm.
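
The abstract names tf-igm and sqrt_tf-igm without defining them. As a rough illustration only, the sketch below assumes the commonly cited inverse gravity moment (IGM) formulation: a term's per-class document frequencies are sorted in descending order, igm = f_1 / sum(f_r * r), and the weight is tf * (1 + lambda * igm), or sqrt(tf) * (1 + lambda * igm) for the square-root variant. The function names, the default lambda = 7, and the example frequencies are assumptions made here, not details taken from the paper.

```python
import math


def igm(class_freqs):
    """Inverse gravity moment of a term (assumed formulation).

    class_freqs: number of documents containing the term, one entry per class.
    Returns a value in (0, 1]; 1.0 means the term occurs in a single class only.
    """
    ranked = sorted(class_freqs, reverse=True)
    if not ranked or ranked[0] == 0:
        return 0.0  # term never seen in training data
    # Rank-weighted sum: the most frequent class has rank 1, the next rank 2, ...
    moment = sum(freq * rank for rank, freq in enumerate(ranked, start=1))
    return ranked[0] / moment


def tf_igm(tf, class_freqs, lam=7.0):
    """tf-igm weight; lam is an assumed scaling coefficient."""
    return tf * (1.0 + lam * igm(class_freqs))


def sqrt_tf_igm(tf, class_freqs, lam=7.0):
    """sqrt_tf-igm weight: dampens raw term frequency with a square root."""
    return math.sqrt(tf) * (1.0 + lam * igm(class_freqs))


# Hypothetical three-class example (negative, neutral, positive).
concentrated = [10, 0, 0]  # term appears only in one class
spread = [5, 5, 5]         # term appears equally in all classes

print(igm(concentrated), igm(spread))
print(tf_igm(4, concentrated), sqrt_tf_igm(4, concentrated))
```

Under this formulation, a term concentrated in one class receives a larger weight than a term spread evenly across classes, and the weight depends on class labels rather than on class size alone. That is the general reason supervised schemes of this kind are considered for imbalanced multi-class data.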