Pulung Hendro Prastyo, I. Ardiyanto, Risanuri Hidayat
2020 6th International Conference on Science and Technology (ICST), published 2020-09-07. DOI: 10.1109/ICST50505.2020.9732885 (https://doi.org/10.1109/ICST50505.2020.9732885)
A Review of Feature Selection Techniques in Sentiment Analysis Using Filter, Wrapper, or Hybrid Methods
Sentiment analysis is a text mining task that classifies the polarity of a document as positive, neutral, or negative. Text documents tend to contain noisy or irrelevant features, so feature selection is needed to build accurate models, and doing it well remains a challenge in sentiment analysis. Feature selection improves machine learning algorithms because it reduces the dimensionality of the feature space, removes irrelevant features, retains informative ones, and increases learning accuracy. This study therefore reviews feature selection techniques in three categories: filter, wrapper, and hybrid methods. The review concludes that all three can select essential features, reduce the dimensionality of the feature space, and improve the accuracy of machine learning algorithms. Filter methods are easy to implement and faster than wrapper and hybrid methods, whereas wrapper methods are more accurate than filter methods but slower. Hybrid techniques are the most effective at resolving redundant and irrelevant data and improving classifier performance, but they are more complex and therefore incur a high computational cost.
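The distinction between the three categories can be illustrated with a minimal sketch. It is not taken from the reviewed paper: the toy corpus, the function names (`chi2_scores`, `loo_accuracy`, `forward_select`, `hybrid_select`), and the simple polarity-vote classifier are all assumptions made for illustration. The filter step ranks terms with a chi-square statistic without consulting any classifier; the wrapper step greedily adds the term that most improves a classifier's leave-one-out accuracy; the hybrid step runs the wrapper only on the terms the filter ranked highest.

```python
from collections import Counter

# Toy labelled corpus (hypothetical data): 1 = positive, 0 = negative.
DOCS = [
    ("great movie loved the plot", 1),
    ("loved the acting great fun", 1),
    ("great fun the cast shines", 1),
    ("terrible movie hated the plot", 0),
    ("hated the acting boring mess", 0),
    ("boring the cast is terrible", 0),
]


def tokenize(text):
    """Represent a document as the set of terms it contains."""
    return set(text.split())


def chi2_scores(docs):
    """Filter step: chi-square score of each term's 2x2 presence/label
    table. No classifier is trained, which is why filters are fast."""
    n = len(docs)
    n_pos = sum(label for _, label in docs)
    vocab = set().union(*(tokenize(text) for text, _ in docs))
    scores = {}
    for term in vocab:
        a = sum(1 for t, y in docs if y == 1 and term in tokenize(t))
        b = sum(1 for t, y in docs if y == 0 and term in tokenize(t))
        c = n_pos - a        # positive documents without the term
        d = (n - n_pos) - b  # negative documents without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def loo_accuracy(docs, features):
    """Leave-one-out accuracy of a polarity-vote classifier restricted to
    `features`. The classifier is a stand-in for any real learner."""
    correct = 0
    for i, (text, label) in enumerate(docs):
        train = docs[:i] + docs[i + 1:]
        weight = Counter()
        for t, y in train:
            for term in tokenize(t) & features:
                weight[term] += 1 if y == 1 else -1
        vote = sum(weight[term] for term in tokenize(text) & features)
        correct += int((1 if vote > 0 else 0) == label)
    return correct / len(docs)


def forward_select(docs, candidates, max_features):
    """Wrapper step: greedy forward selection. Every candidate costs one
    full classifier evaluation, which is why wrappers are slow."""
    selected, best = set(), loo_accuracy(docs, set())
    while len(selected) < max_features:
        trials = [(loo_accuracy(docs, selected | {term}), term)
                  for term in sorted(candidates - selected)]
        if not trials:
            break
        acc, term = max(trials, key=lambda trial: trial[0])
        if acc <= best:  # stop when no candidate improves accuracy
            break
        selected.add(term)
        best = acc
    return selected, best


def hybrid_select(docs, k, max_features):
    """Hybrid step: keep the k best terms by filter score, then run the
    wrapper on that reduced candidate set only."""
    scores = chi2_scores(docs)
    top_k = set(sorted(scores, key=lambda t: (-scores[t], t))[:k])
    return forward_select(docs, top_k, max_features)


if __name__ == "__main__":
    features, accuracy = hybrid_select(DOCS, k=5, max_features=3)
    print(sorted(features), accuracy)
```

On this toy corpus the filter assigns a score of zero to `the`, which appears in every document, and the hybrid search keeps only `great`. A real study would measure accuracy on data held out from the selection procedure, since scoring on the same documents used to pick features overstates performance.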