Comparison of Filter and Wrapper Based Feature Selection Methods on Spam Comment Classification
Amalia Nur Anggraeni, K. Mustofa, Sigit Priyanta
IJCCS Indonesian Journal of Computing and Cybernetics Systems, published 2021-07-31
DOI: https://doi.org/10.22146/IJCCS.66965
The continuous growth of the internet has increased the use of social media for various purposes. Some irresponsible parties, for instance, exploit the comment feature on social media platforms to harm others by posting spam comments on shared content. Furthermore, the variety of comments produces many features to process, which degrades the performance of a classification algorithm. This study therefore addresses the spam comment problem by comparing filter-based and wrapper-based feature selection in a text classification setting. Using 4944 training comments and 100 test comments, the best accuracy, precision, recall, and F-measure obtained with Multinomial Naive Bayes (MNB) are 96%, 100%, 92%, and 95.8%, respectively. The best accuracy is achieved by combining the Chi-Square and Sequential Forward Selection methods with a subset of 500 features. The accuracy gains for the MNB and Support Vector Machine (SVM) classifiers are 8% and 4%, respectively. The study concludes that combining feature selection methods improves classification performance on Indonesian-language spam comments.
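The pipeline the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how Chi-Square and Sequential Forward Selection were combined, so the sketch assumes the filter runs first and the wrapper then searches greedily among the surviving terms. The toy corpus, the validation split, and all function names are hypothetical, and the subset sizes (4 and 2) stand in for the 500 features used in the study. The first function also checks that the reported precision (100%) and recall (92%) are consistent with the reported F-measure (95.8%).

```python
import math
from collections import Counter

def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def chi_square(docs, labels, term):
    """Chi-square statistic of the 2x2 table: term present/absent vs spam/not spam."""
    a = b = c = d = 0
    for tokens, y in zip(docs, labels):
        present = term in tokens
        if present and y == 1:
            a += 1          # term present, spam
        elif present:
            b += 1          # term present, not spam
        elif y == 1:
            c += 1          # term absent, spam
        else:
            d += 1          # term absent, not spam
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def top_k_by_chi_square(docs, labels, k):
    """Filter step: keep the k terms with the highest chi-square score."""
    vocab = sorted({t for tokens in docs for t in tokens})
    return sorted(vocab, key=lambda t: -chi_square(docs, labels, t))[:k]

def train_mnb(docs, labels, vocab):
    """Multinomial Naive Bayes with Laplace smoothing, restricted to vocab."""
    classes = sorted(set(labels))
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for tokens, y in zip(docs, labels):
        counts[y].update(t for t in tokens if t in vocab)
    loglik = {}
    for c in classes:
        total = sum(counts[c].values()) + len(vocab)
        loglik[c] = {t: math.log((counts[c][t] + 1) / total) for t in vocab}
    return prior, loglik

def predict(model, tokens):
    """Return the class with the highest log posterior; unknown terms are ignored."""
    prior, loglik = model
    return max(prior, key=lambda c: prior[c]
               + sum(loglik[c][t] for t in tokens if t in loglik[c]))

def accuracy(features, train, val):
    """Wrapper criterion: validation accuracy of MNB trained on the given features."""
    model = train_mnb(train[0], train[1], set(features))
    hits = sum(predict(model, doc) == y for doc, y in zip(*val))
    return hits / len(val[1])

def forward_select(candidates, k, score):
    """Wrapper step: greedily add the term that maximises the score until k are chosen."""
    selected = []
    while len(selected) < k:
        remaining = [t for t in candidates if t not in selected]
        selected.append(max(remaining, key=lambda t: score(selected + [t])))
    return selected

# Hypothetical tokenised comments; 1 = spam, 0 = not spam.
train_docs = [["promo", "murah", "klik"], ["klik", "link", "promo"],
              ["promo", "gratis", "link"], ["foto", "bagus", "sekali"],
              ["keren", "foto", "ini"], ["bagus", "keren", "sekali"]]
train_labels = [1, 1, 1, 0, 0, 0]
val_docs = [["promo", "gratis"], ["foto", "keren"]]
val_labels = [1, 0]

candidates = top_k_by_chi_square(train_docs, train_labels, k=4)
score = lambda feats: accuracy(feats, (train_docs, train_labels), (val_docs, val_labels))
selected = forward_select(candidates, 2, score)
print(candidates, selected, score(selected))
print(round(f_measure(1.00, 0.92), 3))
```

The filter step is cheap because it scores each term independently of any classifier, while the wrapper step retrains the classifier for every candidate subset. Running the filter first keeps the wrapper's search space small, which is one plausible reason for combining the two.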