Authors: Isnen HADİ AL GHOZALİ, Arif PİRMAN, Indra INDRA
Journal: El-Cezeri Journal of Science and Engineering
Published: 2023-09-25
DOI: 10.31202/ecjse.1325078 (https://doi.org/10.31202/ecjse.1325078)
Comparison of SVM and Naïve Bayes Algorithms with InNER enriched to Predict Hate Speech
Hate speech is one of the negative consequences of social media abuse. It can be classified into insults, defamation, unpleasant acts, provocation, incitement, and the spread of fake news (hoaxes). This study compares the SVM and Naïve Bayes methods, using Indonesian NER (InNER) for feature extraction, for detecting hate speech. To obtain the best model, the study follows five steps: a) data collection; b) data preprocessing; c) feature engineering; d) model development; and e) model evaluation and comparison. An initial dataset of 7100 tweets was collected. Manual annotation yielded 1681 labeled tweets: 548 insult tweets, 288 blasphemy tweets, 272 provocative tweets, and 573 neutral tweets. The study uses two Python libraries that accommodate NER in Indonesian, namely NLTK and Polyglot. In the evaluation, model 5, which pairs the SVM algorithm with the NLTK library, performed best among the proposed models, with an accuracy of 92.88%, a precision of 0.93, a recall of 0.93, and an F1 score of 0.92.
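The evaluation step (e) can be illustrated with a minimal sketch of how accuracy, precision, recall, and F1 might be computed for a four-class problem. The abstract does not say how the per-class scores were averaged, so macro averaging is an assumption here. The function name `evaluate` and the toy labels are hypothetical and are not the study's data; only the class counts come from the abstract.

```python
from collections import Counter

# Class counts reported in the abstract (1681 annotated tweets in total).
CLASS_COUNTS = {"insult": 548, "blasphemy": 288, "provocative": 272, "neutral": 573}


def evaluate(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption; the abstract does not name the scheme.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy labels, for illustration only (not the study's data).
y_true = ["insult", "insult", "neutral", "neutral", "blasphemy", "provocative"]
y_pred = ["insult", "neutral", "neutral", "neutral", "blasphemy", "provocative"]

scores = evaluate(y_true, y_pred)
total_tweets = sum(CLASS_COUNTS.values())
print(total_tweets, scores)
```

Under macro averaging, the F1 score is the mean of the per-class F1 values, so it can fall slightly below both the averaged precision and the averaged recall. The same function could be applied to the predictions of each candidate model (SVM or Naïve Bayes, with either NER library) to compare them on equal terms.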