Systematic Analysis of Hateful Text Detection Using Machine Learning Classifiers
Tanzina Akter Tani, Tabassum Islam, Sayed Atique Newaz, N. Sultana
2021 13th International Conference on Information & Communication Technology and System (ICTS), pp. 330-335, 20 October 2021
DOI: 10.1109/ICTS52701.2021.9608010
Citations: 1
Abstract
In today's internet-based world, social media is one of the most popular platforms through which users vent their feelings and emotions, such as frustration, anger, and happiness, often without regard for moral and social values. Such abusive or offensive texts cause social disturbances, crimes, and many unethical deeds, so there is a strong need to detect abusive texts and posts and remove them from social media. Researchers have proposed a variety of text detection approaches in related work. In this work, three classifiers are used to detect hateful text: Naïve Bayes (NB), Random Forest (RF), and Support Vector Machine (SVM). The three classifiers are compared using two feature extraction methods, Bag of Words (BoW) and TF-IDF, each applied to both unigram and bigram features. To balance hateful and clean content, the Twitter dataset was under-sampled. Text preprocessing, which is essential for accurate results in NLP, was also carried out. With unigram features, Naïve Bayes achieved the highest accuracy (89%) using TF-IDF, whereas Random Forest achieved the highest accuracy (88%) using BoW. Overall, unigram features performed considerably better than bigram features. Finally, we make a number of principal contributions.
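The abstract does not specify the preprocessing steps or the exact TF-IDF formula. As a minimal sketch, assuming a simple tweet-cleaning routine (lowercasing, removing URLs, @mentions and non-letters, and dropping words from a small, hypothetical `STOP_WORDS` list) and the smoothed IDF variant idf = ln((1 + N) / (1 + df)) + 1, the standard-library Python below shows how unigram and bigram BoW and TF-IDF features could be built from tweets. The helper names `preprocess`, `ngrams`, `bag_of_words`, and `tf_idf` are illustrative, not the authors' implementation.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; the paper's actual list is not given.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "you"}


def preprocess(text: str) -> list[str]:
    """Lowercase, remove URLs, @mentions and non-letters, then drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # URLs and user mentions
    text = re.sub(r"[^a-z\s]", " ", text)           # keep ASCII letters and whitespace only
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Contiguous n-grams joined by spaces: n=1 gives unigrams, n=2 gives bigrams."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs: list[list[str]]) -> list[Counter]:
    """BoW features: raw term counts per document."""
    return [Counter(doc) for doc in docs]


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF features (assumed variant): tf = count / doc length, idf = ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency per term
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid dividing by zero for empty documents
        weights.append({
            term: (count / length) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    tweets = ["You are a stupid idiot @user", "Have a lovely day http://t.co/x"]
    tokens = [preprocess(t) for t in tweets]
    print([ngrams(t, 2) for t in tokens])  # [['stupid idiot'], ['have lovely', 'lovely day']]
    print(tf_idf([ngrams(t, 1) for t in tokens]))  # unigram TF-IDF weights per tweet
```

BoW keeps raw counts, while TF-IDF down-weights terms that occur in many tweets. Each bigram appears in fewer tweets than its component words, so bigram feature vectors are typically much sparser than unigram ones.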
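To illustrate the class balancing and one of the three classifiers, the following is a minimal sketch of random under-sampling followed by a multinomial Naïve Bayes classifier with Laplace (add-alpha) smoothing, written in plain Python over already-tokenized documents. The function `undersample`, the class `MultinomialNaiveBayes`, and the toy tweets are hypothetical; the abstract does not state which under-sampling strategy or Naïve Bayes variant the authors used, and Random Forest and SVM are not shown.

```python
import math
import random
from collections import Counter, defaultdict


def undersample(docs, labels, seed=0):
    """Randomly under-sample every class down to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_class[label].append(doc)
    n_min = min(len(items) for items in by_class.values())
    balanced = [(doc, label)
                for label, items in by_class.items()
                for doc in rng.sample(items, n_min)]  # keep n_min random examples per class
    rng.shuffle(balanced)
    return [doc for doc, _ in balanced], [label for _, label in balanced]


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over token lists with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [doc for doc, label in zip(docs, labels) if label == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(tok for doc in class_docs for tok in doc)
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Smoothed log P(token | class) for every vocabulary token.
            self.log_lik[c] = {tok: math.log((counts[tok] + self.alpha) / denom)
                               for tok in self.vocab}
        return self

    def predict(self, doc):
        """Return the class with the highest log posterior; out-of-vocabulary tokens are ignored."""
        scores = {c: self.log_prior[c] + sum(self.log_lik[c][tok] for tok in doc if tok in self.vocab)
                  for c in self.classes}
        return max(scores, key=scores.get)


if __name__ == "__main__":
    docs = [["stupid", "idiot"], ["idiot", "loser"], ["lovely", "day"], ["nice", "day"], ["great", "game"]]
    labels = ["hateful", "hateful", "clean", "clean", "clean"]
    bal_docs, bal_labels = undersample(docs, labels)
    print(Counter(bal_labels))  # two examples per class after under-sampling
    model = MultinomialNaiveBayes().fit(bal_docs, bal_labels)
    print(model.predict(["stupid", "loser"]))  # hateful
```

Under-sampling discards majority-class examples rather than synthesizing new minority ones, which keeps the sketch simple at the cost of throwing away some training data.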