Automatic Hate and Offensive speech detection framework from social media: the case of Afaan Oromoo language
Lata Guta Kanessa, S. Tulu
2021 International Conference on Information and Communication Technology for Development for Africa (ICT4DA), 2021-11-22
DOI: https://doi.org/10.1109/ict4da53266.2021.9672232
Citations: 3
Abstract
The easy accessibility of online platforms allows individuals to express their ideas and share experiences without restriction, on the grounds of freedom of speech. Social media platforms have no general framework for distinguishing hate speech from neutral speech, and users can remain anonymous. However, the propagation of hate speech on social media harms society in many ways, such as damaging the mental health of targeted audiences, disrupting social interaction, and leading to the destruction of property. This research proposes an SVM with TF-IDF, n-gram, and Word2Vec feature extraction as a binary classifier for detecting hate speech in the Afaan Oromoo language. To construct the dataset, we first crawled Facebook posts and comments using Facepager and the ScrapeStorm API, then labeled the collected data with two classes: hate and neutral. The general objective of this research is to design a framework that classifies speech as hate or neutral. We also compare the results of different machine learning algorithms. The experiments are evaluated on accuracy, F-score, recall, and precision. The framework based on SVM with n-grams combined with TF-IDF achieves 96% on all metrics.
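The pipeline named in the abstract (word n-grams weighted by TF-IDF, a linear SVM, and evaluation by accuracy, precision, recall, and F-score) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, a common smoothed IDF formula, and a simple subgradient-descent trainer for the hinge loss. The function names, the hyperparameters (`epochs`, `lr`, `lam`), and the toy documents are all assumptions made for illustration; the tokens `h1`, `n1`, and so on are placeholders, not real Afaan Oromoo data.

```python
import math
from collections import Counter

def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def fit_idf(docs, n_max=2):
    """Smoothed inverse document frequency for every n-gram in the corpus."""
    df = Counter()
    for doc in docs:
        df.update(set(word_ngrams(doc, n_max)))  # count each n-gram once per doc
    n_docs = len(docs)
    return {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}

def tfidf(doc, idf, n_max=2):
    """L2-normalised sparse TF-IDF vector; n-grams unseen in training are dropped."""
    tf = Counter(g for g in word_ngrams(doc, n_max) if g in idf)
    vec = {g: c * idf[g] for g, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {g: v / norm for g, v in vec.items()}

def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularised hinge loss; labels are +1 / -1."""
    w, b = Counter(), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * (sum(w[g] * v for g, v in x.items()) + b)
            for g in list(w):
                w[g] *= 1 - lr * lam          # regularisation shrinks all weights
            if margin < 1:                    # sample violates the margin
                for g, v in x.items():
                    w[g] += lr * label * v
                b += lr * label
    return w, b

def predict(x, w, b):
    """Classify a sparse vector: +1 (hate) or -1 (neutral)."""
    return 1 if sum(w[g] * v for g, v in x.items()) + b >= 0 else -1

def metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with respect to the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical toy corpus: placeholder tokens, +1 = hate, -1 = neutral.
train_docs = ["h1 h2 n1", "h2 h3", "n1 n2 n3", "n2 n4"]
train_y = [1, 1, -1, -1]

idf = fit_idf(train_docs)
X_train = [tfidf(d, idf) for d in train_docs]
w, b = train_linear_svm(X_train, train_y)
train_pred = [predict(x, w, b) for x in X_train]
print(metrics(train_y, train_pred))
```

In practice a library implementation of TF-IDF vectorisation and SVM training would replace the hand-written functions, and the metrics would be computed on a held-out test split rather than on the training data as in this toy run.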