Muhammad Sajjad, Fatima Zulifqar, Muhammad Usman Ghani Khan, Muhammad Azeem
Hate Speech Detection using Fusion Approach
2019 International Conference on Applied and Engineering Mathematics (ICAEM), published 2019-08-01
DOI: 10.1109/ICAEM.2019.8853762 (https://doi.org/10.1109/ICAEM.2019.8853762)
Citations: 17
Abstract
Detecting hate speech in user-generated online content has become increasingly important in recent years and is relevant to applications such as disputed event identification and sentiment analysis. Classifying online text is challenging because of the complexity of natural language and because hastily written microblog posts contain a great deal of informal language and errors. This work introduces a system that classifies tweets into three categories: racism, sexism, and none. Our classification strategy integrates deep features extracted from a Convolutional Neural Network (CNN) trained on semantic word embeddings with state-of-the-art syntactic and word n-gram features. We perform comprehensive experiments on a standard dataset containing 16k manually annotated tweets. Our proposed approach outperforms all other state-of-the-art approaches, with a significant increase in accuracy.
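The abstract does not say how the deep features and the n-gram features are combined, so the following is only a minimal sketch of one common option, plain concatenation. Everything in it is an assumption made for illustration: the names `TinyTextCNN`, `ngram_features`, and `fuse` are hypothetical, the embeddings and filters are random and untrained (standing in for a CNN trained on semantic word embeddings), and the syntactic features mentioned above are left out. It is not the authors' implementation.

```python
import random
from collections import Counter


def word_ngrams(tokens, n):
    """Return the word n-grams (as tuples) found in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(tokens, vocab, max_n=2):
    """Count vector over a fixed n-gram vocabulary (unigrams up to max_n-grams)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(tokens, n))
    return [float(counts[g]) for g in vocab]


class TinyTextCNN:
    """Untrained stand-in for a CNN feature extractor (illustrative only):
    embedding lookup, 1-D convolution, ReLU, max-over-time pooling."""

    def __init__(self, words, emb_dim=4, n_filters=3, width=2, seed=0):
        rng = random.Random(seed)
        self.emb_dim = emb_dim
        self.width = width
        # Random vectors stand in for pretrained semantic word embeddings.
        self.emb = {w: [rng.uniform(-1, 1) for _ in range(emb_dim)]
                    for w in sorted(words)}
        # Each filter spans `width` consecutive word embeddings.
        self.filters = [[rng.uniform(-1, 1) for _ in range(width * emb_dim)]
                        for _ in range(n_filters)]

    def deep_features(self, tokens):
        # Unknown words map to a zero vector.
        vecs = [self.emb.get(t, [0.0] * self.emb_dim) for t in tokens]
        # Pad short inputs so that at least one window exists.
        while len(vecs) < self.width:
            vecs.append([0.0] * self.emb_dim)
        feats = []
        for f in self.filters:
            best = 0.0  # starting at 0 applies ReLU before max pooling
            for i in range(len(vecs) - self.width + 1):
                window = [x for v in vecs[i:i + self.width] for x in v]
                best = max(best, sum(w * x for w, x in zip(f, window)))
            feats.append(best)
        return feats


def fuse(tokens, cnn, vocab):
    """Fusion by concatenation: deep features followed by n-gram counts."""
    return cnn.deep_features(tokens) + ngram_features(tokens, vocab)


# Toy corpus (made up for this sketch, not taken from the dataset).
corpus = [["an", "example", "tweet"], ["another", "example"]]
vocab = sorted({g for doc in corpus for n in (1, 2) for g in word_ngrams(doc, n)})
cnn = TinyTextCNN({w for doc in corpus for w in doc})
fused = fuse(corpus[0], cnn, vocab)
```

In a full pipeline, a vector like `fused` would be passed to a classifier over the three categories (racism, sexism, none); the choice of classifier is not stated in the abstract and is left open here.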