Hate Speech Detection using Text and Image Tweets Based On Bi-directional Long Short-Term Memory
Priyesh Kumar, K. Varalakshmi
2021 International Conference on Disruptive Technologies for Multi-Disciplinary Research and Applications (CENTCON), published 2021-11-19
DOI: 10.1109/CENTCON52345.2021.9688115
Citation count: 2
Abstract
Because internet use by people of all ethnicities and educational backgrounds has grown exponentially, harmful online content has become a serious concern in today's society. In the automated identification of harmful text, distinguishing hate speech from offensive language is a major problem. Most current approaches rely on TF-IDF feature extraction followed by traditional classifiers such as Support Vector Machines (SVM) and Decision Trees, which leaves room to improve detection accuracy and to reduce long training times. Most prior work considered only tweet text; in this work, we also incorporate characters and other components extracted from images. We propose a technique for automatically classifying tweets into three categories: hate speech, offensive speech, and non-hate speech. The technique consists of a training phase and a testing phase. Standard tweet preprocessing steps were applied: removing Twitter handles, URLs, punctuation, and stop words, followed by stemming. During both training and testing, each tweet is encoded using the vocabulary and padded to a fixed maximum length. This padding affects how the network processes its input and can significantly influence performance and accuracy. The normalized features are fed into a Bi-directional Long Short-Term Memory (BiLSTM) network, which learns long-term dependencies in both directions across the time steps of sequential Twitter data. In a comparative study, we evaluate models built with each of these approaches. We used a Kaggle dataset to predict hate, offensive, and neutral messages. Across many experiments, the proposed technique outperformed state-of-the-art algorithms, with results above 90 percent.
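The preprocessing steps listed in the abstract (removing handles, URLs, punctuation, and stop words, then stemming) can be sketched as follows. This is a minimal illustration, not the authors' code: the stop-word list and the `simple_stem` suffix stripper are hypothetical simplifications, and a real pipeline might instead use a full stop-word list and a proper stemmer such as NLTK's `PorterStemmer`.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption, not from the paper).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in",
              "on", "for", "this", "that", "it", "you", "i"}


def simple_stem(word: str) -> str:
    """Crude suffix stripping; a simplified stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "ly", "es", "s"):
        # Only strip when at least three characters of the stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess_tweet(tweet: str) -> list[str]:
    """Lowercase, remove handles/URLs/punctuation, drop stop words, then stem."""
    text = tweet.lower()
    text = re.sub(r"@\w+", " ", text)                   # remove Twitter handles
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)            # remove punctuation and symbols
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [simple_stem(t) for t in tokens]


print(preprocess_tweet("@user This is SO annoying!!! http://t.co/xyz people keep yelling"))
# -> ['so', 'annoy', 'people', 'keep', 'yell']
```

Handles and URLs are removed before punctuation so that their fragments (for example `t`, `co`, `xyz`) do not survive as spurious tokens.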
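Encoding each tweet against the vocabulary and padding it to a fixed maximum length can be illustrated with the sketch below. The reserved ids (`PAD_ID = 0`, `OOV_ID = 1`), the frequency-ordered vocabulary, and right-side padding are assumptions for illustration; the abstract does not say which side is padded, and some libraries (Keras's `pad_sequences`, for example) pad at the front by default.

```python
from collections import Counter

PAD_ID = 0  # reserved id for padding positions (assumption)
OOV_ID = 1  # reserved id for tokens not seen during training (assumption)


def build_vocab(token_lists: list[list[str]]) -> dict[str, int]:
    """Map each training token to an integer id, most frequent first."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    # Sort by descending frequency; ties are broken alphabetically for determinism.
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    return {tok: i + 2 for i, tok in enumerate(ordered)}  # ids 0 and 1 are reserved


def encode_and_pad(tokens: list[str], vocab: dict[str, int], max_len: int) -> list[int]:
    """Convert tokens to ids, truncate to max_len, and right-pad with PAD_ID."""
    ids = [vocab.get(tok, OOV_ID) for tok in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


train = [["hate", "you"], ["love", "you", "all"]]
vocab = build_vocab(train)                # {'you': 2, 'all': 3, 'hate': 4, 'love': 5}
max_len = max(len(t) for t in train)      # longest training tweet: 3 tokens
print(encode_and_pad(["hate", "you"], vocab, max_len))  # -> [4, 2, 0]
```

Fixing `max_len` from the training data keeps every input the same shape at test time, which is one concrete way the padding choice feeds into the network's behavior.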
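To show what a Bi-directional LSTM computes, here is a minimal, untrained forward pass in pure Python over padded integer token ids, with 0 assumed to be the padding id. It runs one LSTM over the sequence and a second LSTM over the reversed sequence, concatenates their final hidden states, and maps them to the three classes with a linear layer and softmax. The embedding size, hidden size, random initialization, skipping of pad positions, and use of final states are all illustrative assumptions, not the paper's configuration; a real model would be trained in a framework such as PyTorch or TensorFlow.

```python
import math
import random

LABELS = ("hate", "offensive", "neither")  # three output classes
PAD_ID = 0                                 # assumed padding id


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """One LSTM direction with input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # Each gate reads the concatenation [x_t, h_{t-1}]; biases are omitted for brevity.
        self.W = {k: rand_matrix(hidden_size, input_size + hidden_size, rng) for k in "ifog"}

    def step(self, x, h_prev, c_prev):
        z = x + h_prev  # list concatenation: [x_t, h_{t-1}]
        i = [sigmoid(v) for v in matvec(self.W["i"], z)]
        f = [sigmoid(v) for v in matvec(self.W["f"], z)]
        o = [sigmoid(v) for v in matvec(self.W["o"], z)]
        g = [math.tanh(v) for v in matvec(self.W["g"], z)]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]  # new cell state
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]                    # new hidden state
        return h, c

    def final_state(self, seq):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in seq:
            h, c = self.step(x, h, c)
        return h


class BiLSTMClassifier:
    """Embedding -> forward and backward LSTM -> linear layer -> softmax (untrained)."""

    def __init__(self, vocab_size, embed_dim=8, hidden_size=4, seed=0):
        rng = random.Random(seed)
        self.embed = rand_matrix(vocab_size, embed_dim, rng)
        self.fwd = LSTMCell(embed_dim, hidden_size, rng)
        self.bwd = LSTMCell(embed_dim, hidden_size, rng)
        self.out = rand_matrix(len(LABELS), 2 * hidden_size, rng)

    def predict_proba(self, ids):
        # Skip padding positions so pads do not alter the recurrent state (simple masking).
        seq = [self.embed[t] for t in ids if t != PAD_ID]
        # Concatenate the forward final state with the final state of the reversed pass.
        features = self.fwd.final_state(seq) + self.bwd.final_state(seq[::-1])
        logits = matvec(self.out, features)
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return {label: e / total for label, e in zip(LABELS, exps)}


model = BiLSTMClassifier(vocab_size=6)  # ids 0..5, with 0 reserved for padding
print(model.predict_proba([4, 2, 0]))   # untrained, so the probabilities are arbitrary
```

Reading the sequence in both directions lets the combined representation reflect words on either side of each position, which is the motivation the abstract gives for using a BiLSTM on tweet text.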