Hate Speech Detection using Text and Image Tweets Based On Bi-directional Long Short-Term Memory

Priyesh Kumar, K. Varalakshmi
DOI: 10.1109/CENTCON52345.2021.9688115
Published in: 2021 International Conference on Disruptive Technologies for Multi-Disciplinary Research and Applications (CENTCON), 19 November 2021
Citations: 2

Abstract

Internet use has grown exponentially among people of all ethnicities and educational backgrounds, and harmful online content has become a serious concern in today's society. In the automated identification of harmful text, a major problem is distinguishing hate speech from offensive language. Most current approaches rely on TF-IDF feature extraction followed by traditional classifiers such as Support Vector Machines (SVM) and Decision Trees. These approaches leave room for improvement in emotion-detection accuracy and suffer from long training times. Most prior work considers only the text of tweets. In this work we also include characters and components extracted from images. We propose a technique that automatically classifies tweets into three categories: hate speech, offensive speech, and non-hate speech. The technique consists of a training step and a testing step. Standard tweet preprocessing was applied: removing Twitter handles, URLs, punctuation, and stop words, followed by stemming. In both training and testing, each tweet is encoded using the vocabulary and padded to a fixed maximum length. This padding affects how the network operates and can have a significant impact on performance and accuracy. The normalized features are fed into a Bi-directional Long Short-Term Memory (BiLSTM) network, which learns long-term dependencies in both directions between the time steps of sequential tweet data. In a comparative study, we compare models built with each of these approaches. We used a Kaggle dataset to classify messages as hate, offensive, or neutral. After extensive experiments, we found that the proposed technique outperforms state-of-the-art algorithms, with accuracy exceeding 90 percent.
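The preprocessing steps listed in the abstract remove handles, URLs, punctuation, and stop words, and then apply stemming. They can be sketched in Python as follows. This is a minimal sketch that assumes plain-text tweets.

The abstract does not say which stop-word list or stemmer the authors used. The tiny `STOP_WORDS` set and the suffix-stripping `crude_stem` helper are therefore hypothetical stand-ins. A real pipeline would more likely use a library stemmer such as NLTK's `PorterStemmer`.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list used only for illustration.
STOP_WORDS = {"a", "an", "the", "is", "are", "am", "to", "of", "and", "or",
              "in", "on", "at", "this", "that", "it", "you", "i"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # http(s) links and bare www. links
HANDLE_RE = re.compile(r"@\w+")                 # Twitter @handles
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def crude_stem(word: str) -> str:
    """Strip one common English suffix; a simplified stand-in, not the Porter stemmer."""
    for suffix in ("ing", "ed", "ly", "es", "s"):
        # Keep at least three characters so short words are left intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess_tweet(text: str) -> list[str]:
    text = URL_RE.sub(" ", text)                 # drop URLs before punctuation is stripped
    text = HANDLE_RE.sub(" ", text)              # drop @handles
    text = text.lower().translate(PUNCT_TABLE)   # lowercase and remove ASCII punctuation
    tokens = text.split()
    return [crude_stem(tok) for tok in tokens if tok not in STOP_WORDS]


print(preprocess_tweet("@user1 Check this out: https://t.co/abc123 You are SO annoying!!!"))
# ['check', 'out', 'so', 'annoy']
```

URLs are removed before punctuation so that a link is dropped as a whole. Otherwise it would be broken into stray word fragments.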
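The abstract describes encoding each tweet with a vocabulary and padding it to a maximum length, in both training and testing. The sketch below makes several illustrative assumptions rather than reproducing the authors' settings:

- ids 0 and 1 are reserved for padding and for out-of-vocabulary words;
- the maximum length is taken from the longest training tweet;
- padding is appended at the end of the sequence.

The helper names `build_vocab` and `encode_and_pad` are hypothetical.

```python
from collections import Counter

PAD_ID = 0  # reserved id used only for padding
OOV_ID = 1  # reserved id for tokens not seen during training


def build_vocab(tokenized_tweets):
    """Assign integer ids to training tokens, most frequent first."""
    counts = Counter(tok for tweet in tokenized_tweets for tok in tweet)
    # Ties are broken alphabetically so the mapping is deterministic.
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return {tok: idx for idx, tok in enumerate(ordered, start=2)}


def encode_and_pad(tokens, vocab, max_len):
    """Map tokens to ids, truncate to max_len, then right-pad with PAD_ID."""
    ids = [vocab.get(tok, OOV_ID) for tok in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


train = [["you", "idiot"], ["nice", "day", "everyone"], ["you", "nice"]]
vocab = build_vocab(train)
max_len = max(len(tweet) for tweet in train)   # longest training tweet: 3
print(encode_and_pad(["you", "idiot"], vocab, max_len))    # [3, 6, 0]
print(encode_and_pad(["nice", "unseen"], vocab, max_len))  # [2, 1, 0]
```

Every tweet is padded or truncated to the same `max_len`, so the network always receives fixed-length inputs. Test-time words that never appeared in training map to `OOV_ID` instead of raising an error.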
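A BiLSTM runs two LSTMs over the same sequence, one left to right and one right to left, and combines their outputs. At every position the model can therefore draw on context from both sides.

The sketch below shows only the mechanics, in pure Python with random, untrained weights:

- the final hidden states of both directions are concatenated;
- the result is passed to a dense softmax layer over three classes.

The abstract does not specify embedding size, hidden size, how the two directions are combined, or the class order. All of those, plus the embedding table and the token ids, are hypothetical. A practical system would use a framework layer such as Keras `Bidirectional(LSTM(...))`.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """One LSTM layer with random, untrained weights (illustration only)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One (W, U, b) triple per gate: input i, forget f, output o, candidate g.
        self.params = {
            gate: (rand_matrix(hidden_dim, input_dim, rng),
                   rand_matrix(hidden_dim, hidden_dim, rng),
                   [0.0] * hidden_dim)
            for gate in ("i", "f", "o", "g")
        }

    def _preactivation(self, gate, x, h):
        w, u, b = self.params[gate]
        return [wx + uh + bk for wx, uh, bk in zip(matvec(w, x), matvec(u, h), b)]

    def step(self, x, h, c):
        i = [sigmoid(z) for z in self._preactivation("i", x, h)]
        f = [sigmoid(z) for z in self._preactivation("f", x, h)]
        o = [sigmoid(z) for z in self._preactivation("o", x, h)]
        g = [math.tanh(z) for z in self._preactivation("g", x, h)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # cell state
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]           # hidden state
        return h_new, c_new

    def final_state(self, sequence):
        """Read the sequence in the given order and return the last hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


def bilstm_features(sequence, forward_cell, backward_cell):
    """Concatenate the forward pass's last state with the backward pass's last state."""
    return forward_cell.final_state(sequence) + backward_cell.final_state(sequence[::-1])


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
embed_dim, hidden_dim = 4, 3
# Hypothetical embedding table for a 7-word vocabulary (ids 0..6).
embeddings = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(7)]
padded_ids = [3, 6, 0]  # hypothetical padded tweet; id 0 is padding
sequence = [embeddings[t] for t in padded_ids]

forward_cell = LSTMCell(embed_dim, hidden_dim, rng)
backward_cell = LSTMCell(embed_dim, hidden_dim, rng)
features = bilstm_features(sequence, forward_cell, backward_cell)  # length 2 * hidden_dim

# Untrained dense layer over three hypothetical classes: hate, offensive, neither.
classifier = rand_matrix(3, 2 * hidden_dim, rng)
probs = softmax(matvec(classifier, features))
print(len(features), [round(p, 3) for p in probs])
```

In this sketch the padding id is fed through the recurrence like any other token. That is one concrete way padding can influence the network, as the abstract notes. Frameworks can instead skip padded positions, for example with Keras `Embedding(..., mask_zero=True)`.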