Detecting Offensive Content on Twitter During Proud Boys Riots

2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA) Pub Date : 2021-12-01 DOI:10.1109/ICMLA52953.2021.00253

M. Fahim, S. Gokhale


Citations: 5

Abstract

Hateful and offensive speech on online social media platforms has risen in recent years. Often used to convey humor through sarcasm or to emphasize a point, offensive speech may also be employed to insult, deride and mock alternate points of view. In turbulent and chaotic circumstances, insults and mockery can lead to violence and unrest; hence, such speech must be identified and tagged to limit its damage. This paper presents an application of machine learning to detect hateful and offensive content in Twitter feeds shared after the protests by Proud Boys, an extremist, ideological and violent hate group. A comprehensive coding guide, consolidating definitions of what constitutes offensive content based on its potential to trigger and incite people, is developed and used to label the tweets. Linguistic, auxiliary and social features extracted from these labeled tweets are used to train machine learning classifiers, which detect offensive content with an accuracy of about 92%. An analysis of the feature importance scores reveals that offensiveness is predominantly a function of words and their combinations, rather than meta features such as punctuation and quotes. This observation can form the foundation of pre-trained classifiers that can be deployed to automatically detect offensive speech in new and unforeseen circumstances.
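The abstract contrasts two feature families: words and their combinations, and meta features such as punctuation and quotes. The abstract does not list the exact features, so the following is only a minimal sketch of how such features might be extracted from a single tweet. The function name `extract_features`, the feature-key naming scheme, the tokenization rule, and the sample text are all illustrative assumptions, not the authors' pipeline.

```python
import re
import string
from collections import Counter


def extract_features(tweet: str) -> dict:
    """Return word-level and meta-level counts for one tweet.

    Hypothetical helper for illustration; the paper's actual feature
    set and preprocessing are not specified in the abstract.
    """
    # Lowercase the text and keep alphabetic tokens (assumed tokenization).
    tokens = re.findall(r"[a-z']+", tweet.lower())

    features = Counter()

    # "Words": unigram counts.
    for token in tokens:
        features[f"word={token}"] += 1

    # "Combinations of words": counts of adjacent word pairs (bigrams).
    for first, second in zip(tokens, tokens[1:]):
        features[f"bigram={first}_{second}"] += 1

    # Meta features: punctuation marks and double-quote characters.
    features["meta=punctuation_count"] = sum(
        ch in string.punctuation for ch in tweet
    )
    features["meta=quote_count"] = tweet.count('"')

    return dict(features)


# Made-up sample text, not taken from the paper's data set.
example = extract_features('He said "no way" and no way!')
```

Dictionaries of this shape could be vectorized and passed to a classifier that reports per-feature importance scores, which is the kind of analysis the abstract describes; the choice of classifier is left open here because the abstract does not name one.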



