Classifying Hate Speech Using a Two-Layer Model

Statistics and Public Policy (IF 1.5, JCR Q2, Social Sciences, Mathematical Methods) | Pub Date: 2019-01-01 | DOI: 10.1080/2330443x.2019.1660285
Yi-jie Tang, Nicole M. Dalzell
Volume 6, Issue 1, pp. 80-86
Citations: 7

Abstract

Social media and other online sites are being increasingly scrutinized as platforms for cyberbullying and hate speech. Many machine learning algorithms, such as support vector machines, have been adopted to create classification tools to identify and potentially filter patterns of negative speech. While effective for prediction, these methodologies yield models that are difficult to interpret. In addition, many studies focus on classifying comments as either negative or neutral, rather than further separating negative comments into subcategories. To address both of these concerns, we introduce a two-stage model for classifying text. With this model, we illustrate the use of internal lexicons, collections of words generated from a pre-classified training dataset of comments that are specific to several subcategories of negative comments. In the first stage, a machine learning algorithm classifies each comment as negative or neutral, or more generally target or nontarget. The second stage of model building leverages the internal lexicons (called L2CLs) to create features specific to each subcategory. These features, along with others, are then used in a random forest model to classify the comments into the subcategories of interest. We demonstrate our approach using two sets of data. Supplementary materials for this article are available online.
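The two-stage idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not say how the internal lexicons (L2CLs) are built, so `build_internal_lexicons` uses a simple hypothetical rule (a word must occur at least `min_count` times in one subcategory and never in another); the stage-one classifier is passed in as any callable `is_target`; and a plain "largest lexicon count" rule stands in for the random forest. The subcategory names and the nonsense tokens `zorp` and `blarg` are placeholders.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase a comment and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_internal_lexicons(comments, labels, min_count=2):
    """Build one word set per subcategory from pre-classified comments.

    Assumed rule (not from the paper): a word joins a subcategory's lexicon
    when it occurs at least `min_count` times in that subcategory and never
    in any other subcategory.
    """
    counts = {}
    for text, label in zip(comments, labels):
        counts.setdefault(label, Counter()).update(tokenize(text))

    lexicons = {}
    for label, counter in counts.items():
        # Vocabulary seen in every other subcategory.
        others = set()
        for other_label, other_counter in counts.items():
            if other_label != label:
                others.update(other_counter)
        lexicons[label] = {
            word for word, c in counter.items()
            if c >= min_count and word not in others
        }
    return lexicons


def lexicon_features(text, lexicons):
    """Count how many tokens of `text` fall in each subcategory lexicon."""
    tokens = tokenize(text)
    return {
        label: sum(tok in words for tok in tokens)
        for label, words in sorted(lexicons.items())
    }


def two_stage_classify(text, is_target, lexicons):
    """Stage 1 screens target vs. nontarget; stage 2 picks a subcategory.

    Stage 2 here returns the subcategory with the largest lexicon count,
    a stand-in for the random forest described in the abstract.
    """
    if not is_target(text):
        return "nontarget"
    feats = lexicon_features(text, lexicons)
    best = max(feats, key=feats.get)
    return best if feats[best] > 0 else "target-unknown"


# Toy pre-classified training data with placeholder tokens.
train_comments = [
    "you zorp you total zorp",
    "such a zorp person",
    "i will blarg you",
    "blarg them now blarg",
]
train_labels = ["insult", "insult", "threat", "threat"]

lexicons = build_internal_lexicons(train_comments, train_labels)


def flag(text):
    """Hypothetical stage-one screen: flag comments containing a placeholder."""
    return "zorp" in text or "blarg" in text


print(lexicons)
print(lexicon_features("blarg you zorp blarg", lexicons))
print(two_stage_classify("blarg you zorp blarg", flag, lexicons))
print(two_stage_classify("have a nice day", flag, lexicons))
```

In a fuller version, the dictionaries returned by `lexicon_features` would be combined with other features and passed to a random forest, and `flag` would be replaced by a trained stage-one classifier. Because each feature is a count against a named word list, the second stage stays inspectable, which is the interpretability motivation the abstract gives.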
Source journal
Statistics and Public Policy (Social Sciences, Mathematical Methods)
CiteScore: 3.20
Self-citation rate: 6.20%
Articles published: 13
Review time: 32 weeks