Transfer learning for hate speech detection in social media

Marian-Andrei Rizoiu, Tianyu Wang, Gabriela Ferraro, Hanna Suominen

Journal of Computational Social Science, published 2023-10-17. DOI: 10.1007/s42001-023-00224-9 (https://doi.org/10.1007/s42001-023-00224-9)
Citations: 10
Abstract
Today, the internet is an integral part of our daily lives, enabling people to be more connected than ever before. However, this greater connectivity and access to information increase exposure to harmful content, such as cyber-bullying and cyber-hatred. Models based on machine learning and natural language processing offer a way to make online platforms safer by autonomously identifying hate speech in web text. However, the main difficulty is annotating a sufficiently large number of examples to train these models. This paper uses a transfer learning technique to jointly leverage two independent datasets and build a single representation of hate speech. We build an interpretable two-dimensional visualization tool of the constructed hate speech representation, dubbed the Map of Hate, in which multiple datasets can be projected and comparatively analyzed. The hateful content is annotated differently across the two datasets (racist and sexist in one dataset, hateful and offensive in the other). However, the common representation successfully projects the harmless class of both datasets into the same space, and it can be used to uncover labeling errors (false positives). We also show that the joint representation boosts prediction performance when only a limited amount of supervision is available. These methods and insights hold the potential to make social media safer and to reduce the need to expose human moderators and annotators to distressing online messages.
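The joint-representation idea in the abstract can be made concrete with a short sketch: a shared encoder forced to embed both corpora in one space, a separate classification head for each dataset's own label scheme, and a 2D projection of the shared representations in the spirit of the Map of Hate. This is a minimal illustration under stated assumptions, not the paper's actual architecture; the model, layer sizes, toy batches, and the choice of t-SNE are all illustrative.

```python
import torch
import torch.nn as nn
from sklearn.manifold import TSNE

class SharedHateSpeechModel(nn.Module):
    """Hypothetical joint model: a shared encoder plus one head per dataset."""
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=32,
                 n_classes_a=3, n_classes_b=3):
        super().__init__()
        # Shared layers learn a single representation across both datasets.
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim)  # mean-pools token embeddings
        self.encoder = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.ReLU())
        # Dataset-specific heads, matching each dataset's own label scheme
        # (e.g. racist/sexist/none vs. hateful/offensive/none).
        self.heads = nn.ModuleDict({
            "a": nn.Linear(hidden_dim, n_classes_a),
            "b": nn.Linear(hidden_dim, n_classes_b),
        })

    def forward(self, token_ids, dataset):
        z = self.encoder(self.embed(token_ids))  # shared representation
        return self.heads[dataset](z), z

model = SharedHateSpeechModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy stand-ins for the two differently-annotated corpora.
batches = {
    "a": (torch.randint(0, 1000, (8, 20)), torch.randint(0, 3, (8,))),
    "b": (torch.randint(0, 1000, (8, 20)), torch.randint(0, 3, (8,))),
}

for step in range(100):
    for name, (tokens, labels) in batches.items():  # alternate between datasets
        logits, _ = model(tokens, name)
        loss = loss_fn(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Project the shared representations to 2D, in the spirit of the Map of Hate
# (t-SNE is an assumed choice; any 2D projection would serve the illustration).
with torch.no_grad():
    _, z = model(batches["a"][0], "a")
coords = TSNE(n_components=2, perplexity=5).fit_transform(z.numpy())
```

Because both heads backpropagate through the same encoder, harmless examples from either corpus tend to land near each other in the shared space, which is what allows a single 2D map to compare the two label schemes and surface suspect annotations.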