Label Propagation-Based Semi-Supervised Learning for Hate Speech Classification

First Workshop on Insights from Negative Results in NLP | Pub Date: 2020-11-01 | DOI: 10.18653/v1/2020.insights-1.8

Ashwin Geet D'Sa, I. Illina, D. Fohr, D. Klakow, Dana Ruiter

Citations: 6

Abstract

Research on hate speech classification has received increased attention. In real-life scenarios, only a small amount of labeled hate speech data is typically available to train a reliable classifier. Semi-supervised learning takes advantage of a small amount of labeled data together with a large amount of unlabeled data. In this paper, label propagation-based semi-supervised learning is explored for the task of hate speech classification. The quality of the labels assigned to the unlabeled set depends on the input representations. In this work, we show that pre-trained representations are label-agnostic and yield poor results when used with label propagation. Neural network-based fine-tuning can be adopted to learn task-specific representations from a small amount of labeled data. We show that fully fine-tuned representations may not always be the best representations for label propagation, and that intermediate representations may perform better in a semi-supervised setup.
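The label propagation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 2-D points stand in for sentence representations, and the fully connected graph, the Gaussian kernel, the `sigma` value, and the function name `label_propagation` are all assumptions made for illustration. The sketch shows why the input representations matter: labels spread along high-affinity edges, so points that sit close together in representation space end up with the same label.

```python
import math


def label_propagation(points, labels, n_classes, sigma=1.0, n_iter=100):
    """Propagate labels over a fully connected graph (illustrative sketch).

    points: feature vectors (hypothetical stand-ins for sentence representations)
    labels: class index for labeled points, None for unlabeled points
    """
    n = len(points)
    # Pairwise affinities from a Gaussian kernel on squared Euclidean distance.
    weights = [
        [math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (2 * sigma ** 2))
         for q in points]
        for p in points
    ]
    # Row-normalize affinities into transition probabilities.
    trans = [[w / sum(row) for w in row] for row in weights]

    def clamp(dist):
        # Reset labeled points to their known one-hot label.
        for i, y in enumerate(labels):
            if y is not None:
                dist[i] = [1.0 if c == y else 0.0 for c in range(n_classes)]
        return dist

    # Unlabeled points start from a uniform class distribution.
    dist = clamp([[1.0 / n_classes] * n_classes for _ in range(n)])
    for _ in range(n_iter):
        # Each point takes the affinity-weighted average of its neighbors.
        dist = [
            [sum(trans[i][j] * dist[j][c] for j in range(n))
             for c in range(n_classes)]
            for i in range(n)
        ]
        dist = clamp(dist)
    # Final label is the most probable class for each point.
    return [max(range(n_classes), key=lambda c: row[c]) for row in dist]


# Toy data: two well-separated clusters with one labeled point in each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (3.0, 3.0), (3.1, 2.8), (2.9, 3.2)]
labels = [0, None, None, 1, None, None]
predicted = label_propagation(points, labels, n_classes=2, sigma=0.5)
print(predicted)
```

In this toy case the clusters align with the classes, so each unlabeled point inherits the label of the labeled point in its cluster. With label-agnostic representations, where nearby points need not share a class, the same procedure would spread labels across class boundaries, which is consistent with the poor results the abstract reports for pre-trained representations.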


