Abusive Language Detection in Online User Content

Proceedings of the 25th International Conference on World Wide Web
Pub Date: 2016-04-11
DOI: 10.1145/2872427.2883062

Chikashi Nobata, Joel R. Tetreault, A. Thomas, Yashar Mehdad, Yi Chang

Citations: 953

Abstract

Detection of abusive language in user-generated online content has become an issue of increasing importance in recent years. Most current commercial methods make use of blacklists and regular expressions; however, these measures fall short when contending with more subtle, less ham-fisted examples of hate speech. In this work, we develop a machine-learning-based method to detect hate speech in online user comments from two domains, which outperforms a state-of-the-art deep learning approach. We also develop a corpus of user comments annotated for abusive language, the first of its kind. Finally, we use our detection tool to analyze abusive language over time and in different settings to further enhance our knowledge of this behavior.


