Self-Supervised Euphemism Detection and Identification for Content Moderation
Wanzheng Zhu, Hongyu Gong, Rohan Bansal, Zachary Weinberg, Nicolas Christin, G. Fanti, S. Bhat
2021 IEEE Symposium on Security and Privacy (SP), pp. 229-246. Published 2021-03-31. DOI: 10.1109/SP40001.2021.00075
Citations: 22
Abstract
Fringe groups and organizations have a long history of using euphemisms—ordinary-sounding words with a secret meaning—to conceal what they are discussing. Nowadays, one common use of euphemisms is to evade content moderation policies enforced by social media platforms. Existing tools for automatic policy enforcement rely on keyword searches for words on a "ban list", but these are notoriously imprecise: even when limited to swearwords, they can still cause embarrassing false positives [1]. When a commonly used ordinary word acquires a euphemistic meaning, adding it to a keyword-based ban list is hopeless: consider "pot" (storage container or marijuana?) or "heater" (household appliance or firearm?). The current generation of social media companies instead hires staff to check posts manually, but this is expensive, inhumane, and not much more effective. It is usually apparent to a human moderator that a word is being used euphemistically, but they may not know what the secret meaning is, and therefore whether the message violates policy. Also, when a euphemism is banned, the group that used it need only invent another one, leaving moderators one step behind. This paper will demonstrate unsupervised algorithms that, by analyzing words in their sentence-level context, can both detect words being used euphemistically, and identify the secret meaning of each word. Compared to the existing state of the art, which uses context-free word embeddings, our algorithm for detecting euphemisms achieves 30–400% higher detection accuracy on unlabeled euphemisms in a text corpus. Our algorithm for revealing euphemistic meanings of words is the first of its kind, as far as we are aware. In the arms race between content moderators and policy evaders, our algorithms may help shift the balance in the direction of the moderators.
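The abstract only gestures at how sentence-level context is exploited. As a rough illustration of the general idea (not the paper's actual pipeline), the sketch below uses an off-the-shelf masked language model to propose words that fit the contexts in which known seed euphemisms appear; words that repeatedly fit such contexts are flagged as candidate euphemisms. The model name, seed terms, and corpus sentences are placeholder assumptions for illustration only.

```python
# Minimal sketch of context-based euphemism candidate generation.
# Assumes the Hugging Face `transformers` library and a generic pretrained
# masked language model; the seed terms and corpus are illustrative
# placeholders, not the paper's data or exact method.
from collections import Counter
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Hypothetical seed euphemisms whose hidden meaning (drug names) is known.
seed_terms = {"weed", "coke"}

# Hypothetical corpus sentences; in practice these come from platform posts.
corpus = [
    "anyone know where i can buy some weed around here",
    "he was selling coke behind the bar last night",
    "got some good ice from my usual guy",
]

candidate_counts = Counter()
for sentence in corpus:
    tokens = sentence.split()
    for seed in seed_terms:
        if seed in tokens:
            # Mask the first occurrence of the seed term and let the model
            # propose words that fit the same sentence-level context.
            masked_tokens = tokens.copy()
            masked_tokens[masked_tokens.index(seed)] = fill_mask.tokenizer.mask_token
            for pred in fill_mask(" ".join(masked_tokens), top_k=10):
                candidate_counts[pred["token_str"].strip()] += 1

# Words that repeatedly fit euphemistic contexts (and are not seeds themselves)
# are candidate euphemisms for a human moderator to review.
candidates = [w for w, _ in candidate_counts.most_common(20) if w not in seed_terms]
print(candidates)
```

The contrast with context-free word embeddings is that the ranking here depends on each full sentence: an ordinary word like "ice" is only surfaced when it occurs in contexts resembling those of the seed terms, rather than because it is globally similar to them.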