Beyond plain toxic: building datasets for detection of flammable topics and inappropriate statements

Impact Factor: 1.7 · CAS Tier 3 (Computer Science) · JCR Q3, Computer Science, Interdisciplinary Applications · Language Resources and Evaluation · Pub Date: 2023-10-21 · DOI: 10.1007/s10579-023-09682-z
Nikolay Babakov, Varvara Logacheva, Alexander Panchenko
Citations: 0

Abstract

Toxicity on the Internet is an acknowledged problem. It includes a wide range of actions from the use of obscene words to offenses and hate speech toward particular users or groups of people. However, there also exist other types of inappropriate messages which are usually not viewed as toxic as they do not contain swear words or explicit offenses. Such messages can contain covert toxicity or generalizations, incite harmful actions (crime, suicide, drug use), and provoke “heated” discussions. These messages are often related to particular sensitive topics, e.g. politics, sexual minorities, or social injustice. Such topics tend to yield toxic emotional reactions more often than other topics, e.g. cars or computing. At the same time, not all messages within “flammable” topics are inappropriate. This work focuses on automatically detecting inappropriate language in natural texts. This is crucial for monitoring user-generated content and developing dialogue systems and AI assistants. While many works focus on toxicity detection, we highlight the fact that texts can be harmful without being toxic or containing obscene language. Blind censorship based on keywords is a common approach to address these issues, but it limits a system’s functionality. This work proposes a safe and effective solution to serve broad user needs and develop necessary resources and tools. Thus, machinery for inappropriateness detection could be useful (i) for making communication on the Internet safer, more productive, and inclusive by flagging truly inappropriate content while not banning messages blindly by topic; (ii) for detection of inappropriate messages generated by automatic systems, e.g. neural chatbots, due to biases in training data; (iii) for debiasing training data for language models (e.g. BERT and GPT-2). Towards this end, in this work, we present two text collections labeled according to a binary notion of inappropriateness (124,597 samples) and a multinomial notion of sensitive topic (33,904 samples). Assuming that the notion of inappropriateness is common among people of the same culture, we base our approach on a human intuitive understanding of what is not acceptable and harmful. To devise an objective view of inappropriateness, we define it in a data-driven way through crowdsourcing. Namely, we run a large-scale annotation study asking workers if a given chatbot-generated utterance could harm the reputation of the company that created this chatbot. High values of inter-annotator agreement suggest that the notion of inappropriateness exists and can be uniformly understood by different people. To define the notion of a sensitive topic in an objective way we use guidelines suggested by specialists in the Legal and PR departments of a large company. We use the collected datasets to train inappropriateness and sensitive topic classifiers employing both classic and Transformer-based models.
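The abstract states that the two collections are used to train inappropriateness and sensitive-topic classifiers with both classic and Transformer-based models. Below is a minimal sketch of the Transformer-based variant for the binary inappropriateness task; the file name, column names, model checkpoint, and hyperparameters are illustrative placeholders and not the authors' released setup.

```python
# Minimal sketch: fine-tuning a Transformer encoder for binary
# inappropriateness classification. All file/column/model names below are
# assumptions for illustration, not the paper's published resources.
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-multilingual-cased"  # placeholder checkpoint

# Hypothetical export of the labeled collection: columns "text" and "label" (0/1).
df = pd.read_csv("inappropriateness.csv")
train_df, val_df = train_test_split(df, test_size=0.1, random_state=42)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)


class TextDataset(torch.utils.data.Dataset):
    """Wraps tokenized texts and binary labels for the Trainer."""

    def __init__(self, texts, labels):
        self.enc = tokenizer(list(texts), truncation=True, padding=True, max_length=128)
        self.labels = list(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item


trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="inappropriateness-clf",
        num_train_epochs=3,
        per_device_train_batch_size=16,
    ),
    train_dataset=TextDataset(train_df["text"], train_df["label"]),
    eval_dataset=TextDataset(val_df["text"], val_df["label"]),
)
trainer.train()
print(trainer.evaluate())  # eval loss on the held-out split
```

The multinomial sensitive-topic classifier could be trained the same way by loading the topic-labeled collection and setting num_labels to the number of topic classes.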


Source Journal

Language Resources and Evaluation (Engineering & Technology / Computer Science: Interdisciplinary Applications)

CiteScore: 6.50
Self-citation rate: 3.70%
Articles published per year: 55
Review time: >12 weeks
Journal description: Language Resources and Evaluation is the first publication devoted to the acquisition, creation, annotation, and use of language resources, together with methods for evaluation of resources, technologies, and applications. Language resources include language data and descriptions in machine readable form used to assist and augment language processing applications, such as written or spoken corpora and lexica, multimodal resources, grammars, terminology or domain specific databases and dictionaries, ontologies, multimedia databases, etc., as well as basic software tools for their acquisition, preparation, annotation, management, customization, and use. Evaluation of language resources concerns assessing the state-of-the-art for a given technology, comparing different approaches to a given problem, assessing the availability of resources and technologies for a given application, benchmarking, and assessing system usability and user satisfaction.
Latest articles from this journal

Sentiment analysis dataset in Moroccan dialect: bridging the gap between Arabic and Latin scripted dialect
Studying word meaning evolution through incremental semantic shift detection
PARSEME-AR: Arabic reference corpus for multiword expressions using PARSEME annotation guidelines
Normalized dataset for Sanskrit word segmentation and morphological parsing
Conversion of the Spanish WordNet databases into a Prolog-readable format