“Zo Grof !”: A Comprehensive Corpus for Offensive and Abusive Language in Dutch

Ward Ruitenbeek, Victor Zwart, Robin Van Der Noord, Zhenja Gnezdilov, T. Caselli
Published in: Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH). DOI: 10.18653/v1/2022.woah-1.5 (https://doi.org/10.18653/v1/2022.woah-1.5)
Citation count: 3

Abstract

This paper presents a comprehensive corpus for the study of socially unacceptable language in Dutch. The corpus extends and revises an existing resource with more data and introduces a new annotation dimension for offensive language, making it a unique resource in the Dutch language landscape. Each language phenomenon (abusive and offensive language) in the corpus has been annotated with a multi-layer annotation scheme that models the explicitness and the target(s) of the message. We have conducted a new set of experiments with different classification algorithms on all annotation dimensions. Monolingual Pre-Trained Language Models prove to be the best systems, obtaining a macro-average F1 of 0.828 for binary classification of offensive language and 0.579 for the targets of offensive messages. Furthermore, the best system obtains a macro-average F1 of 0.667 for distinguishing between abusive and offensive messages.
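The scores above are macro-averaged F1 values: F1 is computed separately for each class and the per-class scores are then averaged with equal weight, so a rare class counts as much as a frequent one. The following is a minimal sketch of that computation, not the authors' evaluation code. The function name `macro_f1` and the labels `OFF` and `NOT` are hypothetical, chosen only for illustration, and the toy data is not taken from the corpus.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    per_class = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        # Guard against division by zero when a class is never predicted or never occurs.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        per_class.append(f1)
    return sum(per_class) / len(per_class)


# Hypothetical toy labels (illustration only, not corpus data).
gold = ["OFF", "OFF", "NOT", "NOT", "NOT", "NOT"]
pred = ["OFF", "NOT", "NOT", "NOT", "NOT", "OFF"]
score = macro_f1(gold, pred)
print(round(score, 3))  # OFF has F1 0.5, NOT has F1 0.75, so the macro average is 0.625
```

Because each class contributes equally, a system that performs well only on the majority class is penalized, which matters for tasks such as target classification where the classes are typically imbalanced.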