Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.woah-1.18
Mark Díaz, Razvan Amironesei, Laura Weidinger, Iason Gabriel
Tasks such as toxicity detection, hate speech detection, and online harassment detection have been developed for identifying interactions involving offensive speech. In this work we articulate the need for a relational understanding of offensiveness to help distinguish denotative offensive speech from offensive speech serving as a mechanism through which marginalized communities resist oppressive social norms. Using examples from the queer community, we argue that evaluations of offensive speech must focus on the impacts of language use. We call this the cynic perspective: a characteristic of language, with roots in Cynic philosophy, that pertains to employing offensive speech as a practice of resistance. We also explore the degree to which NLP systems may encounter limits to modeling relational context.
Title: Accounting for Offensive Speech as a Practice of Resistance. Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH).
Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.woah-1.16
Niclas Hertzberg, R. Cooper, Eveliina Lindgren, B. Rönnerstrand, Gregor Rettenegger, Ellen Breitholtz, A. Sayeed
“Dogwhistles” are expressions intended by the speaker to have two messages: a socially unacceptable “in-group” message understood by a subset of listeners, and a benign message intended for the out-group. We take the result of a word-replacement survey of the Swedish population intended to reveal how dogwhistles are understood, and we show that the difficulty of annotating dogwhistles is reflected in the separability in the space of a sentence-transformer Swedish BERT trained on general data.
Title: Distributional properties of political dogwhistle representations in Swedish BERT. Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH).
Romania ranks almost last in Europe when it comes to gender equality in political representation, with about 10% fewer women in politics than the E.U. average. We proceed from the assumption that this underrepresentation is also influenced by the sexism and verbal abuse female politicians face in the public sphere, especially in online media. We collect a novel dataset of sexist comments in Romanian from newspaper articles about Romanian female politicians and propose baseline models, using classical machine learning models and fine-tuned pretrained transformer models, for the classification of sexist language in the online medium.
Title: Users Hate Blondes: Detecting Sexism in User Comments on Online Romanian News. Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH).
Andreea-Loredana Moldovan, Karla-Claudia Csürös, Ana-Maria Bucur, Loredana Bercuci
DOI: 10.18653/v1/2022.woah-1.21
Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.woah-1.23
Krithika Ramesh, Sumeet Kumar, Ashiqur R. KhudaBukhsh
Lexicons play an important role in content moderation, often being the first line of defense. However, little or no literature analyzes the representation of queer-related words in them. In this paper, we consider twelve well-known lexicons containing inappropriate words and analyze how gender and sexual minorities are represented in these lexicons. Our analyses reveal that several of these lexicons barely make any distinction between pejorative and non-pejorative queer-related words. We express concern that such unfettered usage of non-pejorative queer-related words may impact queer presence in mainstream discourse. Our analyses further reveal that the lexicons have poor overlap in queer-related words. We finally present a quantifiable measure of consistency and show that several of these lexicons are not consistent in how they include (or omit) queer-related words.
Title: Revisiting Queer Minorities in Lexicons. Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH).
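The overlap analysis mentioned in the abstract above can be illustrated with a minimal sketch. It assumes Jaccard similarity as the overlap score, which is a common choice but not necessarily the measure the authors use; the lexicon names and placeholder terms are hypothetical and are not data from the paper.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Return |a ∩ b| / |a ∪ b|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(lexicons: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Compute the Jaccard overlap for every unordered pair of lexicons."""
    return {
        (x, y): jaccard(lexicons[x], lexicons[y])
        for x, y in combinations(sorted(lexicons), 2)
    }


# Hypothetical toy lexicons with placeholder terms (assumption, not paper data).
toy_lexicons = {
    "lexicon_a": {"term1", "term2", "term3"},
    "lexicon_b": {"term2", "term3", "term4"},
    "lexicon_c": {"term5"},
}

overlaps = pairwise_overlap(toy_lexicons)
for pair, score in overlaps.items():
    print(pair, round(score, 2))
```

A low score for most pairs would correspond to the "poor overlap" the abstract reports; restricting each set to queer-related entries before comparing is left to the reader.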
Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.woah-1.2
Mana Ashida, Mamoru Komachi
With the widespread use of social media, online hate is increasing, and microaggressions are receiving attention. We explore the potential for using pretrained language models to automatically generate messages that combat the associated offensive texts. Specifically, we focus on using prompting to steer model generation, as it requires less data and computation than fine-tuning. We also propose a human evaluation perspective covering offensiveness, stance, and informativeness. After obtaining 306 counterspeech and 42 microintervention messages generated by GPT-{2, 3, Neo}, we conducted a human evaluation using Amazon Mechanical Turk. The results indicate the potential of using prompting in the proposed generation task. All the generated texts, along with the annotations, are published to encourage future research on countering hate and microaggressions online.
Title: Towards Automatic Generation of Messages Countering Online Hate Speech and Microaggressions. Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH).
Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.woah-1.24
Debora Nozza, Federico Bianchi, Giuseppe Attanasio
Title: HATE-ITA: Hate Speech Detection in Italian Social Media Text. Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH).
Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.woah-1.8
Christina T. Lu, David Jurgens
Toxic language can take many forms, from explicit hate speech to more subtle microaggressions. Within this space, models identifying transphobic language have largely focused on overt forms. However, a more pernicious and subtle source of transphobic comments comes in the form of statements made by Trans-exclusionary Radical Feminists (TERFs); these statements often appear positive and promote women’s causes and issues, while simultaneously denying the inclusion of transgender women as women. Here, we introduce two models to mitigate this antisocial behavior. The first model identifies TERF users in social media, recognizing that these users are a main source of transphobic material that enters mainstream discussion and whom other users may not desire to engage with in good faith. The second model tackles the harder task of recognizing the masked rhetoric of TERF messages and introduces a new dataset to support this task. Finally, we discuss the ethics of deploying these models to mitigate the harm of this language, arguing for a balanced approach that allows for restorative interactions.
Title: The subtle language of exclusion: Identifying the Toxic Speech of Trans-exclusionary Radical Feminists. Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH).
In an era of increasingly large pre-trained language models, knowledge distillation is a powerful tool for transferring information from a large model to a smaller one. In particular, distillation is of tremendous benefit when it comes to real-world constraints such as serving latency or serving at scale. However, a loss of robustness in language understanding may be hidden in the process and not immediately revealed when looking at high-level evaluation metrics. In this work, we investigate the hidden costs: what is “lost in distillation”, especially with regard to identity-based model bias, using the case study of toxicity modeling. With reproducible models using open source training sets, we investigate models distilled from a BERT teacher baseline. Using both open source and proprietary big data models, we investigate these hidden performance costs.
Title: Lost in Distillation: A Case Study in Toxicity Modeling. Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH).
Alyssa Chvasta, Alyssa Lees, Jeffrey Sorensen, Lucy Vasserman, Nitesh Goyal
DOI: 10.18653/v1/2022.woah-1.9
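For readers unfamiliar with knowledge distillation as referred to in the abstract above, the following is a minimal sketch of one common formulation: the student is trained to match the teacher's temperature-softened output distribution via a KL-divergence term scaled by the squared temperature. This is an assumption for illustration; the abstract does not state which distillation objective the authors use, and the logits and temperature below are hypothetical.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax over logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: list[float],
    student_logits: list[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits for a three-class toy example (not from the paper).
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.5]
loss = distillation_loss(teacher, student)
print(f"distillation loss: {loss:.4f}")
```

The loss is zero only when the two softened distributions coincide, so an aggregate metric can look unchanged while the student still diverges from the teacher on particular inputs, which is the kind of hidden cost the abstract investigates.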
Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.woah-1.14
Christoph Demus, Jonas Pitz, Mina Schütz, Nadine Probol, M. Siegel, D. Labudde
In this work, we present a new publicly available offensive language dataset of 10,278 German social media comments collected in the first half of 2021 that were annotated by a total of six annotators. With twelve different annotation categories, it is far more comprehensive than other datasets and goes beyond just hate speech detection. In particular, the labels also cover toxicity, criminal relevance, and discrimination types of comments. Furthermore, about half of the comments are from coherent parts of conversations, which opens the possibility of considering the comments’ contexts and doing conversation analyses in order to research the contagion of offensive language in conversations.
Title: A Comprehensive Dataset for German Offensive Language and Conversation Analysis. Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH).
Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.woah-1.22
Pratik S. Sachdeva, Renata Barreto, Claudia von Vacano, Chris J. Kennedy
The past decade has seen an abundance of work seeking to detect, characterize, and measure online hate speech. A related but less studied problem is the detection of identity groups targeted by that hate speech. Predictive accuracy on this task can supplement additional analyses beyond hate speech detection, motivating its study. Using the Measuring Hate Speech corpus, which provided annotations for targeted identity groups, we created neural network models to perform multi-label binary prediction of identity groups targeted by a comment. Specifically, we studied 8 broad identity groups and 12 identity sub-groups within race and gender identity. We found that these networks exhibited good predictive performance, achieving ROC AUCs of greater than 0.9 and PR AUCs of greater than 0.7 on several identity groups. We validated their performance on HateCheck and Gab Hate Corpora, finding that predictive performance generalized in most settings. We additionally examined the performance of the model on comments targeting multiple identity groups. Our results demonstrate the feasibility of simultaneously identifying targeted groups in social media comments.
Title: Targeted Identity Group Prediction in Hate Speech Corpora. Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH).
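The multi-label evaluation described in the abstract above, where each identity group is treated as its own binary prediction and scored by ROC AUC, can be sketched as follows. The group names, labels, and scores are hypothetical toy values, not results from the paper, and the pairwise-comparison AUC below is a simple reference implementation rather than the authors' evaluation code.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def per_label_auc(
    y_true: list[list[int]],
    y_score: list[list[float]],
    label_names: list[str],
) -> dict[str, float]:
    """Score each label column independently, as in multi-label binary prediction."""
    return {
        name: roc_auc([row[j] for row in y_true], [row[j] for row in y_score])
        for j, name in enumerate(label_names)
    }


# Hypothetical toy data: 4 comments, 2 identity groups (assumption, not paper data).
names = ["group_a", "group_b"]
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.3, 0.8], [0.7, 0.4], [0.1, 0.6]]

aucs = per_label_auc(y_true, y_score, names)
print(aucs)
```

Note that the third toy comment targets both groups at once; scoring each column separately is what allows a single comment to carry several positive labels.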