Abusiveness is Non-Binary: Five Shades of Gray in German Online News-Comments
Marco Niemann
2019 IEEE 21st Conference on Business Informatics (CBI), published 2019-07-15
DOI: 10.1109/CBI.2019.00009
Citations: 4
Abstract
Online news comment sections face a surge in uncivil, abusive, and even outright hateful and threatening posts. In Germany especially, the refugee crisis that began in 2015 has sparked many controversial and even unacceptable user comments. Overwhelmed by the volume of content and facing the risk of fines and the loss of readers and advertisers, many platforms have shut down their comment sections as a last resort. To reduce the moderation effort, researchers have started applying machine learning to classify comments automatically. However, these efforts have so far focused mostly on English texts. To provide similar systems for German, this paper implements and evaluates six different machine learning classifiers and five different strategies for converting textual comments into machine-readable vectors. Contrary to common belief in the domain, comments often evade binary classification: a comment is frequently not only hateful, insulting, or threatening, but falls into several of these categories at once. Hence, we go beyond traditional multi-class classification models and prototypically evaluate multi-label techniques. First evaluations indicate that systems for abusive language detection are transferable to German, and that supporting multiple labels per comment may not only improve the detection of rare types of abusiveness but also yield a more realistic representation of actual online commentary.
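The abstract contrasts multi-class classification, where each comment receives exactly one category, with multi-label classification, where a single comment can be, say, both insulting and threatening. The following is a minimal sketch of one common multi-label strategy, binary relevance (one independent binary classifier per label), in plain Python. It assumes a hypothetical three-label inventory, a simple bag-of-words count vectorizer as the text-to-vector step, and a toy perceptron as the base learner; none of these are claimed to be the paper's actual classifiers, vectorization strategies, or data.

```python
from collections import Counter

# Hypothetical label inventory, loosely following the categories named in the abstract.
LABELS = ["hate", "insult", "threat"]


def to_indicator(label_set, labels=LABELS):
    """Encode a set of labels as a binary indicator vector (the multi-label target)."""
    return [1 if label in label_set else 0 for label in labels]


def from_indicator(vector, labels=LABELS):
    """Decode a binary indicator vector back into the list of active labels."""
    return [label for label, bit in zip(labels, vector) if bit]


def build_vocab(texts):
    """Collect the sorted set of lowercase whitespace tokens seen in training."""
    return sorted({token for text in texts for token in text.lower().split()})


def bag_of_words(text, vocab):
    """Count-vectorize a text; tokens outside the vocabulary are ignored."""
    counts = Counter(text.lower().split())
    return [counts[token] for token in vocab]


class Perceptron:
    """Minimal binary perceptron used as the per-label base classifier."""

    def __init__(self, n_features, epochs=10):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.epochs = epochs

    def predict(self, x):
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score > 0 else 0

    def fit(self, X, y):
        for _ in range(self.epochs):
            for x, target in zip(X, y):
                error = target - self.predict(x)  # -1, 0 or +1
                if error:
                    self.w = [wi + error * xi for wi, xi in zip(self.w, x)]
                    self.b += error
        return self


class BinaryRelevance:
    """Multi-label wrapper: one independent binary classifier per label."""

    def __init__(self, n_labels, n_features):
        self.models = [Perceptron(n_features) for _ in range(n_labels)]

    def fit(self, X, Y):
        for j, model in enumerate(self.models):
            model.fit(X, [row[j] for row in Y])  # column j answers "has label j?"
        return self

    def predict(self, x):
        return [model.predict(x) for model in self.models]


# Hypothetical toy data for illustration only (not from the paper's corpus).
train = [
    ("you idiot", {"insult"}),
    ("i will hurt you idiot", {"insult", "threat"}),
    ("slur go away", {"hate"}),
    ("thanks for the article", set()),
]

vocab = build_vocab(text for text, _ in train)
X = [bag_of_words(text, vocab) for text, _ in train]
Y = [to_indicator(labels) for _, labels in train]

model = BinaryRelevance(len(LABELS), len(vocab)).fit(X, Y)
prediction = from_indicator(model.predict(bag_of_words("i will hurt the idiot", vocab)))
print(prediction)  # ['insult', 'threat']
```

Because each label is decided independently, the unseen comment "i will hurt the idiot" receives both "insult" and "threat", whereas a multi-class model would have had to pick one. Binary relevance ignores correlations between labels; alternatives such as classifier chains model them explicitly.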