Albert Pritzkau, Steffen Winandy, Theresa Krumbiegel
Finding a line between trusted and untrusted information on tweets through sequence classification
2021 International Conference on Military Communication and Information Systems (ICMCIS), 2021-05-04
DOI: 10.1109/ICMCIS52405.2021.9486423
Citations: 5
Abstract
The Internet has long since established itself as an indispensable source of information for both organizations and individuals. However, the lack of social responsibility on the part of many digital platforms creates strong incentives for various forms of abuse, such as disinformation, propaganda and fake news. The actors behind information campaigns include not only individuals but also state actors with a clear agenda. Such campaigns often use psychological and rhetorical techniques to achieve their goals. The manipulation of information is a major challenge for our democracies, and identifying and assessing the risks that arise from its dissemination poses major technical problems. The following system description presents our two-step approach to detecting misinformation in social media data. First, we subjected the given training data to an exploratory analysis to obtain an overview of its general structure. We then framed the given task as a simpler classification problem. To distinguish between trusted and untrusted information, we started from BERT (Bidirectional Encoder Representations from Transformers), a model pre-trained for language representation, and used it as a neural network architecture for sequence classification. In a supervised training step, we fine-tuned this model on the given classification task using the provided annotated data. In this paper, we discuss both the quality of the training data and the performance of the trained classifier in order to derive promising directions for future work.
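The abstract names two preparatory steps before fine-tuning: an exploratory look at the training data and a reduction of the task to a binary trusted/untrusted decision. The following is a minimal sketch of those two steps only, not the authors' code. It assumes a hypothetical fine-grained label set (`credible`, `disinformation`, `propaganda`, `fake_news`) and hypothetical helper names (`label_distribution`, `to_binary`); the abstract does not state the actual annotation scheme, and the BERT fine-tuning step itself is not shown.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

# HYPOTHETICAL label scheme: the abstract does not list the annotation labels,
# so these names are placeholders chosen for illustration only.
TRUSTED = {"credible"}
UNTRUSTED = {"disinformation", "propaganda", "fake_news"}


def label_distribution(labels: Iterable[str]) -> dict[str, float]:
    """Exploratory step: relative frequency of each raw label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def to_binary(label: str) -> int:
    """Collapse a fine-grained label into 0 = trusted, 1 = untrusted."""
    if label in TRUSTED:
        return 0
    if label in UNTRUSTED:
        return 1
    raise ValueError(f"unknown label: {label!r}")


if __name__ == "__main__":
    # Toy tweets with made-up labels; this is not data from the paper.
    data = [
        ("Official statement released by the ministry today.", "credible"),
        ("They are hiding the truth from you!", "disinformation"),
        ("Only our leader can save the nation.", "propaganda"),
        ("Scientists confirm the moon is hollow.", "fake_news"),
    ]
    texts, raw_labels = zip(*data)
    print(label_distribution(raw_labels))
    print([to_binary(label) for label in raw_labels])
```

In this toy data the four raw labels are balanced, but collapsing them yields three untrusted examples for every trusted one. A class skew introduced by the binary framing is the kind of property an exploratory pass can surface before a classifier is trained.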