Finding a line between trusted and untrusted information on tweets through sequence classification

Albert Pritzkau, Steffen Winandy, Theresa Krumbiegel
2021 International Conference on Military Communication and Information Systems (ICMCIS)
Published: 2021-05-04
DOI: https://doi.org/10.1109/ICMCIS52405.2021.9486423
Citations: 5

Abstract

The Internet has long been an indispensable source of information for organizations and individuals alike. However, the lack of social responsibility on many digital platforms creates incentives for various forms of abuse; disinformation, propaganda and fake news are just a few examples. The actors behind information campaigns include not only individuals but also state actors with a clear agenda, and such campaigns often rely on psychological and rhetorical methods to achieve their goals. The manipulation of information is a major challenge for our democracies. It also poses major technical problems in identifying and assessing the risks that arise from the dissemination of such information.

The following system description presents our twofold approach to detecting misinformation in social media data. First, we subjected the given training data to an exploratory analysis to get an overview of its general structure. Then we framed the given task as a simpler classification problem. To distinguish between trusted and untrusted information, we used BERT (Bidirectional Encoder Representations from Transformers) as the neural network architecture for sequence classification, starting from a pre-trained language representation model. In a supervised training step, we fine-tuned this model on the given classification task with the provided annotated data.

In this paper, we discuss both the quality of the training data and the performance of the trained classifier in order to derive promising directions for future work.
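The first two steps described in the abstract, an exploratory look at the annotated data and a reduction to a simpler classification problem, might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the sample tweets, the original label names (`true`, `false`, `misleading`) and the rule that collapses them into `trusted`/`untrusted` are all hypothetical, since the abstract does not specify the dataset's label set or the exact simplification used.

```python
from collections import Counter
from statistics import mean

# Hypothetical annotated samples: (tweet text, original label).
# The label names are assumptions, not the labels of the actual dataset.
SAMPLES = [
    ("Official health agency publishes updated vaccination schedule", "true"),
    ("Miracle tea cures all known diseases overnight, share now", "false"),
    ("Some say masks may or may not help, who knows", "misleading"),
    ("City council confirms road closure on Monday", "true"),
]


def to_binary(label: str) -> str:
    """Assumed simplification: every label other than 'true' becomes 'untrusted'."""
    return "trusted" if label == "true" else "untrusted"


def explore(samples):
    """Return binary class counts and whitespace-token length statistics."""
    class_counts = Counter(to_binary(label) for _, label in samples)
    lengths = [len(text.split()) for text, _ in samples]
    return {
        "class_counts": dict(class_counts),
        "min_len": min(lengths),
        "max_len": max(lengths),
        "mean_len": mean(lengths),
    }


stats = explore(SAMPLES)
print(stats)
```

Statistics of this kind are typically used to spot class imbalance and to choose a maximum sequence length before a pre-trained model is fine-tuned; whether the authors used them in this way is not stated in the abstract.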