Deep active learning for misinformation detection using geometric deep learning

Online Social Networks and Media (Q1, Social Sciences) · Pub Date: 2023-01-01 · DOI: 10.1016/j.osnem.2023.100244
Giorgio Barnabò, Federico Siciliano, Carlos Castillo, Stefano Leonardi, Preslav Nakov, Giovanni Da San Martino, Fabrizio Silvestri
Citations: 1

Abstract

Human fact-checkers currently represent a key component of any semi-automatic misinformation detection pipeline. While current state-of-the-art systems are mostly based on geometric deep-learning models, these architectures still need human-labeled data to be trained and updated, due to shifting topic distributions and adversarial attacks. Most research on automatic misinformation detection, however, neither considers time-budget constraints on the number of news items that can be manually fact-checked, nor tries to reduce the burden of fact-checking on (mostly pro bono) annotators and journalists. The first contribution of this work is a thorough analysis of active learning (AL) strategies applied to Graph Neural Networks (GNNs) for misinformation detection. Then, based on this analysis, we propose Deep Error Sampling (DES), a new deep active learning architecture that, when coupled with uncertainty sampling, performs on par with or better than the most common AL strategies and the only existing active learning procedure specifically targeting fake news detection. Overall, our experimental results on two benchmark datasets show that all AL strategies outperform random sampling, yielding, on average, a 2% increase in AUC for the same percentage of third-party fact-checked news and saving up to 25% of the labeling effort for a desired level of classification performance. As for DES, while it does not always clearly outperform the other strategies, it reduces the variance in performance between rounds, making it a more reliable method. To the best of our knowledge, we are the first to comprehensively study active learning in the context of misinformation detection and to show its potential to reduce the burden of third-party fact-checking without compromising classification performance.
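The core AL loop the abstract refers to can be illustrated with uncertainty sampling: at each round, the items whose predicted probability sits closest to the decision boundary are sent to human fact-checkers. The sketch below is a minimal, generic illustration of that selection step, not the paper's DES architecture; the pool scores are hypothetical numbers.

```python
def uncertainty_sampling(probs, budget):
    """Pick the `budget` pool items whose predicted fake-news probability
    is closest to 0.5 -- the classifier is least certain about these, so
    spending a human fact-check label on them is most informative."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:budget]

# Hypothetical classifier scores for six unlabeled news items.
pool_probs = [0.95, 0.51, 0.10, 0.48, 0.80, 0.30]
picked = uncertainty_sampling(pool_probs, budget=2)
print(sorted(picked))  # -> [1, 3]: the two items with scores nearest 0.5
```

In a full pipeline, the selected items would be labeled by annotators, added to the training set, and the GNN retrained before the next round; the paper's DES strategy instead ranks candidates by a learned estimate of the classifier's error.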


Source journal: Online Social Networks and Media (Social Sciences - Communication)
CiteScore: 10.60
Self-citation rate: 0.00%
Articles per year: 32
Review time: 44 days