Developing a hierarchical model for unraveling conspiracy theories

IF 3 | Zone 2 (Computer Science) | Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | EPJ Data Science | Pub Date: 2024-04-16 | DOI: 10.1140/epjds/s13688-024-00470-5
Mohsen Ghasemizade, Jeremiah Onaolapo
Citation count: 0

Abstract

A conspiracy theory (CT) claims that covert groups or powerful individuals secretly manipulate events. Not knowing about existing conspiracy theories could make one more likely to believe them, so this work aims to compile a list of CTs, organized as a tree, that is as comprehensive as possible. We began with a manually curated ‘tree’ of CTs drawn from academic papers and Wikipedia. Next, we examined 1769 CT-related articles from four fact-checking websites, focusing on their core content, and used keyphrase extraction to label the documents. This process yielded 769 identified conspiracies, each assigned a label and a family name. The second goal of this project was to detect whether an article presents a conspiracy theory, so we trained a binary classifier on our labeled dataset. The classifier is based on RoBERTa, a transformer-based model pre-trained on a large corpus, and achieves an F1 score of 87%. It helps to identify potential conspiracy theories in new articles. To assign a label from the tree to the new articles detected as conspiracy theories, we grouped them using a combination of a dimension reduction technique (UMAP) and clustering (HDBSCAN). We then labeled the resulting groups to help us match them to the tree. These groups can help us detect new conspiracy theories and expand the tree using computational methods. We successfully generated a tree of conspiracy theories and built a pipeline to detect and categorize conspiracy theories within any text corpus. This pipeline can yield valuable insights from any database formatted as text.
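To make the tree idea concrete, here is a minimal Python sketch of a CT tree in which every leaf carries a label and a family name, together with a lookup that matches a group's keyphrases to the closest leaf. Everything in it is an assumption made for illustration: the `CTNode` class, the `match_to_tree` function, the 0.2 threshold, and the two example entries are hypothetical and not taken from the paper. The matching uses plain word-overlap (Jaccard) similarity as a simple stand-in; it is not the UMAP and HDBSCAN procedure the abstract describes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CTNode:
    """A node in the tree: the root, a family, or a single CT label."""
    label: str
    children: list[CTNode] = field(default_factory=list)

    def add(self, label: str) -> CTNode:
        """Attach a new child node and return it."""
        child = CTNode(label)
        self.children.append(child)
        return child

    def leaves(self, family: str | None = None):
        """Yield (family, label) pairs for every leaf under this node."""
        if not self.children:
            yield family, self.label
            return
        for child in self.children:
            # The first level below the root supplies the family name.
            yield from child.leaves(family if family is not None else child.label)


def tokens(text: str) -> set[str]:
    """Lowercased whitespace tokens of a phrase."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap similarity: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_to_tree(root: CTNode, keyphrases: list[str], threshold: float = 0.2):
    """Return (family, label, score) of the best-matching leaf.

    Returns None when no leaf reaches the (hypothetical) threshold, which
    would flag the group as a candidate new entry for the tree.
    """
    query = set().union(*(tokens(k) for k in keyphrases))
    best = None
    for family, label in root.leaves():
        score = jaccard(query, tokens(label))
        if best is None or score > best[2]:
            best = (family, label, score)
    if best is None or best[2] < threshold:
        return None
    return best


# Illustrative tree; these entries are examples, not the paper's data.
root = CTNode("conspiracy theories")
health = root.add("health")
health.add("vaccines contain microchips")
space = root.add("space")
space.add("moon landing was staged")

print(match_to_tree(root, ["moon landing", "staged footage"]))
print(match_to_tree(root, ["weather control"]))
```

In this sketch, a group whose keyphrases overlap an existing label is attached to that label's family, while a group with no sufficient overlap returns `None` and could be reviewed as a possible new branch. A real pipeline would likely compare embeddings rather than raw word overlap.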

Source journal
EPJ Data Science (MATHEMATICS, INTERDISCIPLINARY APPLICATIONS)
CiteScore: 6.10
Self-citation rate: 5.60%
Articles published: 53
Review time: 13 weeks
Journal description: EPJ Data Science covers a broad range of research areas and applications and particularly encourages contributions from techno-socio-economic systems, where it comprises those research lines that now regard the digital “tracks” of human beings as first-order objects for scientific investigation. Topics include, but are not limited to, human behavior, social interaction (including animal societies), economic and financial systems, management and business networks, socio-technical infrastructure, health and environmental systems, the science of science, as well as general risk and crisis scenario forecasting up to and including policy advice.