COCO: an annotated Twitter dataset of COVID-19 conspiracy theories.

Impact factor: 2.0 | JCR: Q2, SOCIAL SCIENCES, MATHEMATICAL METHODS | Journal of Computational Social Science | Pub Date: 2023-04-04 | DOI: 10.1007/s42001-023-00200-3
Johannes Langguth, Daniel Thilo Schroeder, Petra Filkuková, Stefan Brenner, Jesper Phillips, Konstantin Pogorelov
Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10071453/pdf/
Citations: 1

Abstract

The COVID-19 pandemic has been accompanied by a surge of misinformation on social media, covering a wide range of topics and containing many competing narratives, including conspiracy theories. To study such conspiracy theories, we created a dataset of 3495 tweets in which the stance of each tweet was manually labeled with respect to 12 different conspiracy topics. The dataset thus contains almost 42,000 labels, each of which was determined by majority vote among three expert annotators. The tweets were selected from COVID-19 related Twitter data spanning January 2020 to June 2021 using a list of 54 keywords. The dataset can be used to train machine-learning-based classifiers for stance detection and topic detection, either individually or simultaneously. BERT was used successfully for the combined task. The dataset can also be used to further study the prevalence of different conspiracy narratives. To this end, we qualitatively analyze the tweets and discuss the structure of the conspiracy narratives that are frequently found in the dataset. Furthermore, we illustrate the interconnections among the conspiracy categories as well as among the keywords.
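The label count and the aggregation rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the stance label strings are hypothetical, and returning `None` when no label reaches a strict majority is an assumption, since the abstract does not say how such cases were resolved.

```python
from collections import Counter
from typing import Optional, Sequence

# Figures taken from the abstract.
NUM_TWEETS = 3495
NUM_TOPICS = 12


def total_labels(num_tweets: int = NUM_TWEETS, num_topics: int = NUM_TOPICS) -> int:
    """One stance label per (tweet, topic) pair."""
    return num_tweets * num_topics


def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators.

    Assumption: if no label has more than half of the votes
    (e.g. three annotators all disagree), return None.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


if __name__ == "__main__":
    # 3495 tweets x 12 topics, i.e. "almost 42,000" labels.
    print(total_labels())
    # Hypothetical stance names, used only for illustration.
    print(majority_label(["supports", "supports", "unrelated"]))
```

With three annotators, a strict majority means at least two identical votes, so a single dissenting annotator never changes the final label.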

Source journal: Journal of Computational Social Science (SOCIAL SCIENCES, MATHEMATICAL METHODS). CiteScore: 6.20. Self-citation rate: 6.20%. Articles published: 30.