MMCoVaR: multimodal COVID-19 vaccine focused data repository for fake news detection and a baseline architecture for classification

Mingxuan Chen, Xinqiao Chu, K. P. Subbalakshmi

Published in: Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
Publication date: 2021-09-14
DOI: 10.1145/3487351.3488346 (https://doi.org/10.1145/3487351.3488346)
Citations: 14

Abstract

The outbreak of COVID-19 has produced an "infodemic" that has fueled the spread of misinformation about COVID-19 and purported cures, which in turn could undermine the adoption of recommended public health measures in the wider population. In this paper, we present a new labeled multimodal dataset (images, text, and temporal information) of news articles and tweets about the COVID-19 vaccine. We collected 2,593 news articles from 80 publishers, covering February 16, 2020 to May 8, 2021, and 24,184 Twitter posts, collected between April 17, 2021 and May 8, 2021. We combine ratings from two news media ranking sites, Media Bias Chart and Media Bias/Fact Check (MBFC), to classify the news dataset into two levels of credibility: reliable and unreliable. Combining the two filters allows for higher labeling precision. We also propose a stance detection mechanism that annotates tweets with one of three levels of credibility: reliable, unreliable, and inconclusive. We report several statistics and analyses of the dataset, including publisher distribution, publication date distribution, and topic analysis. We also propose a novel architecture that classifies the news data as misinformation or truth, providing a baseline performance for this dataset. The proposed architecture achieves an F-score of 0.919 and an accuracy of 0.882 for fake news detection. Furthermore, we provide benchmark performance for misinformation detection on the tweet dataset. This new multimodal dataset can be used in research on the COVID-19 vaccine, including misinformation detection and the influence of fake COVID-19 vaccine information.
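The idea of combining two rating sources to label news can be illustrated with a minimal sketch. The abstract does not state the exact combination rule, so the agreement rule below is an assumption: an article receives a label only when both sources give its publisher the same rating. The publisher names, the two lookup tables, and the function names are all hypothetical and are not taken from Media Bias Chart, MBFC, or the paper.

```python
from typing import Optional

# Hypothetical per-publisher ratings (True = rated reliable).
# Illustrative values only; not real data from either ranking site.
CHART_RELIABLE = {
    "publisher-a.example": True,
    "publisher-b.example": False,
    "publisher-c.example": True,
}
MBFC_RELIABLE = {
    "publisher-a.example": True,
    "publisher-b.example": False,
    "publisher-c.example": False,
}


def label_publisher(publisher: str) -> Optional[str]:
    """Return a label only when both sources rate the publisher and agree.

    Assumed rule: disagreement or a missing rating yields None, so the
    publisher's articles are left out of the labeled set.
    """
    chart = CHART_RELIABLE.get(publisher)
    mbfc = MBFC_RELIABLE.get(publisher)
    if chart is None or mbfc is None or chart != mbfc:
        return None
    return "reliable" if chart else "unreliable"


def label_articles(articles: list[dict]) -> list[dict]:
    """Attach a 'label' field to articles whose publisher has an agreed rating."""
    labeled = []
    for article in articles:
        label = label_publisher(article["publisher"])
        if label is not None:
            labeled.append({**article, "label": label})
    return labeled


articles = [
    {"title": "t1", "publisher": "publisher-a.example"},
    {"title": "t2", "publisher": "publisher-b.example"},
    {"title": "t3", "publisher": "publisher-c.example"},
]
labeled = label_articles(articles)
```

Under this assumed rule, requiring agreement trades coverage for precision: publishers on which the two sources disagree are dropped rather than labeled with a guess.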