DETECTING COVID-19 MISINFORMATION ON SOCIAL MEDIA
Tamanna Hossain, Robert L. Logan, Arjuna Ugarte, Yoshitomo Matsubara, Sameer Singh, Sean Young
Issues in Information Systems. Published 2023-01-01. DOI: https://doi.org/10.48009/3_iis_2023_124
Citation count: 26
Abstract
The ongoing pandemic has heightened the need for tools that flag COVID-19-related misinformation on the internet, particularly on social media such as Twitter. However, because of novel language and rapidly changing information, existing misinformation detection datasets are not effective for evaluating systems designed to detect misinformation on this topic. Misinformation detection can be subdivided into two sub-tasks: retrieval of misconceptions relevant to the posts being checked for veracity, and stance detection to identify whether the posts agree with, disagree with, or express no stance towards the retrieved misconceptions. To facilitate research on this task, we release COVID-Lies, a dataset of 5K expert-annotated tweets for evaluating the performance of misinformation detection systems on 86 different pieces of COVID-19-related misinformation. We evaluate existing NLP systems on this dataset, providing initial benchmarks and identifying key challenges for future models to improve upon.
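
The two sub-tasks named in the abstract, retrieval followed by stance detection, can be illustrated with a minimal Python sketch. This is not the authors' system: the bag-of-words cosine retrieval and the keyword-based stance rule are deliberately simple stand-ins, and the function names (`retrieve`, `toy_stance`, `check_post`), the cue list, the 0.5 overlap threshold, and the two example misconceptions are all hypothetical and not drawn from COVID-Lies. A real system would likely replace both stages with learned models.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(post, misconceptions, top_k=1):
    """Sub-task 1: rank misconceptions by similarity to the post.

    Assumption: plain bag-of-words cosine similarity stands in for
    whatever retrieval model a real system would use.
    """
    post_vec = Counter(tokenize(post))
    scored = [(cosine(post_vec, Counter(tokenize(m))), m) for m in misconceptions]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for score, m in scored[:top_k] if score > 0]


# Hypothetical cue words; a real stance model would be learned from data.
NEGATION_CUES = {"not", "no", "false", "myth", "debunked", "hoax"}


def toy_stance(post, misconception):
    """Sub-task 2: label the post as agree / disagree / no stance.

    Assumption: a keyword heuristic, used only to show the interface.
    """
    post_tokens = set(tokenize(post))
    miscon_tokens = set(tokenize(misconception))
    covered = len(post_tokens & miscon_tokens) / len(miscon_tokens)
    if covered < 0.5:  # hypothetical threshold: too little overlap to judge
        return "no stance"
    if NEGATION_CUES & (post_tokens - miscon_tokens):
        return "disagree"
    return "agree"


def check_post(post, misconceptions, top_k=1):
    """Run retrieval, then stance detection on each retrieved misconception."""
    return [(m, toy_stance(post, m)) for m in retrieve(post, misconceptions, top_k)]


# Hypothetical misconceptions, written for illustration only.
MISCONCEPTIONS = [
    "Drinking bleach cures COVID-19.",
    "5G towers spread the coronavirus.",
]
```

Calling `check_post("It is a myth that drinking bleach cures COVID-19", MISCONCEPTIONS)` first retrieves the bleach misconception and then labels the post's stance towards it. Keeping the two stages as separate functions mirrors the decomposition in the abstract and lets either stage be evaluated or swapped out independently.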