{"title":"Leveraging social computing for epidemic surveillance: A case study","authors":"Bilal Tahir , Muhammad Amir Mehmood","doi":"10.1016/j.bdr.2024.100483","DOIUrl":null,"url":null,"abstract":"<div><p>Social media platforms have become a popular source of information for real-time monitoring of events and user behavior. In particular, Twitter provides invaluable information related to diseases and public health to build real-time disease surveillance systems. Effective use of such social media platforms for public health surveillance requires data-driven AI models which are hindered by the difficult, expensive, and time-consuming task of collecting high-quality and large-scale datasets. In this paper, we build and analyze the Epidemic TweetBank (EpiBank) dataset containing 271 million English tweets related to six epidemic-prone diseases COVID19, Flu, Hepatitis, Dengue, Malaria, and HIV/AIDs. For this purpose, we develop a tool of ESS-T (Epidemic Surveillance Study via Twitter) which collects tweets according to provided input parameters and keywords. Also, our tool assigns location to tweets with 95% accuracy value and performs analysis of collected tweets focusing on temporal distribution, spatial patterns, users, entities, sentiment, and misinformation. Leveraging ESS-T, we build two geo-tagged datasets of EpiBank-global and EpiBank-Pak containing 86 million tweets from 190 countries and 2.6 million tweets from Pakistan, respectively. Our spatial analysis of EpiBank-global for COVID19, Malaria, and Dengue indicates that our framework correctly identifies high-risk epidemic-prone countries according to World Health Organization (WHO) statistics.</p></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"38 ","pages":"Article 100483"},"PeriodicalIF":3.5000,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Big Data Research","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2214579624000583","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Social media platforms have become a popular source of information for real-time monitoring of events and user behavior. In particular, Twitter provides invaluable information related to diseases and public health to build real-time disease surveillance systems. Effective use of such social media platforms for public health surveillance requires data-driven AI models which are hindered by the difficult, expensive, and time-consuming task of collecting high-quality and large-scale datasets. In this paper, we build and analyze the Epidemic TweetBank (EpiBank) dataset containing 271 million English tweets related to six epidemic-prone diseases COVID19, Flu, Hepatitis, Dengue, Malaria, and HIV/AIDs. For this purpose, we develop a tool of ESS-T (Epidemic Surveillance Study via Twitter) which collects tweets according to provided input parameters and keywords. Also, our tool assigns location to tweets with 95% accuracy value and performs analysis of collected tweets focusing on temporal distribution, spatial patterns, users, entities, sentiment, and misinformation. Leveraging ESS-T, we build two geo-tagged datasets of EpiBank-global and EpiBank-Pak containing 86 million tweets from 190 countries and 2.6 million tweets from Pakistan, respectively. Our spatial analysis of EpiBank-global for COVID19, Malaria, and Dengue indicates that our framework correctly identifies high-risk epidemic-prone countries according to World Health Organization (WHO) statistics.
期刊介绍:
The journal aims to promote and communicate advances in big data research by providing a fast and high quality forum for researchers, practitioners and policy makers from the very many different communities working on, and with, this topic.
The journal will accept papers on foundational aspects in dealing with big data, as well as papers on specific Platforms and Technologies used to deal with big data. To promote Data Science and interdisciplinary collaboration between fields, and to showcase the benefits of data driven research, papers demonstrating applications of big data in domains as diverse as Geoscience, Social Web, Finance, e-Commerce, Health Care, Environment and Climate, Physics and Astronomy, Chemistry, life sciences and drug discovery, digital libraries and scientific publications, security and government will also be considered. Occasionally the journal may publish whitepapers on policies, standards and best practices.