Leveraging social computing for epidemic surveillance: A case study

Big Data Research | IF 4.6 | Zone 3 (Computer Science) | JCR Q2 (Computer Science, Artificial Intelligence) | Pub Date: 2024-11-28 | Epub Date: 2024-08-08 | DOI: 10.1016/j.bdr.2024.100483

Bilal Tahir, Muhammad Amir Mehmood

Big Data Research, vol. 38, Article 100483. https://www.sciencedirect.com/science/article/pii/S2214579624000583

Abstract

Social media platforms have become a popular source of information for real-time monitoring of events and user behavior. In particular, Twitter provides valuable information related to diseases and public health that can be used to build real-time disease surveillance systems. Effective use of such platforms for public health surveillance requires data-driven AI models, whose development is hindered by the difficult, expensive, and time-consuming task of collecting high-quality, large-scale datasets. In this paper, we build and analyze the Epidemic TweetBank (EpiBank) dataset, which contains 271 million English tweets related to six epidemic-prone diseases: COVID19, Flu, Hepatitis, Dengue, Malaria, and HIV/AIDS. For this purpose, we develop a tool, ESS-T (Epidemic Surveillance Study via Twitter), which collects tweets according to the provided input parameters and keywords. The tool also assigns locations to tweets with 95% accuracy and analyzes the collected tweets in terms of temporal distribution, spatial patterns, users, entities, sentiment, and misinformation. Leveraging ESS-T, we build two geo-tagged datasets, EpiBank-global and EpiBank-Pak, containing 86 million tweets from 190 countries and 2.6 million tweets from Pakistan, respectively. Our spatial analysis of EpiBank-global for COVID19, Malaria, and Dengue indicates that our framework correctly identifies high-risk epidemic-prone countries according to World Health Organization (WHO) statistics.
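The keyword-driven collection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the ESS-T implementation: the keyword lists, the `match_diseases` and `count_by_disease` helpers, and the word-boundary matching rule are all assumptions made for illustration, since the abstract does not give the actual keywords or matching logic.

```python
import re
from collections import Counter

# Hypothetical keyword lists (assumption): the abstract names the six
# diseases but does not list the keywords ESS-T actually uses.
DISEASE_KEYWORDS = {
    "COVID19": ["covid", "covid19", "coronavirus"],
    "Flu": ["flu", "influenza"],
    "Hepatitis": ["hepatitis"],
    "Dengue": ["dengue"],
    "Malaria": ["malaria"],
    "HIV/AIDS": ["hiv", "aids"],
}

# One case-insensitive, word-bounded pattern per disease, so that
# "flu" does not match inside "influence".
PATTERNS = {
    disease: re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    for disease, words in DISEASE_KEYWORDS.items()
}


def match_diseases(text):
    """Return the sorted list of diseases whose keywords occur in text."""
    return sorted(d for d, pattern in PATTERNS.items() if pattern.search(text))


def count_by_disease(tweets):
    """Count tweets per disease; a tweet may count toward several diseases."""
    counts = Counter()
    for text in tweets:
        counts.update(match_diseases(text))
    return counts


if __name__ == "__main__":
    sample = [
        "Dengue and malaria cases rising after the floods",
        "Got my COVID19 booster today",
        "The influence of weather on traffic",
    ]
    for disease, n in sorted(count_by_disease(sample).items()):
        print(disease, n)
```

Plain keyword matching of this kind is prone to false positives (for example, "aids" in "hearing aids"), which is one reason a real pipeline would likely need additional filtering beyond what is sketched here.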

Source journal

Big Data Research (Computer Science: Computer Science Applications)

CiteScore: 8.40

Self-citation rate: 3.00%

Journal description: The journal aims to promote and communicate advances in big data research by providing a fast, high-quality forum for researchers, practitioners, and policy makers from the many different communities working on, and with, this topic. The journal accepts papers on foundational aspects of dealing with big data, as well as papers on specific platforms and technologies used to deal with big data. To promote data science and interdisciplinary collaboration between fields, and to showcase the benefits of data-driven research, the journal also considers papers demonstrating applications of big data in domains as diverse as geoscience, the social web, finance, e-commerce, health care, environment and climate, physics and astronomy, chemistry, life sciences and drug discovery, digital libraries and scientific publications, and security and government. Occasionally the journal may publish whitepapers on policies, standards, and best practices.