A Large-Scale Dataset of Search Interests Related to Disease X Originating from Different Geographic Regions

IF 2.2 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Data Pub Date : 2023-10-26 DOI:10.3390/data8110163
Nirmalya Thakur, Shuqi Cui, Kesha A. Patel, Isabella Hall, Yuvraj Nihal Duggal
{"title":"A Large-Scale Dataset of Search Interests Related to Disease X Originating from Different Geographic Regions","authors":"Nirmalya Thakur, Shuqi Cui, Kesha A. Patel, Isabella Hall, Yuvraj Nihal Duggal","doi":"10.3390/data8110163","DOIUrl":null,"url":null,"abstract":"The World Health Organization (WHO) added Disease X to their shortlist of blueprint priority diseases to represent a hypothetical, unknown pathogen that could cause a future epidemic. During different virus outbreaks of the past, such as COVID-19, Influenza, Lyme Disease, and Zika virus, researchers from various disciplines utilized Google Trends to mine multimodal components of web behavior to study, investigate, and analyze the global awareness, preparedness, and response associated with these respective virus outbreaks. As the world prepares for Disease X, a dataset on web behavior related to Disease X would be crucial to contribute towards the timely advancement of research in this field. Furthermore, none of the prior works in this field have focused on the development of a dataset to compile relevant web behavior data, which would help to prepare for Disease X. To address these research challenges, this work presents a dataset of web behavior related to Disease X, which emerged from different geographic regions of the world, between February 2018 and August 2023. Specifically, this dataset presents the search interests related to Disease X from 94 geographic regions. These regions were chosen for data mining as these regions recorded significant search interests related to Disease X during this timeframe. The dataset was developed by collecting data using Google Trends. The relevant search interests for all these regions for each month in this time range are available in this dataset. This paper also discusses the compliance of this dataset with the FAIR principles of scientific data management. Finally, an analysis of this dataset is presented to uphold the applicability, relevance, and usefulness of this dataset for the investigation of different research questions in the interrelated fields of Big Data, Data Mining, Healthcare, Epidemiology, and Data Analysis with a specific focus on Disease X.","PeriodicalId":36824,"journal":{"name":"Data","volume":"8 1","pages":"0"},"PeriodicalIF":2.2000,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/data8110163","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

The World Health Organization (WHO) added Disease X to their shortlist of blueprint priority diseases to represent a hypothetical, unknown pathogen that could cause a future epidemic. During different virus outbreaks of the past, such as COVID-19, Influenza, Lyme Disease, and Zika virus, researchers from various disciplines utilized Google Trends to mine multimodal components of web behavior to study, investigate, and analyze the global awareness, preparedness, and response associated with these respective virus outbreaks. As the world prepares for Disease X, a dataset on web behavior related to Disease X would be crucial to contribute towards the timely advancement of research in this field. Furthermore, none of the prior works in this field have focused on the development of a dataset to compile relevant web behavior data, which would help to prepare for Disease X. To address these research challenges, this work presents a dataset of web behavior related to Disease X, which emerged from different geographic regions of the world, between February 2018 and August 2023. Specifically, this dataset presents the search interests related to Disease X from 94 geographic regions. These regions were chosen for data mining as these regions recorded significant search interests related to Disease X during this timeframe. The dataset was developed by collecting data using Google Trends. The relevant search interests for all these regions for each month in this time range are available in this dataset. This paper also discusses the compliance of this dataset with the FAIR principles of scientific data management. Finally, an analysis of this dataset is presented to uphold the applicability, relevance, and usefulness of this dataset for the investigation of different research questions in the interrelated fields of Big Data, Data Mining, Healthcare, Epidemiology, and Data Analysis with a specific focus on Disease X.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
源自不同地理区域的与疾病X相关的大规模搜索兴趣数据集
世界卫生组织(世卫组织)将疾病X添加到他们的蓝图优先疾病名单中,代表一种假设的未知病原体,可能导致未来的流行病。在过去不同的病毒爆发期间,如COVID-19、流感、莱姆病和寨卡病毒,来自不同学科的研究人员利用谷歌趋势挖掘网络行为的多模态组成部分,研究、调查和分析与这些病毒爆发相关的全球意识、准备和响应。随着世界为疾病X做准备,与疾病X相关的网络行为数据集对于及时推进该领域的研究至关重要。此外,该领域之前的工作都没有专注于开发一个数据集来编译相关的网络行为数据,这将有助于为疾病X做准备。为了解决这些研究挑战,本工作提出了一个与疾病X相关的网络行为数据集,这些数据来自2018年2月至2023年8月之间的世界不同地理区域。具体来说,该数据集展示了来自94个地理区域的与X疾病相关的搜索兴趣。选择这些区域进行数据挖掘是因为这些区域在此时间段内记录了与X疾病相关的重要搜索兴趣。该数据集是通过使用谷歌趋势收集数据而开发的。在这个数据集中可以找到所有这些地区在这个时间范围内每个月的相关搜索兴趣。本文还讨论了该数据集是否符合科学数据管理的FAIR原则。最后,对该数据集进行了分析,以维护该数据集在大数据、数据挖掘、医疗保健、流行病学和数据分析等相关领域的不同研究问题的适用性、相关性和有用性,并特别关注X疾病。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Data
Data Decision Sciences-Information Systems and Management
CiteScore
4.30
自引率
3.80%
发文量
0
审稿时长
10 weeks
期刊最新文献
Medical Opinions Analysis about the Decrease of Autopsies Using Emerging Pattern Mining Unlocking Insights: Analysing COVID-19 Lockdown Policies and Mobility Data in Victoria, Australia, through a Data-Driven Machine Learning Approach Expert-Annotated Dataset to Study Cyberbullying in Polish Language Genome Sequence of the Plant-Growth-Promoting Endophyte Curtobacterium flaccumfaciens Strain W004 A Qualitative Dataset for Coffee Bio-Aggressors Detection Based on the Ancestral Knowledge of the Cauca Coffee Farmers in Colombia
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1