Forecasting local COVID-19/Respiratory Disease mortality via national longitudinal shopping data: the case for integrating digital footprint data into early warning systems

James Goulding, Elizabeth Dolan, Gavin Long, Anya Skatova, John Harvey, Gavin Smith, Laila Tata
International Journal for Population Data Science. Published 2023-09-18. DOI: 10.23889/ijpds.v8i3.2290 (https://doi.org/10.23889/ijpds.v8i3.2290)

Abstract

Introduction & Background

The COVID-19 pandemic placed unparalleled pressure on healthcare services, highlighting the need for improved healthcare planning for respiratory disease outbreaks. With rapid virus diversification, and correspondingly rapid shifts in symptom expression, modellers often lack representative clinical testing data altogether. This is especially true at the onset of outbreaks, where traditional epidemiological and statistical approaches that rely on case data "ground truths" are extremely challenging to apply. In this abstract we preview the results of two novel studies that investigate how digital footprint data, in the form of over-the-counter medication sales, might serve as a predictive proxy for underlying and often hidden disease incidence, and the extent to which such data might improve mortality rate forecasting at local area levels.

Objectives & Approach

Over 2 billion transactions logged by a UK high-street health retailer were collated across English local authorities (n=314), generating weekly variables corresponding to a range of health purchase behaviours (e.g. cough mixture and pain-relief sales) in each authority. These purchase data were additionally linked to a set of independent variables describing each local authority's (1) weekly environment (e.g. weather, temperature, pollution), (2) socio-demographics (e.g. age distributions, deprivation levels, population densities) and (3) available local test case data. Machine learning regression models were then deployed to investigate the ability of each of these variable sets to underpin predictions of weekly registered deaths in the 314 authorities due to COVID-19 between April 2020 and December 2021 (Study 1) or to general respiratory disease between March 2016 and March 2020 (Study 2). All models were rigorously tested out-of-sample via walk-forward cross-validation, across a range of forecast windows.
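The walk-forward evaluation described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `walk_forward_splits`, its parameters, and the choice to express the forecast window in whole weeks are all assumptions made for illustration. The point it shows is that, at each forecast origin, a model may only be trained on samples whose target week has already been observed.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_weeks: int, min_train: int, horizon: int
) -> Iterator[Tuple[List[int], int]]:
    """Yield (train_feature_weeks, test_feature_week) pairs.

    Hypothetical helper: a sample at feature week i is paired with the
    deaths registered in week i + horizon. At forecast origin t, only
    samples with i + horizon <= t are usable for training, so no
    information from after the origin leaks into the fit.
    """
    if min_train < 1 or horizon < 1:
        raise ValueError("min_train and horizon must be positive")
    # Earliest origin that still leaves min_train training samples;
    # latest origin whose target week (origin + horizon) exists.
    for origin in range(min_train + horizon - 1, n_weeks - horizon):
        train_weeks = list(range(0, origin - horizon + 1))
        yield train_weeks, origin


if __name__ == "__main__":
    # Toy setting: 10 weeks of data, 3-week forecast window.
    for train_weeks, test_week in walk_forward_splits(10, 3, 3):
        print(
            f"train on feature weeks {train_weeks}, "
            f"predict week {test_week + 3} from week {test_week}"
        )
```

Repeating this loop for several values of `horizon` would correspond to evaluating "a range of forecast windows"; the regression model fitted inside each split (XGBoost in Study 1) is left out of the sketch.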
Relevance to Digital Footprints

Epidemics such as COVID-19 are recognised as being driven as much by behavioural factors as by clinical ones. Indicators of infection rates may be revealed in purchasing and self-medication logs, where rich data exist: in 2022, UK citizens were reported to generate more than 1 billion prescriptions, consume approximately 6,300 tonnes of paracetamol, and spend £572m on cough, cold and sore throat treatments. The digital footprint data logs generated by such activities may hold potential to reveal hidden disease incidence and risk to vulnerable communities, without reliance on prohibitively expensive testing infrastructures.

Results

Evidence was found that models incorporating digital footprint sales data significantly outperformed models that used only variables traditionally associated with respiratory disease (e.g. sociodemographics, weather, or case data). In Study 1, XGBoost models optimally predicted the number of COVID deaths 21 days in advance (R²=0.71***), significantly outperforming models based on official COVID case data alone at local-area levels (R²=0.44**). For the pre-COVID period, where registered deaths show a far stronger seasonal pattern, models optimally predicted registered respiratory deaths 17 days in advance (R²=0.78***). The highest accuracy gains over models without digital footprint data (increases in R² of between 0.09 and 0.11) occurred in periods of maximum risk to the general public (winter periods).

Conclusions & Implications

Over-the-counter medication purchases related to the management of respiratory illness are correlated with registered deaths at a lead time of 17-21 days. The results demonstrate the potential for sales data to support early warning population health mechanisms at local area levels, and the need for ongoing research into their application to support health planning.
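For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an out-of-sample R² and the gain between two models could be computed. The function name `r_squared` and all numbers in the example are invented toy values for illustration only; they are not taken from the studies, and the abstract does not state exactly how its R² values were calculated.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy held-out weekly death counts and two sets of predictions
    # (illustrative values, not study data).
    deaths = [2.0, 4.0, 6.0, 8.0]
    baseline_pred = [3.0, 3.0, 7.0, 7.0]    # model without sales data
    with_sales_pred = [2.0, 4.5, 6.0, 7.5]  # model with sales data

    r2_base = r_squared(deaths, baseline_pred)
    r2_sales = r_squared(deaths, with_sales_pred)
    print(f"baseline R2={r2_base:.3f}, with sales R2={r2_sales:.3f}, "
          f"gain={r2_sales - r2_base:.3f}")
```

Computing the metric only on held-out weeks is what makes the comparison between variable sets meaningful; an in-sample R² would favour whichever model has more inputs.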