Flusion: Integrating multiple data sources for accurate influenza predictions.

Epidemics, vol. 50, article 100810 · Infectious Diseases (JCR Q2) · Impact factor: 3.0 · Zone 3, Medicine · Pub Date: 2024-12-25 · DOI: 10.1016/j.epidem.2024.100810
Evan L Ray, Yijin Wang, Russell D Wolfinger, Nicholas G Reich

Abstract

Over the last ten years, the US Centers for Disease Control and Prevention (CDC) has organized an annual influenza forecasting challenge, motivated by the idea that accurate probabilistic forecasts could improve situational awareness and yield more effective public health actions. Starting with the 2021/22 influenza season, the forecasting targets for this challenge have been based on hospital admissions reported in the CDC's National Healthcare Safety Network (NHSN) surveillance system. Reporting of influenza hospital admissions through NHSN began within the last few years, so only a limited amount of historical data is available for this target signal. To produce forecasts despite the limited data for the target surveillance system, we augmented these data with two signals that have a longer historical record: 1) ILI+, which estimates the proportion of outpatient doctor visits where the patient has influenza; and 2) rates of laboratory-confirmed influenza hospitalizations at a selected set of healthcare facilities. Our model, Flusion, is an ensemble of three components: two gradient boosting models for quantile regression, each based on a different feature set, and a Bayesian autoregressive model. The gradient boosting models were trained on all three data signals, while the autoregressive model was trained only on data for the target surveillance signal, NHSN admissions; all three models were trained jointly on data for multiple locations. In each week of the influenza season, these models produced quantiles of a predictive distribution of influenza hospital admissions in each state for the current week and the following three weeks; the ensemble prediction was computed by averaging these quantile predictions. Flusion emerged as the top-performing model in the CDC's influenza prediction challenge for the 2023/24 season. In this article we investigate the factors contributing to Flusion's success, and we find that its strong performance was primarily driven by the use of a gradient boosting model that was trained jointly on data from multiple surveillance signals and multiple locations. These results indicate the value of sharing information across multiple locations and surveillance signals, especially when doing so adds to the pool of available training data.
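
The ensemble step described in the abstract, averaging the quantile predictions of the component models, can be illustrated with a minimal sketch. This is not the authors' code: the quantile levels, the component names, and all numbers below are made up for illustration, and the sketch assumes an equally weighted mean taken separately at each quantile level for a single location and forecast horizon.

```python
from statistics import fmean

# Hypothetical quantile levels; the levels used in the challenge are not
# stated in the abstract.
QUANTILE_LEVELS = [0.025, 0.25, 0.5, 0.75, 0.975]


def average_quantiles(model_quantiles):
    """Average predicted quantiles level by level across component models.

    model_quantiles maps a (hypothetical) model name to a list of predicted
    hospital admissions, one value per entry of QUANTILE_LEVELS.
    """
    n_levels = len(QUANTILE_LEVELS)
    for name, values in model_quantiles.items():
        if len(values) != n_levels:
            raise ValueError(
                f"{name}: expected {n_levels} quantiles, got {len(values)}"
            )
    # zip(*...) groups the values that share a quantile level.
    return [fmean(level_values) for level_values in zip(*model_quantiles.values())]


# Made-up predictions for one state and one forecast horizon.
component_forecasts = {
    "gradient_boosting_a": [100.0, 150.0, 200.0, 260.0, 400.0],
    "gradient_boosting_b": [90.0, 140.0, 190.0, 250.0, 380.0],
    "bayesian_ar": [110.0, 160.0, 210.0, 270.0, 420.0],
}

ensemble = average_quantiles(component_forecasts)
for level, value in zip(QUANTILE_LEVELS, ensemble):
    print(f"q{level}: {value:.1f}")
```

If each component's quantiles are non-decreasing in the quantile level, their level-wise average is non-decreasing as well, so the averaged values still form a valid set of quantiles.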

Source journal: Epidemics (Infectious Diseases)
CiteScore: 6.00 · Self-citation rate: 7.90% · Articles published: 92 · Review time: 140 days
Journal description: Epidemics publishes papers on infectious disease dynamics in the broadest sense. Its scope covers both within-host dynamics of infectious agents and dynamics at the population level, particularly the interaction between the two. Areas of emphasis include: spread, transmission, persistence, implications and population dynamics of infectious diseases; population and public health as well as policy aspects of control and prevention; dynamics at the individual level; interaction with the environment, ecology and evolution of infectious diseases, as well as population genetics of infectious agents.