Multi-Model Prediction of West Nile Virus Neuroinvasive Disease With Machine Learning for Identification of Important Regional Climatic Drivers

IF 3.8 | Zone 2 (Medicine) | Q2 Environmental Sciences | Geohealth | Pub Date: 2023-11-17 | DOI: 10.1029/2023GH000906

Karen M. Holcomb, J. Erin Staples, Randall J. Nett, Charles B. Beard, Lyle R. Petersen, Stanley G. Benjamin, Benjamin W. Green, Hunter Jones, Michael A. Johansson

Geohealth, volume 7, issue 11. Published 2023-11-17. Full text: https://onlinelibrary.wiley.com/doi/10.1029/2023GH000906

Citations: 0

Abstract

West Nile virus (WNV) is the leading cause of mosquito-borne illness in the continental United States (CONUS). Spatial heterogeneity in historical incidence, environmental factors, and complex ecology make prediction of spatiotemporal variation in WNV transmission challenging. Machine learning provides promising tools for identification of important variables in such situations. To predict annual WNV neuroinvasive disease (WNND) cases in CONUS (2015–2021), we fitted 10 probabilistic models ranging in complexity from a naïve model to machine learning algorithms and an ensemble. We made predictions in each of nine climate regions on a hexagonal grid and evaluated each model's predictive accuracy. Using the machine learning models (random forest and neural network), we identified the relative importance and variation in ranking of predictors (historical WNND cases, climate anomalies, human demographics, and land use) across regions. We found that historical WNND cases and population density were among the most important factors, while anomalies in temperature and precipitation often had relatively low importance. While the relative performance of each model varied across climatic regions, the magnitude of difference between models was small. All models except the naïve model had non-significant differences in performance relative to the baseline model (a negative binomial model fit per hexagon). No model, including the ensemble or the more complex machine learning models, outperformed models based on historical case counts at the hexagon or region level; these models are good forecasting benchmarks. Further work is needed to assess whether predictive capacity can be improved beyond that of these historical baselines.
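
The baseline named in the abstract, a negative binomial model fit per hexagon from historical case counts, can be illustrated with a minimal sketch. The abstract does not say how the model was fitted, so the method-of-moments estimator, the function names, and the example counts below are assumptions for illustration only, not the authors' implementation.

```python
import math
from statistics import mean, variance


def fit_negative_binomial(counts):
    """Method-of-moments fit to one hexagon's annual case counts (an assumed estimator).

    Returns (m, r): the mean and the dispersion (size) parameter.
    r is None when the counts are not overdispersed, in which case
    a Poisson distribution with mean m is used instead.
    """
    m = float(mean(counts))
    v = float(variance(counts)) if len(counts) > 1 else m
    if m == 0 or v <= m:
        return m, None
    return m, m * m / (v - m)


def pmf(k, m, r):
    """Probability of observing k cases under the fitted distribution."""
    if m == 0:
        return 1.0 if k == 0 else 0.0
    if r is None:  # Poisson fallback
        return math.exp(k * math.log(m) - m - math.lgamma(k + 1))
    p = r / (r + m)  # success probability, chosen so the mean equals m
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def quantile(q, m, r, max_k=100_000):
    """Smallest case count whose cumulative probability reaches q."""
    cumulative = 0.0
    for k in range(max_k + 1):
        cumulative += pmf(k, m, r)
        if cumulative >= q:
            return k
    return max_k


# Hypothetical annual WNND counts for a single hexagon (not real data).
history = [0, 2, 1, 7, 0, 3, 1, 10]
m, r = fit_negative_binomial(history)
interval = [quantile(q, m, r) for q in (0.05, 0.5, 0.95)]
print(f"mean={m:.2f}, dispersion={r:.3f}, 5/50/95% quantiles={interval}")
```

A probabilistic forecast of this kind returns a full distribution over next year's count rather than a single number, which is what allows it to serve as a benchmark for the more complex models.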

Source journal

Geohealth (Environmental Science: Pollution)

CiteScore: 6.80

Self-citation rate: 6.20%

Articles published: 124

Review time: 19 weeks

Journal description: GeoHealth will publish original research, reviews, policy discussions, and commentaries that cover the growing science on the interface among the Earth, atmospheric, oceans and environmental sciences, ecology, and the agricultural and health sciences. The journal will cover a wide variety of global and local issues including the impacts of climate change on human, agricultural, and ecosystem health, air and water pollution, environmental persistence of herbicides and pesticides, radiation and health, geomedicine, and the health effects of disasters. Many of these topics and others are of critical importance in the developing world and all require bringing together leading research across multiple disciplines.