Karen M. Holcomb, J. Erin Staples, Randall J. Nett, Charles B. Beard, Lyle R. Petersen, Stanley G. Benjamin, Benjamin W. Green, Hunter Jones, Michael A. Johansson
Title: Multi-Model Prediction of West Nile Virus Neuroinvasive Disease With Machine Learning for Identification of Important Regional Climatic Drivers
Journal: GeoHealth, volume 7, issue 11
Published: 2023-11-17
DOI: 10.1029/2023GH000906
URL: https://onlinelibrary.wiley.com/doi/10.1029/2023GH000906
Abstract
West Nile virus (WNV) is the leading cause of mosquito-borne illness in the continental United States (CONUS). Spatial heterogeneity in historical incidence, environmental factors, and complex ecology make prediction of spatiotemporal variation in WNV transmission challenging. Machine learning provides promising tools for identification of important variables in such situations. To predict annual WNV neuroinvasive disease (WNND) cases in CONUS (2015–2021), we fitted 10 probabilistic models ranging in complexity from naïve models to machine learning algorithms, as well as an ensemble. We made predictions in each of nine climate regions on a hexagonal grid and evaluated each model's predictive accuracy. Using the machine learning models (random forest and neural network), we identified the relative importance and variation in ranking of predictors (historical WNND cases, climate anomalies, human demographics, and land use) across regions. We found that historical WNND cases and population density were among the most important factors, while anomalies in temperature and precipitation often had relatively low importance. While the relative performance of each model varied across climatic regions, the magnitude of difference between models was small. All models except the naïve model had non-significant differences in performance relative to the baseline model (a negative binomial model fit per hexagon). No model, including the ensemble or the more complex machine learning models, outperformed models based on historical case counts at the hexagon or region level; these models are good forecasting benchmarks. Further work is needed to assess whether predictive capacity can be improved beyond that of these historical baselines.
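To make the idea of a historical-count baseline concrete, the following is a minimal sketch of a negative binomial model fit separately to each hexagon's past annual case counts. It is an illustration only, not the authors' implementation: the abstract does not state how the baseline was fitted, so the method-of-moments estimator, the Poisson fallback for hexagons without overdispersion, the function names, and the example counts are all assumptions made here.

```python
import math
from statistics import mean, variance


def fit_negative_binomial(counts):
    """Method-of-moments fit (an assumed estimator, not taken from the paper).

    Returns (r, p) for a negative binomial with mean r * (1 - p) / p,
    or (None, rate) as a Poisson fallback when the sample variance
    does not exceed the sample mean.
    """
    m = mean(counts)
    v = variance(counts) if len(counts) > 1 else 0.0
    if m == 0 or v <= m:
        return None, m  # no overdispersion: use Poisson with rate m
    r = m * m / (v - m)
    p = r / (r + m)
    return r, p


def pmf(k, r, p_or_rate):
    """Probability of exactly k cases under the fitted distribution."""
    if r is None:  # Poisson fallback
        lam = p_or_rate
        if lam == 0:
            return 1.0 if k == 0 else 0.0
        return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
    p = p_or_rate
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1 - p))
    return math.exp(log_pmf)


def predictive_quantile(q, r, p_or_rate, max_k=100_000):
    """Smallest case count k whose cumulative probability reaches q."""
    cumulative = 0.0
    for k in range(max_k + 1):
        cumulative += pmf(k, r, p_or_rate)
        if cumulative >= q:
            return k
    return max_k


# Hypothetical annual WNND counts for two hexagons (made-up numbers).
history = {
    "hex_A": [0, 2, 1, 7, 0, 3],
    "hex_B": [0, 0, 0, 0, 0, 0],
}

forecasts = {}
for hexagon, counts in history.items():
    r, p_or_rate = fit_negative_binomial(counts)
    forecasts[hexagon] = {
        "median": predictive_quantile(0.5, r, p_or_rate),
        "q90": predictive_quantile(0.9, r, p_or_rate),
    }

print(forecasts)
```

Because each hexagon is fitted only to its own history, a benchmark of this kind needs no climate, demographic, or land-use inputs, which is what makes it a useful reference point for the more complex models.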
About the journal:
GeoHealth will publish original research, reviews, policy discussions, and commentaries that cover the growing science on the interface among the Earth, atmospheric, oceans and environmental sciences, ecology, and the agricultural and health sciences. The journal will cover a wide variety of global and local issues including the impacts of climate change on human, agricultural, and ecosystem health, air and water pollution, environmental persistence of herbicides and pesticides, radiation and health, geomedicine, and the health effects of disasters. Many of these topics and others are of critical importance in the developing world and all require bringing together leading research across multiple disciplines.