From Policy to Prediction: Assessing Forecasting Accuracy in an Integrated Framework with Machine Learning and Disease Models
Authors: Amit K Chakraborty, Hao Wang, Pouria Ramazi
Journal of Computational Biology, pp. 1104-1117 (published 2024-11-01; epub 2024-08-02)
DOI: 10.1089/cmb.2023.0377
Citations: 0
Abstract
To improve the accuracy of forecasts of infectious-disease spread, a hybrid model was recently introduced in which the commonly assumed constant disease transmission rate is instead estimated by a machine learning (ML) model from data on enforced mitigation policies and then fed to an extended susceptible-infected-recovered (SIR) model to forecast the number of infected cases. Because that work tested only one ML model, the gradient boosting model (GBM), it left open whether other ML models would perform better. Here, we compared GBMs, linear regressions, k-nearest neighbors, and Bayesian networks (BNs) in forecasting the number of COVID-19-infected cases in the United States and Canadian provinces based on policy indices for the following 35 days. There was no significant difference in the mean absolute percentage errors (MAPEs) of these ML models over the combined dataset [H(3) = 3.10, p = 0.38]. In two provinces, a significant difference was observed [H(3) = 8.77 and H(3) = 8.07, p < 0.05], yet post hoc tests revealed no significant difference in pairwise comparisons. Nevertheless, BNs significantly outperformed the other models on most of the training datasets. The results suggest that the ML models have equal forecasting power overall and that BNs are best suited to data-fitting applications.
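The pipeline the abstract describes (policy data to transmission rate, transmission rate to SIR forecast, forecasts scored by MAPE, models compared with an H(3) statistic) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the single-feature linear map from a hypothetical daily policy index to the transmission rate beta stands in for one of the four model types, the SIR model is the basic discrete-time version rather than the paper's extended one, all numbers are synthetic, and H(3) is read as the Kruskal-Wallis statistic for four groups with its usual chi-square approximation (the abstract does not name the test).

```python
import math

# Hypothetical sketch: the policy index, parameters, and data below are illustrative, not from the paper.

def fit_linear(xs, ys):
    """Ordinary least squares for beta = a + b * policy (single feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def sir_forecast(s0, i0, r0, betas, gamma):
    """Basic discrete-time SIR with a time-varying transmission rate beta_t.

    Returns the infected count I_t after each daily step.
    """
    n = s0 + i0 + r0
    s, i, r = s0, i0, r0
    infected = []
    for beta in betas:
        new_inf = beta * s * i / n  # new infections on this day
        new_rec = gamma * i         # recoveries on this day
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        infected.append(i)
    return infected

def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H without tie correction; df = len(groups) - 1."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        total += sum(ranks[start:start + len(g)]) ** 2 / len(g)  # R_j^2 / n_j
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

def chi2_sf_df3(x):
    """Upper-tail chi-square probability for 3 degrees of freedom (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

if __name__ == "__main__":
    # Synthetic training pairs: (hypothetical policy index, estimated beta).
    a, b = fit_linear([0.2, 0.4, 0.6, 0.8], [0.30, 0.25, 0.20, 0.15])
    future_policy = [0.5] * 35                            # 35-day policy scenario
    betas = [max(0.0, a + b * p) for p in future_policy]  # clamp to beta >= 0
    forecast = sir_forecast(990_000, 10_000, 0, betas, gamma=0.1)
    print(f"I after 35 days: {forecast[-1]:.0f}")
    print(f"p for H(3) = 3.10: {chi2_sf_df3(3.10):.2f}")
```

As a consistency check under the chi-square assumption, `chi2_sf_df3(3.10)` is about 0.38 and `chi2_sf_df3(8.07)` falls below 0.05, in line with the values reported in the abstract. In practice, `mape` would score each model's forecast against observed counts per region, and `kruskal_wallis_h` would take one list of per-region MAPEs for each of the four models.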
About the journal:
Journal of Computational Biology is the leading peer-reviewed journal in computational biology and bioinformatics, publishing in-depth statistical, mathematical, and computational analysis of methods, as well as their practical impact. Available only online, this is an essential journal for scientists and students who want to keep abreast of developments in bioinformatics.
Journal of Computational Biology coverage includes:
-Genomics
-Mathematical modeling and simulation
-Distributed and parallel biological computing
-Designing biological databases
-Pattern matching and pattern detection
-Linking disparate databases and data
-New tools for computational biology
-Relational and object-oriented database technology for bioinformatics
-Biological expert system design and use
-Reasoning by analogy, hypothesis formation, and testing by machine
-Management of biological databases