From Policy to Prediction: Assessing Forecasting Accuracy in an Integrated Framework with Machine Learning and Disease Models

Journal of Computational Biology · Pub Date: 2024-11-01 · Epub Date: 2024-08-02 · Pages: 1104-1117 · DOI: 10.1089/cmb.2023.0377
IF 1.4 · Zone 4 (Biology) · JCR Q4 (Biochemical Research Methods)
Amit K Chakraborty, Hao Wang, Pouria Ramazi
Citations: 0

Abstract
From Policy to Prediction: Assessing Forecasting Accuracy in an Integrated Framework with Machine Learning and Disease Models.

To improve the forecasting accuracy of the spread of infectious diseases, a hybrid model was recently introduced in which the disease transmission rate, commonly assumed constant, was estimated from enforced mitigation policy data by a machine learning (ML) model and then fed to an extended susceptible-infected-recovered model to forecast the number of infected cases. Because that work tested only one ML model, the gradient boosting model (GBM), it left open whether other ML models would perform better. Here, we compared GBMs, linear regressions, k-nearest neighbors, and Bayesian networks (BNs) in forecasting the number of COVID-19-infected cases in the United States and Canadian provinces based on policy indices of the following 35 days. There was no significant difference in the mean absolute percentage errors of these ML models over the combined dataset [H(3) = 3.10, p = 0.38]. In two provinces, a significant difference was observed [H(3) = 8.77 and H(3) = 8.07, p < 0.05], yet post hoc tests revealed no significant difference in pairwise comparisons. Nevertheless, BNs significantly outperformed the other models on most of the training datasets. The results suggest that the ML models have equal forecasting power overall, and that BNs are best for data-fitting applications.
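
The pipeline described in the abstract (policy data, then an estimated transmission rate, then a compartmental model, then a mean absolute percentage error) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract uses an extended susceptible-infected-recovered model and trained ML models, whereas this sketch uses a plain discrete-time SIR model and a hypothetical linear stand-in, `beta_from_policy`, in place of the ML model. The recovery rate, population, policy index values, and the "observed" series are all made up.

```python
from typing import List, Sequence


def beta_from_policy(policy_index: float) -> float:
    """Hypothetical stand-in for the ML model (not the paper's method).

    Maps a policy stringency index in [0, 100] to a transmission rate:
    stricter policy gives a lower rate.
    """
    return 0.4 * (1.0 - policy_index / 100.0) + 0.05


def simulate_sir(betas: Sequence[float], gamma: float,
                 s0: float, i0: float, r0: float) -> List[float]:
    """Plain SIR with one daily step per entry of `betas`.

    Returns the number of infected individuals after each step,
    so the result has the same length as `betas`.
    """
    n = s0 + i0 + r0  # total population, constant in this model
    s, i, r = s0, i0, r0
    infected = []
    for beta in betas:
        new_infections = beta * s * i / n
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        infected.append(i)
    return infected


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Made-up policy indices for a 35-day horizon: strict, then relaxed.
policy = [70.0] * 15 + [40.0] * 20
betas = [beta_from_policy(p) for p in policy]

# Forecast from the policy-driven, time-varying transmission rate.
forecast = simulate_sir(betas, gamma=0.1, s0=990.0, i0=10.0, r0=0.0)

# Synthetic "observed" series (illustrative only, not real data):
# the same model run with slightly different transmission rates.
observed = simulate_sir([b * 1.05 for b in betas], gamma=0.1,
                        s0=990.0, i0=10.0, r0=0.0)

print(f"MAPE over {len(policy)} days: {mape(observed, forecast):.2f}%")
```

In a comparison like the one in the abstract, `beta_from_policy` would be replaced by each candidate model in turn, and the resulting per-dataset MAPE values would then be compared with a rank-based test.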

Source journal: Journal of Computational Biology
Category: Biology / Computer Science, Interdisciplinary Applications
CiteScore: 3.60
Self-citation rate: 5.90%
Publication volume: 113
Review time: 6-12 weeks

Journal description: Journal of Computational Biology is the leading peer-reviewed journal in computational biology and bioinformatics, publishing in-depth statistical, mathematical, and computational analysis of methods, as well as their practical impact. Available only online, this is an essential journal for scientists and students who want to keep abreast of developments in bioinformatics. Journal of Computational Biology coverage includes:
- Genomics
- Mathematical modeling and simulation
- Distributed and parallel biological computing
- Designing biological databases
- Pattern matching and pattern detection
- Linking disparate databases and data
- New tools for computational biology
- Relational and object-oriented database technology for bioinformatics
- Biological expert system design and use
- Reasoning by analogy, hypothesis formation, and testing by machine
- Management of biological databases