Integrating Machine Learning into Statistical Methods in Disease Risk Prediction Modeling: A Systematic Review.

Health data science | Pub Date: 2024-07-23 | eCollection Date: 2024-01-01 | DOI: 10.34133/hds.0165
Meng Zhang, Yongqi Zheng, Xiagela Maidaiti, Baosheng Liang, Yongyue Wei, Feng Sun
Volume 4, pages 0165. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11266123/pdf/

Abstract

Background: Disease prediction models typically rely on either statistical methods or machine learning. Each approach suits particular application scenarios, so using either one alone raises the risk of errors. Integrating machine learning into statistical methods may yield more robust prediction models. This systematic review comprehensively assesses the current development of integration models for disease prediction worldwide.

Methods: The PubMed, EMbase, Web of Science, CNKI, VIP, WanFang, and SinoMed databases were searched for studies on prediction models that integrate machine learning into statistical methods, from database inception to May 1, 2023. Extracted information included the basic characteristics of the studies, integration approaches, application scenarios, modeling details, and model performance.

Results: A total of 20 eligible studies in English and 1 in Chinese were included. Five studies focused on diagnostic models, while 16 focused on predicting disease occurrence or prognosis. For classification models, integration strategies included majority voting, weighted voting, stacking, and model selection (when statistical methods and machine learning disagreed). For regression models, strategies included simple statistics, weighted statistics, and stacking. In most studies, the AUROC of integration models exceeded 0.75, and the integration models performed better than statistical methods or machine learning alone. Stacking was used in situations with >100 predictors and needed a relatively larger amount of training data.

Conclusion: Research on integrating machine learning into statistical methods for prediction models remains limited, but some studies show great potential for integration models to outperform single models. This study provides insights into selecting integration methods for different scenarios. Future research could emphasize improving and validating integration strategies.
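The non-stacking integration strategies named in the Results can be illustrated with a minimal sketch. It assumes each base model (for example, a logistic regression and one or more machine learning models) has already produced a prediction for a single patient. The function names, weights, and the 0.5 threshold are hypothetical choices for illustration and are not taken from the reviewed studies. Stacking is omitted because it requires training a meta-learner on the base models' outputs.

```python
from collections import Counter
from typing import Sequence


def majority_vote(labels: Sequence[int]) -> int:
    """Majority voting: return the class label predicted by most base models."""
    counts = Counter(labels)
    top = max(counts.values())
    # Assumption: ties resolve to the smallest label.
    return min(label for label, n in counts.items() if n == top)


def weighted_vote(probs: Sequence[float], weights: Sequence[float],
                  threshold: float = 0.5) -> int:
    """Weighted voting: weighted mean of predicted probabilities, then threshold."""
    score = sum(p * w for p, w in zip(probs, weights)) / sum(weights)
    return int(score >= threshold)


def weighted_mean(preds: Sequence[float], weights: Sequence[float]) -> float:
    """Regression: weighted statistic; equal weights give the simple mean."""
    return sum(p * w for p, w in zip(preds, weights)) / sum(weights)


def select_on_disagreement(stat_label: int, ml_label: int, prefer_ml: bool) -> int:
    """Model selection: when the two models disagree, defer to the preferred one."""
    if stat_label == ml_label:
        return stat_label
    return ml_label if prefer_ml else stat_label


# Hypothetical predicted probabilities from one statistical and two ML models.
probs = [0.40, 0.70, 0.65]
labels = [int(p >= 0.5) for p in probs]          # [0, 1, 1]
voted = majority_vote(labels)                    # two of three models predict 1
heavy_stat = weighted_vote(probs, [4, 1, 1])     # statistical model dominates
```

With equal weights the vote follows the two machine learning models, while a large weight on the statistical model pulls the combined score below the threshold. In practice, weights are usually derived from validation performance rather than fixed by hand.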
