William Heseltine-Carp , Megan Courtman , Daniel Browning , Aishwarya Kasabe , Michael Allen , Adam Streeter , Emmanuel Ifeachor , Martin James , Stephen Mullin
{"title":"Machine learning to predict stroke risk from routine hospital data: A systematic review","authors":"William Heseltine-Carp , Megan Courtman , Daniel Browning , Aishwarya Kasabe , Michael Allen , Adam Streeter , Emmanuel Ifeachor , Martin James , Stephen Mullin","doi":"10.1016/j.ijmedinf.2025.105811","DOIUrl":null,"url":null,"abstract":"<div><h3>Purpose</h3><div>Stroke remains a leading cause of morbidity and mortality. Despite this, current risk stratification tools such as CHA<sub>2</sub>DS<sub>2</sub>-VASc and QRISK3 are of limited accuracy, particularly in those without a diagnosis of atrial-fibrillation. Hence, there is a need for more accurate stroke risk prediction models. Machine-learning (ML) may provide a solution to this by leveraging existing routine hospital databases to build accurate stroke risk prediction models and identify novel risk factors for stroke.</div></div><div><h3>Aims</h3><div>In this systematic review we appraise current research using ML to predict stroke risk from routine hospital data. Based on these findings we then highlight common methodological limitations and recommendations for future research.</div></div><div><h3>Methods</h3><div>In this review we identify 49 original research (38 in the general population and 11 in AF specific populations) articles from the PUBMED database from January-2013 to December-2024 using ML and routine hospital data to predict the risk of stroke.</div></div><div><h3>Results</h3><div>ML models were able to accurately predict stroke risk in both AF specific and general populations, with AUCs ranging from 0.64 to 0.99. Where tested, ML also consistently outperformed traditional risk stratification tool, such as CHA<sub>2</sub>DS<sub>2</sub>-VASc. ML also appeared useful in identifying several novel risk factors from electrocardiogram, laboratory test and echocardiography data.</div><div>However, the quality of datasets were often limited, there was a high suspicion of overfitting and models often lacked calibration, external validation and explainability analysis.</div></div><div><h3>Conclusion</h3><div>Whilst ML has shown great potential in stroke prediction and identifying novel risk factors for stroke, improvements in study methodology is required prior to integration of ML into routine healthcare. Future research should adhere to the EQUATOR guidance on prediction models and encourage interdisciplinary collaboration between computer scientists and clinicians. Further prospective RCTs are also required to validate models in the clinical setting and the identify barriers of integrating ML into routine healthcare.</div></div>","PeriodicalId":54950,"journal":{"name":"International Journal of Medical Informatics","volume":"196 ","pages":"Article 105811"},"PeriodicalIF":3.7000,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Medical Informatics","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1386505625000280","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Purpose
Stroke remains a leading cause of morbidity and mortality. Despite this, current risk stratification tools such as CHA2DS2-VASc and QRISK3 are of limited accuracy, particularly in those without a diagnosis of atrial-fibrillation. Hence, there is a need for more accurate stroke risk prediction models. Machine-learning (ML) may provide a solution to this by leveraging existing routine hospital databases to build accurate stroke risk prediction models and identify novel risk factors for stroke.
Aims
In this systematic review we appraise current research using ML to predict stroke risk from routine hospital data. Based on these findings we then highlight common methodological limitations and recommendations for future research.
Methods
In this review we identify 49 original research (38 in the general population and 11 in AF specific populations) articles from the PUBMED database from January-2013 to December-2024 using ML and routine hospital data to predict the risk of stroke.
Results
ML models were able to accurately predict stroke risk in both AF specific and general populations, with AUCs ranging from 0.64 to 0.99. Where tested, ML also consistently outperformed traditional risk stratification tool, such as CHA2DS2-VASc. ML also appeared useful in identifying several novel risk factors from electrocardiogram, laboratory test and echocardiography data.
However, the quality of datasets were often limited, there was a high suspicion of overfitting and models often lacked calibration, external validation and explainability analysis.
Conclusion
Whilst ML has shown great potential in stroke prediction and identifying novel risk factors for stroke, improvements in study methodology is required prior to integration of ML into routine healthcare. Future research should adhere to the EQUATOR guidance on prediction models and encourage interdisciplinary collaboration between computer scientists and clinicians. Further prospective RCTs are also required to validate models in the clinical setting and the identify barriers of integrating ML into routine healthcare.
期刊介绍:
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician''s office systems, document handling systems, electronic medical record systems, standardization, systems integration etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.
Educational computer based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.