Prediction of pre-eclampsia with machine learning approaches: Leveraging important information from routinely collected data

IF 3.7 | Zone 2, Medicine | Q2, Computer Science, Information Systems | International Journal of Medical Informatics | Pub Date: 2024-10-05 | DOI: 10.1016/j.ijmedinf.2024.105645
Sofonyas Abebaw Tiruneh, Daniel Lorber Rolnik, Helena J. Teede, Joanne Enticott

Abstract

Background

Globally, pre-eclampsia (PE) is a leading cause of maternal and perinatal morbidity and mortality. PE prediction using routinely collected data has the advantage of being widely applicable, particularly in low-resource settings. Early intervention for high-risk women might reduce PE incidence and related complications. We aimed to replicate our published machine learning (ML) work on predicting another maternal condition (gestational diabetes) in order to (1) predict PE using routine health data, (2) identify the optimal ML model, and (3) compare it with a logistic regression approach.

Methods

Data came from a large health service network and covered 48,250 singleton pregnancies between January 2016 and June 2021. Supervised ML models were employed. Maternal clinical and medical characteristics were the feature variables (predictors), and a 70/30 data split was used for training and testing the models. Predictive performance was assessed using the area under the curve (AUC) and calibration plots. Shapley value analysis assessed the contribution of each feature variable.
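To make the evaluation setup concrete, the following is a minimal sketch of a 70/30 split and an AUC calculation in plain Python. It is not the authors' code: the abstract does not say whether the split was stratified by outcome, so stratification here is an assumption, and the function names `stratified_split` and `auc` are hypothetical. The AUC is computed with the rank-based (Mann-Whitney) formulation, which equals the probability that a randomly chosen PE case receives a higher predicted risk than a randomly chosen non-case.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) with a similar outcome rate in both parts.

    Stratifying by outcome is an assumption; the abstract only states 70/30.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def auc(labels, scores):
    """Area under the ROC curve via average ranks (tied scores count as 0.5)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank within the tie
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

In use, a model would be fitted on the rows in `train_idx` and `auc` would be called on the outcomes and predicted risks of the rows in `test_idx`. Labels are expected to be coded 0/1, and both classes must be present in the test set for the AUC to be defined.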

Results

The random forest approach provided excellent discrimination, with an AUC of 0.84 (95% CI: 0.82–0.86) and the highest prediction accuracy (0.79); however, its calibration curve (slope of 1.21, 95% CI 1.13–1.30) was acceptable only for a threshold of 0.3 or less. The next best approach was extreme gradient boosting, which provided an AUC of 0.77 (95% CI: 0.76–0.79) and was well calibrated (slope of 0.93, 95% CI 0.85–1.01). Logistic regression provided good discrimination, with an AUC of 0.75 (95% CI: 0.74–0.76), and perfect calibration. Nulliparity, pre-pregnancy body mass index, a previous pregnancy with PE, maternal age, family history of hypertension, and pre-existing hypertension and diabetes were the top-ranked features in the Shapley value analysis.
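The calibration slopes reported above can be read as follows: the observed outcome is regressed on the logit of the predicted risk with a logistic model, and the fitted coefficient is the slope. A slope of 1 is ideal, a slope above 1 indicates predictions that are too moderate, and a slope below 1 indicates predictions that are too extreme. The sketch below illustrates that standard definition in plain Python with a two-parameter Newton-Raphson fit. It is an illustration only, not the authors' procedure; the function names are hypothetical, and the confidence intervals quoted in the abstract are not reproduced here.

```python
import math


def logit(p, eps=1e-12):
    """Log-odds of p, clipped away from 0 and 1 to stay finite."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def calibration_intercept_slope(labels, probs, n_iter=50, tol=1e-10):
    """Fit labels ~ sigmoid(a + b * logit(prob)) by Newton-Raphson; return (a, b)."""
    x = [logit(p) for p in probs]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(n_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, labels):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g_a += yi - mu            # gradient w.r.t. intercept
            g_b += (yi - mu) * xi     # gradient w.r.t. slope
            h_aa += w                 # entries of the information matrix
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a += step_a
        b += step_b
        if abs(step_a) < tol and abs(step_b) < tol:
            break
    return a, b
```

For example, if a model predicts risks of 0.2, 0.5 and 0.8 for three groups whose observed event rates are 0.1, 0.5 and 0.9, the predictions are too moderate and the fitted slope exceeds 1. The fit assumes the predicted risks take at least two distinct values; otherwise the slope is not identifiable.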

Conclusion

Two ML models produced the best-performing predictions from routinely collected data for identifying women at high risk of PE, with acceptable discrimination. However, external validation studies in other settings, using standardised prognostic factors, are needed to confirm this result and to examine model generalisability.