Prediction of adverse pregnancy outcomes using machine learning techniques: evidence from analysis of electronic medical records data in Rwanda.

BMC Medical Informatics and Decision Making (IF 3.8; Zone 3, Medicine; JCR Q2, Medical Informatics). Pub Date: 2025-02-12. DOI: 10.1186/s12911-025-02921-z
Muzungu Hirwa Sylvain, Emmanuel Christian Nyabyenda, Melissa Uwase, Isaac Komezusenge, Fauste Ndikumana, Innocent Ngaruye
BMC Medical Informatics and Decision Making, vol. 25, no. 1, p. 76. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823242/pdf/

Abstract

Background: Despite substantial progress in maternal and neonatal health, Rwanda's mortality rates remain high, necessitating innovative approaches to meet the health-related Sustainable Development Goals (SDGs). Using data collected from electronic medical records (EMRs), this study explores the application of machine learning models to predict adverse pregnancy outcomes, with the aim of improving risk assessment and enhancing care delivery.

Methods: This study used retrospective cohort data from the electronic medical record (EMR) system of 25 hospitals in Rwanda, covering 2020 to 2023. The independent variables included socioeconomic status, health status, reproductive health, and pregnancy-related factors. The outcome variable was a binary composite that combined adverse pregnancy outcomes in the mother and the newborn. Extensive data cleaning was performed; missing values were handled by excluding variables and instances and by imputation with K-Nearest Neighbors and Multiple Imputation by Chained Equations (MICE). Class imbalance was managed using a synthetic minority oversampling technique (SMOTE). Six machine learning models (Logistic Regression, Decision Trees, Support Vector Machine, Gradient Boosting, Random Forest, and Multilayer Perceptron) were trained using 10-fold cross-validation and evaluated on an unseen dataset, with a 70/30 split between training and evaluation data.
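
To illustrate the oversampling step named in the Methods, here is a minimal sketch of the interpolation idea behind synthetic minority oversampling, written with the Python standard library only. It is not the authors' implementation. The function names `smote` and `balance_training_set`, the neighbour count `k=5`, the label coding (1 = adverse outcome), and the choice to oversample only the training portion are all assumptions made for illustration; the abstract does not specify them.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours.

    Hypothetical helper: `minority` is a list of numeric feature vectors.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority samples to `base`, excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance_training_set(X_train, y_train, seed=0):
    """Oversample class 1 until both classes have equal counts (assumed target)."""
    minority = [x for x, y in zip(X_train, y_train) if y == 1]
    majority_count = sum(1 for y in y_train if y == 0)
    extra = smote(minority, max(0, majority_count - len(minority)), seed=seed)
    return X_train + extra, y_train + [1] * len(extra)


if __name__ == "__main__":
    # Toy data: 3 adverse cases (label 1) and 9 non-adverse cases (label 0).
    X = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]] + [[10.0 + i, 10.0] for i in range(9)]
    y = [1, 1, 1] + [0] * 9
    X_bal, y_bal = balance_training_set(X, y)
    print(y_bal.count(0), y_bal.count(1))
```

In a workflow like the one described, a common practice is to apply oversampling to the training portion only, so that the held-out 30% keeps the original class distribution. Whether the study followed that practice is not stated in the abstract.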

Results: Data from 117,069 women across 25 hospitals in Rwanda were analyzed; after removing entries with substantial missing values, the final dataset comprised 32,783 women. Among these women, 5,424 (16.5%) experienced adverse pregnancy outcomes. Random Forest and Gradient Boosting classifiers demonstrated high accuracy and precision. After hyperparameter tuning, the Random Forest model achieved an accuracy of 90.6% and an ROC-AUC score of 0.85, underscoring its effectiveness in predicting adverse outcomes. However, a recall of 46.5% means the model missed more than half of the adverse cases. Key predictors of adverse outcomes identified in this study included gestational age, number of pregnancies, antenatal care visits, maternal age, vital signs, and delivery methods.
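
The reported accuracy and recall can be put in context with a short calculation. The sketch below backs out the confusion-matrix proportions implied by an accuracy of 90.6% and a recall of 46.5%, under the assumption that the evaluation set has the same 16.5% prevalence as the full dataset. The abstract does not state the evaluation-set prevalence, so the derived precision and specificity are illustrative and are not figures reported by the study. The function name `implied_confusion_rates` is hypothetical.

```python
def implied_confusion_rates(prevalence, accuracy, recall):
    """Back out confusion-matrix cells as fractions of all evaluated cases."""
    tp = recall * prevalence          # adverse cases correctly flagged
    fn = prevalence - tp              # adverse cases missed
    tn = accuracy - tp                # accuracy = tp + tn
    fp = (1.0 - prevalence) - tn      # non-adverse cases wrongly flagged
    return {
        "tp": tp,
        "fn": fn,
        "tn": tn,
        "fp": fp,
        "precision": tp / (tp + fp),
        "specificity": tn / (1.0 - prevalence),
        # Accuracy of a trivial model that never predicts an adverse outcome.
        "baseline_accuracy": 1.0 - prevalence,
    }


if __name__ == "__main__":
    # Assumed evaluation-set prevalence: the 16.5% reported for the full dataset.
    rates = implied_confusion_rates(prevalence=0.165, accuracy=0.906, recall=0.465)
    for name, value in rates.items():
        print(f"{name}: {value:.3f}")
```

Under that assumption, a model that never predicts an adverse outcome would already reach 83.5% accuracy. The implied precision is about 93% and the implied specificity about 99%, which is consistent with the abstract's pairing of high accuracy and precision with low recall.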

Conclusions: This study recommends enhancing EMR data quality, integrating machine learning into routine practice, and conducting further research to refine predictive models and address evolving pregnancy outcomes. In addition, this study recommends the design of AI-based interventions for high-risk pregnancies.

Clinical trial number: Not applicable.

Source journal
CiteScore: 7.20
Self-citation rate: 5.70%
Articles published: 297
Review time: 1 month
Journal description: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.