A Machine Learning-Based Prediction of Hospital Mortality in Mechanically Ventilated ICU Patients

Hexin Li, Negin Ashrafi, Chris Kang, Guanlan Zhao, Yubing Chen, Maryam Pishgar
{"title":"A Machine Learning-Based Prediction of Hospital Mortality in Mechanically Ventilated ICU Patients","authors":"Hexin Li, Negin Ashrafi, Chris Kang, Guanlan Zhao, Yubing Chen, Maryam Pishgar","doi":"10.1101/2024.07.12.24310325","DOIUrl":null,"url":null,"abstract":"Background:\nMechanical ventilation (MV) is vital for critically ill ICU patients but carries significant mortality risks. This study aims to develop a predictive model to estimate hospital mortality among MV patients, utilizing comprehensive health data to assist ICU physicians with early-stage alerts. Methods:\nWe developed a Machine Learning (ML) framework to predict hospital mortality in ICU patients receiving MV. Using the MIMIC-III database, we identified 25,202 eligible patients through ICD-9 codes. We employed backward elimination and the Lasso method, selecting 32 features based on clinical insights and literature. Data preprocessing included eliminating columns with over 90% missing data and using mean imputation for the remaining missing values. To address class imbalance, we used the Synthetic Minority Over-sampling Technique (SMOTE). We evaluated several ML models, including CatBoost, XGBoost, Decision Tree, Random Forest, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Logistic Regression, using a 70/30 train-test split. The CatBoost model was chosen for its superior performance in terms of accuracy, precision, recall, F1-score, AUROC metrics, and calibration plots. Results:\nThe study involved a cohort of 25,202 patients on MV. The CatBoost model attained an AUROC of 0.862, an increase from an initial AUROC of 0.821, which was the best reported in the literature. It also demonstrated an accuracy of 0.789, an F1-score of 0.747, and better calibration, outperforming other models. These improvements are due to systematic feature selection and the robust gradient boosting architecture of CatBoost. Conclusion:\nThe preprocessing methodology significantly reduced the number of relevant features, simplifying computational processes, and identified critical features previously overlooked. Integrating these features and tuning the parameters, our model demonstrated strong generalization to unseen data. This highlights the potential of ML as a crucial tool in ICUs, enhancing resource allocation and providing more personalized interventions for MV patients.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"medRxiv - Health Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2024.07.12.24310325","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Mechanical ventilation (MV) is vital for critically ill ICU patients but carries significant mortality risks. This study aims to develop a predictive model to estimate hospital mortality among MV patients, utilizing comprehensive health data to assist ICU physicians with early-stage alerts. Methods: We developed a Machine Learning (ML) framework to predict hospital mortality in ICU patients receiving MV. Using the MIMIC-III database, we identified 25,202 eligible patients through ICD-9 codes. We employed backward elimination and the Lasso method, selecting 32 features based on clinical insights and literature. Data preprocessing included eliminating columns with over 90% missing data and using mean imputation for the remaining missing values. To address class imbalance, we used the Synthetic Minority Over-sampling Technique (SMOTE). We evaluated several ML models, including CatBoost, XGBoost, Decision Tree, Random Forest, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Logistic Regression, using a 70/30 train-test split. The CatBoost model was chosen for its superior performance in terms of accuracy, precision, recall, F1-score, AUROC metrics, and calibration plots. Results: The study involved a cohort of 25,202 patients on MV. The CatBoost model attained an AUROC of 0.862, an increase from an initial AUROC of 0.821, which was the best reported in the literature. It also demonstrated an accuracy of 0.789, an F1-score of 0.747, and better calibration, outperforming other models. These improvements are due to systematic feature selection and the robust gradient boosting architecture of CatBoost. Conclusion: The preprocessing methodology significantly reduced the number of relevant features, simplifying computational processes, and identified critical features previously overlooked. Integrating these features and tuning the parameters, our model demonstrated strong generalization to unseen data. This highlights the potential of ML as a crucial tool in ICUs, enhancing resource allocation and providing more personalized interventions for MV patients.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于机器学习的机械通气 ICU 患者住院死亡率预测方法
背景:机械通气(MV)对重症监护病房(ICU)的危重病人至关重要,但也有很大的死亡风险。本研究旨在开发一种预测模型,利用全面的健康数据估算机械通气患者的住院死亡率,以协助重症监护室医生发出早期警报。方法:我们开发了一个机器学习(ML)框架来预测接受 MV 治疗的 ICU 患者的住院死亡率。利用 MIMIC-III 数据库,我们通过 ICD-9 编码确定了 25202 名符合条件的患者。我们采用了反向排除法和拉索法,根据临床见解和文献选择了 32 个特征。数据预处理包括剔除数据缺失率超过 90% 的列,并对剩余的缺失值进行平均估算。为了解决类不平衡问题,我们使用了合成少数群体过度采样技术(SMOTE)。我们评估了多个 ML 模型,包括 CatBoost、XGBoost、决策树、随机森林、支持向量机 (SVM)、K-近邻 (KNN) 和逻辑回归,采用 70/30 的训练-测试比例。之所以选择 CatBoost 模型,是因为它在准确度、精确度、召回率、F1 分数、AUROC 指标和校准图等方面表现出色。结果:该研究涉及 25202 名 MV 患者。CatBoost 模型的 AUROC 为 0.862,比文献报道的最佳 AUROC 0.821 有所提高。该模型的准确度为 0.789,F1 分数为 0.747,校准效果更好,优于其他模型。这些改进归功于 CatBoost 系统化的特征选择和稳健的梯度提升架构。结论:预处理方法大大减少了相关特征的数量,简化了计算过程,并找出了之前被忽视的关键特征。整合这些特征并调整参数后,我们的模型对未见数据表现出了很强的泛化能力。这凸显了人工智能作为重症监护室重要工具的潜力,它能提高资源分配效率,为重症监护室患者提供更个性化的干预措施。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A case is not a case is not a case - challenges and solutions in determining urolithiasis caseloads using the digital infrastructure of a clinical data warehouse Reliable Online Auditory Cognitive Testing: An observational study Federated Multiple Imputation for Variables that Are Missing Not At Random in Distributed Electronic Health Records Characterizing the connection between Parkinson's disease progression and healthcare utilization Generative AI and Large Language Models in Reducing Medication Related Harm and Adverse Drug Events - A Scoping Review
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1