Machine-learning-enabled prognostic models for sepsis

Intelligence-based medicine, Pub Date: 2024-01-01, DOI: 10.1016/j.ibmed.2024.100167

Chunyan Li, Lu Wang, Kexun Li, Hongfei Deng, Yu Wang, Li Chang, Ping Zhou, Jun Zeng, Mingwei Sun, Hua Jiang, Qi Wang

Volume 10, Article 100167. Source: https://www.sciencedirect.com/science/article/pii/S2666521224000346


Abstract

Background and Objectives:

Sepsis is a leading cause of mortality in intensive care units (ICUs). The development of a robust prognostic model utilizing patients’ clinical data could significantly enhance clinicians’ ability to make informed treatment decisions, potentially improving outcomes for septic patients. This study aims to create a novel machine-learning framework for constructing prognostic tools capable of predicting patient survival or mortality outcome.

Methods:

A novel dataset is created using concatenated triples of static data, temporal data, and clinical outcomes to expand data size. This structured input trains five machine learning classifiers (KNN, Logistic Regression, SVM, RF, and XGBoost) with advanced feature engineering. Models are evaluated on an independent cohort using AUROC and a new metric, γ, which incorporates the F1 score, to assess discriminative power and generalizability.
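As a rough illustration of the two standard quantities named above, the following sketch computes AUROC and the F1 score from true labels and predicted scores using only the Python standard library. It is not the authors' evaluation code, and it does not attempt the γ metric, whose definition is not given in the abstract. The function names, the 0.5 decision threshold, the toy data, and the convention that label 1 marks the positive class are all assumptions made for this example.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), after thresholding the scores (assumed 0.5)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy labels and scores (hypothetical, not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

print(auroc(y_true, scores))     # 0.9375
print(f1_score(y_true, scores))  # 0.75
```

AUROC is threshold-free and measures ranking quality, while F1 depends on the chosen cut-off and is sensitive to class imbalance, which is presumably why the two are reported together.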

Results:

We developed five prognostic models using the concatenated triple dataset with 10 dynamic features from patient medical records. Our analysis shows that the Extreme Gradient Boosting (XGBoost) model (AUROC = 0.777, F1 score = 0.694) and the Random Forest (RF) model (AUROC = 0.769, F1 score = 0.647), when paired with an ensemble under-sampling strategy, outperform the other three models. The RF model improves AUROC by 6.66% and reduces overfitting by 54.96%, while the XGBoost model shows a 0.52% increase in AUROC and a 77.72% reduction in overfitting. These results highlight our framework's ability to enhance predictive accuracy and generalizability, particularly in sepsis prognosis.
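The abstract does not spell out how the ensemble under-sampling strategy is built. One common variant, shown here only as a sketch under that assumption, keeps every minority-class sample, draws an equally sized random subset of the majority class several times, trains one model per balanced subset, and averages the predicted scores. The helper names (`balanced_subsets`, `fit_centroid_scorer`, `ensemble_score`) are hypothetical, and the centroid-based learner is a toy stand-in for RF or XGBoost so that the example runs with the standard library alone.

```python
import math
import random
from typing import Callable, List, Sequence

Scorer = Callable[[Sequence[float]], float]


def balanced_subsets(y: Sequence[int], n_subsets: int, seed: int = 0) -> List[List[int]]:
    """Index lists that keep all minority samples plus an equal-sized majority draw."""
    rng = random.Random(seed)
    idx0 = [i for i, label in enumerate(y) if label == 0]
    idx1 = [i for i, label in enumerate(y) if label == 1]
    minority, majority = (idx1, idx0) if len(idx1) <= len(idx0) else (idx0, idx1)
    if not minority:
        raise ValueError("both classes must be present")
    return [sorted(minority + rng.sample(majority, len(minority)))
            for _ in range(n_subsets)]


def fit_centroid_scorer(X: Sequence[Sequence[float]], y: Sequence[int]) -> Scorer:
    """Toy base learner (assumption): score rises as x gets closer to the class-1 centroid."""
    def centroid(rows: List[Sequence[float]]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    c1 = centroid([x for x, label in zip(X, y) if label == 1])
    c0 = centroid([x for x, label in zip(X, y) if label == 0])

    def score(x: Sequence[float]) -> float:
        d0, d1 = math.dist(x, c0), math.dist(x, c1)
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)

    return score


def ensemble_score(members: Sequence[Scorer], x: Sequence[float]) -> float:
    """Average the scores of all ensemble members for one sample."""
    return sum(m(x) for m in members) / len(members)


# Hypothetical imbalanced toy data: six class-0 rows, two class-1 rows.
X = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.0],
     [0.2, 0.2], [0.1, 0.1], [0.9, 0.8], [0.8, 0.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

subsets = balanced_subsets(y, n_subsets=5, seed=42)
members = [fit_centroid_scorer([X[i] for i in s], [y[i] for i in s]) for s in subsets]
print(ensemble_score(members, [0.85, 0.85]))  # close to class 1, so the score is high
```

Because each member sees a different slice of the majority class, averaging their scores uses more of the majority data than a single under-sampled model would, while every member is still trained on balanced classes.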

Conclusion:

This study presents a novel modeling framework for predicting treatment outcomes in septic patients, designed for small, imbalanced, and high-dimensional datasets. By using temporal feature encoding, advanced sampling, and dimension reduction techniques, our approach enhances standard classifier performance. The resulting models show improved accuracy with limited data, offering valuable prognostic tools for sepsis management. This framework demonstrates the potential of machine learning in small medical datasets.


