Johayra Prithula, Khandaker Reajul Islam, Jaya Kumar, Toh Leong Tan, Mamun Bin Ibne Reaz, Tawsifur Rahman, Susu M. Zughaier, Muhammad Salman Khan, M. Murugappan, Muhammad E.H. Chowdhury
Computers in Biology and Medicine, volume 184, Article 109284. Published 2024-11-22. Impact factor: 7.0 (JCR Q1, Biology). DOI: 10.1016/j.compbiomed.2024.109284. URL: https://www.sciencedirect.com/science/article/pii/S0010482524013696
Citations: 0
A novel classical machine learning framework for early sepsis prediction using electronic health record data from ICU patients
Sepsis, a life-threatening condition triggered by the body's response to infection, remains a significant global health challenge, annually affecting millions in the United States alone with substantial mortality and healthcare costs. Early prediction of sepsis is critical for timely intervention and improved patient outcomes. This study introduces a predictive model that combines machine learning techniques with a specific data-splitting approach on highly imbalanced electronic health records (EHRs). It uses PhysioNet/CinC Challenge 2019 data from 40,336 patients, including vital signs, lab values, and demographics. Preliminary assessments using classical and stacked ML models with Synthetic Minority Oversampling Technique (SMOTE) augmentation showed improved performance. Stacking ML models improved overall accuracy but remained limited in precision, recall, and F1 score for the positive class. A novel data-splitting approach with 5-fold cross-validation and SMOTE and COPULA augmentation techniques showed promise, with F1 scores ranging from 93% to 94% using the COPULA technique. COPULA also outperformed SMOTE when predicting onset at different hours. The proposed model outperformed existing studies, suggesting clinical viability for early sepsis prediction.
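As a rough illustration of how SMOTE-style oversampling can be combined with 5-fold cross-validation, the sketch below interpolates new positive-class rows inside each training fold and leaves every test fold untouched. It is a generic, standard-library-only sketch, not the authors' data-splitting scheme, which the abstract does not describe. The function names (`smote_oversample`, `k_fold_indices`, `balanced_training_folds`) and the row-level split are assumptions made for illustration, and COPULA augmentation is not shown.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority-class neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def k_fold_indices(n_rows, n_splits=5, seed=0):
    """Shuffle row indices and yield (train_idx, test_idx) for each fold."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_splits] for i in range(n_splits)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def balanced_training_folds(X, y, n_splits=5, seed=0):
    """Oversample the positive class (label 1) inside each training fold only,
    so that synthetic rows never appear in a test fold."""
    for train_idx, test_idx in k_fold_indices(len(X), n_splits, seed):
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        minority = [x for x, label in zip(X_train, y_train) if label == 1]
        # Number of synthetic positives needed to match the negative count.
        n_new = (len(y_train) - len(minority)) - len(minority)
        if n_new > 0 and len(minority) >= 2:
            X_train = X_train + smote_oversample(minority, n_new, seed=seed)
            y_train = y_train + [1] * n_new
        yield X_train, y_train, [X[i] for i in test_idx], [y[i] for i in test_idx]
```

Oversampling after the split keeps the evaluation folds free of synthetic data, which otherwise tends to inflate F1 on the positive class. A real pipeline on hourly ICU records would likely also need to split by patient rather than by row, and could use a library implementation such as `SMOTE` from imbalanced-learn in place of the hand-written interpolation.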
About the journal:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.