Preventive Medicine Reports (Journal Article), published 2024-08-02. DOI: 10.1016/j.pmedr.2024.102841. Full text: https://www.sciencedirect.com/science/article/pii/S2211335524002560
An early sepsis prediction model utilizing machine learning and unbalanced data processing in a clinical context
Background
Early and accurate diagnosis of sepsis is essential to reducing mortality. However, despite a growing number of related studies, sepsis is still diagnosed by traditional methods in China, which may to some extent delay treatment.
Methods
The study included 2,385 patients, 364 of whom had sepsis, with data collected from the First Affiliated Hospital of Anhui Medical University and partner hospitals from April to July 2022. External validation was conducted using the MIMIC-III database (over 60,000 patients from 2001 to 2012) and the eICU Collaborative Research Database (139,000 patients from 2014 to 2015). Multiple machine learning models, together with SHapley Additive exPlanations (SHAP) analysis, were applied to identify the main risk factors for accurate sepsis prediction. For data processing, multiple imputation was used to fill in missing data, and the Synthetic Minority Oversampling Technique (SMOTE) was used to balance the classes.
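The class imbalance described above (364 of 2,385 patients, roughly 15%, had sepsis) is what SMOTE is meant to address. The following is a minimal sketch of the SMOTE idea in plain Python, not the authors' implementation: the function name `smote`, the parameter defaults, and the toy feature values are all hypothetical and chosen only for illustration. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority-class neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical, not from the study): [heart rate, systolic BP]
minority = [[118.0, 92.0], [124.0, 88.0], [131.0, 85.0], [110.0, 95.0]]
majority_count = 20

new_points = smote(minority, n_new=majority_count - len(minority), k=2)
balanced_minority = minority + new_points
print(len(balanced_minority))  # equals majority_count after oversampling
```

In practice, oversampling of this kind is normally applied to the training split only, so that validation data keep the original class ratio.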
Results
The predictive model for early sepsis used eighteen diagnostic features. The Random Forest model performed best among all models, with an area under the curve (AUC) of 87% and an F1-score of 77%. Moreover, the interpretation from the SHAP analysis was generally consistent with current clinical practice.
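For readers unfamiliar with the two reported metrics, the sketch below shows how AUC and F1 can be computed from predictions. It is an illustration under stated assumptions: the labels, scores, and the 0.5 decision threshold are hypothetical and do not come from the study, and the helper names `roc_auc` and `f1_score` are defined here rather than taken from any library.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (1 = sepsis) and model scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

auc = roc_auc(y_true, scores)
f1 = f1_score(y_true, y_pred)
print(f"AUC={auc:.3f}, F1={f1:.3f}")  # AUC=0.933, F1=0.667
```

AUC depends only on how the scores rank patients, while F1 depends on the chosen threshold, which is one reason the two figures are reported together on imbalanced data.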
Conclusion
The study revealed the relationship between these 18 clinical features and diagnostic outcomes. The results indicate that patients whose systolic blood pressure, albumin, and heart rate values exceed certain thresholds have a high likelihood of developing sepsis.