Title: Machine learning-based prediction model for patients with recurrent Staphylococcus aureus bacteremia
Authors: Yuan Li, Shuang Song, Liying Zhu, Xiaorun Zhang, Yijiao Mou, Maoxing Lei, Wenjing Wang, Zhen Tao
Journal: BMC Medical Informatics and Decision Making, 25(1): 99 (impact factor 3.3; JCR Q2, Medical Informatics)
Publication date: 2025-02-24
DOI: 10.1186/s12911-025-02878-z (https://doi.org/10.1186/s12911-025-02878-z)
Citation count: 0
Abstract
Background: Staphylococcus aureus bacteremia (SAB) remains a significant contributor to both community-acquired and healthcare-associated bloodstream infections. SAB has high recurrence and mortality rates, which pose numerous clinical treatment challenges. In particular, since the outbreak of COVID-19, the number of SAB patients has gradually increased, with a growing proportion of methicillin-resistant Staphylococcus aureus (MRSA) infections. We therefore constructed and validated a prediction model for recurrent SAB using machine learning. This model aids physicians in promptly assessing the condition and intervening proactively.
Methods: Patient data were obtained from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database, version 2.2. Patients were randomly divided into training and testing datasets in a 7:3 ratio. Feature selection used two methods: Recursive Feature Elimination (RFE) and Least Absolute Shrinkage and Selection Operator (LASSO). Prediction models were built using Extreme Gradient Boosting (XGBoost), Random Forest (RF), Logistic Regression (LR), Support Vector Machine (SVM), and Artificial Neural Network (ANN). Model validation included Receiver Operating Characteristic (ROC) analysis, Decision Curve Analysis (DCA), and Precision-Recall Curve (PRC) analysis. We used SHAP (SHapley Additive exPlanations) values to show the importance of each feature and to explain the XGBoost model.
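As a rough illustration of the evaluation steps named in the Methods, the following Python sketch shows a 7:3 random split, a rank-based ROC AUC, and the standard decision-curve net benefit at a single threshold probability. It uses only the standard library and toy data. The function names, the seed, and the unstratified shuffle are assumptions made for illustration; the abstract does not state which software or splitting procedure the authors used, so this is not their implementation.

```python
import random


def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle rows and split them into training and testing lists.

    Assumption: a plain (unstratified) shuffle; the abstract only says 7:3.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive case scores
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(labels, scores, threshold):
    """Decision-curve net benefit at one threshold probability:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy labels (1 = recurrence) and predicted risks, not data from the study.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

train_ids, test_ids = split_train_test(range(10))
print(len(train_ids), len(test_ids))             # 7 3
print(roc_auc(toy_labels, toy_scores))           # 0.75
print(net_benefit(toy_labels, toy_scores, 0.5))  # 0.25
```

A full decision curve repeats `net_benefit` over a range of thresholds and compares the model with the "treat all" and "treat none" strategies.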
Results: After screening, MRSA, PTT, RBC, RDW, Neutrophils_abs, Sodium, Calcium, Vancomycin concentration, MCHC, MCV, and Prognostic Nutritional Index (PNI) were selected as features for constructing the model. In a combined evaluation using ROC, DCA, and PRC, XGBoost demonstrated the best predictive performance, with an area under the ROC curve of 0.76 (95% CI: 0.66-0.85) and an area under the PRC of 0.56 (95% CI: 0.37-0.75). A website was built based on the XGBoost model. SHAP illustrated the feature importance ranking in the XGBoost model and provided examples to explain its predictions.
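The abstract reports 95% confidence intervals for the AUC values but does not say how they were computed. One common choice is the percentile bootstrap, sketched below in Python with only the standard library. The function names, the number of resamples, the seed, and the toy data are all assumptions for illustration, and the intervals in the abstract may have been obtained by a different method.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive case scores
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric(labels, scores)."""
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample has only one class, so the metric is undefined
        stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy labels (1 = recurrence) and predicted risks, not data from the study.
labels = [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9, 0.3, 0.6]

print(roc_auc(labels, scores))  # 0.875
print(bootstrap_ci(labels, scores, roc_auc))
```

Passing a different `metric` function, for example one that computes the area under the precision-recall curve, would give an interval for that quantity in the same way.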
Conclusions: XGBoost is widely accepted for model development in the medical domain. The prediction model for recurrent SAB developed by our team aids physicians in the timely diagnosis and treatment of patients.
About the journal:
BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.