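Constructing a predictive model for early-onset sepsis in neonatal intensive care unit newborns based on SHapley Additive exPlanations explainable machine learning.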

IF 1.5 · Zone 4 (Medicine) · JCR Q2 (Pediatrics) · Translational Pediatrics · Pub Date: 2024-11-30 · Epub Date: 2024-11-26 · DOI: 10.21037/tp-24-278
Xuefeng Tan, Xiufang Zhang, Jie Chai, Wenjuan Ji, Jinling Ru, Cuilin Yang, Wenjing Zhou, Jing Bai, Yueling Xiong
Translational Pediatrics, vol. 13, no. 11, pp. 1933-1946 · Journal Article · PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11621883/pdf/
Citations: 0

Abstract

Background: The clinical characteristics of neonatal sepsis (NS) are subtle and non-specific, posing a serious threat to the lives of newborn infants. Early-onset sepsis (EOS) is sepsis that occurs within 72 hours after birth, with a high mortality rate. Identifying key factors of NS and conducting early diagnosis are of great practical significance. Thus, we developed a robust machine learning (ML) model for the early prediction of EOS in neonates admitted to the neonatal intensive care unit (NICU), investigated the pivotal risk factors associated with EOS development, and provided interpretable insights into the model's predictions.

Methods: A retrospective cohort study was conducted. Of 668 newborns (EOS and non-EOS) admitted to the NICU of Bozhou People's Hospital from January to December 2023, 72 who were more than three days old and 166 with more than 30% of their medical record data missing were excluded, leaving 430 newborns (EOS and non-EOS) in the study. Clinical case data were analyzed, and the dataset was randomly partitioned, with 75% allocated to model training and the remaining 25% to testing. Data preprocessing was performed in R, and least absolute shrinkage and selection operator (LASSO) regression was used to select salient features and reduce the risk of overfitting. Six ML models were used to predict the occurrence of EOS in neonates. Their predictive performance was evaluated using the receiver operating characteristic (ROC) curve and the precision-recall (PR) curve. The SHapley Additive exPlanations (SHAP) framework was then used to explain the predictions of the Categorical Boosting (CatBoost) model, which was the top performer.
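As a rough illustration of the two evaluation metrics named in the Methods, the sketch below computes ROC AUC and a PR-curve summary from predicted scores using only the Python standard library. It is not the authors' code: the abstract does not say which PR-AUC estimator was used, so average precision is assumed here as one common choice, and the labels and scores are made-up toy values, not study data.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative counts as one half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of the precision measured at each positive case,
    with cases ranked by descending score. Assumes there are no tied scores."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` cases
    return total / n_pos


# Toy example (illustrative values only): 1 = EOS, 0 = non-EOS.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print("ROC AUC:", roc_auc(toy_labels, toy_scores))
print("Average precision:", average_precision(toy_labels, toy_scores))
```

ROC AUC is insensitive to class balance, while the PR summary depends on the proportion of positive cases, which is presumably why the study reports both for a cohort mixing EOS and non-EOS newborns.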

Results: The ROC area under the curve (ROCAUC) of all six ML models, namely CatBoost, random forest (RF), eXtreme Gradient Boosting (XGBoost), multilayer perceptron (MLP), support vector machine (SVM), and logistic regression (LR), exceeded 0.900 on the test set. The CatBoost model performed best, with favorable results in calibration, decision curve analysis (DCA), and learning curves. Its ROCAUC was 0.975 and its area under the PR curve (PRAUC) was 0.947, indicating a high degree of predictive accuracy. Using the SHAP method, seven key features were identified and ranked by importance: respiratory rate (RR), procalcitonin (PCT), nasal congestion (NC), yellow staining (YS), white blood cell count (WBC), fever, and amniotic fluid turbidity (AFT).
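To make the idea behind the SHAP ranking concrete, the sketch below computes exact Shapley values for a single prediction by enumerating feature subsets. It is an assumption-laden toy, not the study's method: the SHAP library uses model-specific approximations rather than brute-force enumeration, and `toy_risk_score`, its coefficients, the baseline, and the patient values are all hypothetical. Only the feature names (RR, PCT, fever) are borrowed from the abstract.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_values(
    predict: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values for one input `x` relative to `baseline`.

    A feature that is "absent" from a coalition takes its baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    features = list(x)
    n = len(features)
    phi: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without_f = {g: (x[g] if g in subset else baseline[g]) for g in features}
                with_f = dict(without_f)
                with_f[f] = x[f]
                total += weight * (predict(with_f) - predict(without_f))
        phi[f] = total
    return phi


def toy_risk_score(v: Dict[str, float]) -> float:
    """Hypothetical score with made-up coefficients and one interaction term."""
    return 0.02 * v["RR"] + 0.5 * v["PCT"] + 0.8 * v["fever"] + 0.01 * v["RR"] * v["PCT"]


# Hypothetical reference values and one hypothetical patient (not study data).
baseline = {"RR": 40.0, "PCT": 0.1, "fever": 0.0}
patient = {"RR": 70.0, "PCT": 2.0, "fever": 1.0}

phi = shapley_values(toy_risk_score, patient, baseline)
# Rank features by absolute contribution, largest first.
for name, value in sorted(phi.items(), key=lambda item: -abs(item[1])):
    print(name, round(value, 3))
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that gives SHAP its name. A global ranking such as the one reported above is typically obtained by averaging the absolute per-patient contributions over a dataset.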

Conclusions: By constructing a precision-oriented ML model and harnessing the SHAP method for interpretability, this study effectively identified crucial risk factors for EOS development in neonates. This approach enables early prediction of EOS risk, thereby facilitating timely and targeted clinical interventions for precise diagnosis and treatment.

Source journal: Translational Pediatrics (Medicine: Pediatrics, Perinatology and Child Health)
CiteScore: 4.50
Self-citation rate: 5.00%
Articles published: 108