An early sepsis prediction model utilizing machine learning and unbalanced data processing in a clinical context

Preventive Medicine Reports | IF 2.4 | Zone 3 (Medicine) | JCR Q2 (PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH) | Pub Date: 2024-08-02 | DOI: 10.1016/j.pmedr.2024.102841
Citations: 0

Abstract


Background

Early and accurate diagnosis of sepsis is essential to reducing mortality. However, despite a growing number of related studies, sepsis is still diagnosed in the traditional way in China, which may to some extent delay treatment.

Methods

The study included 2,385 patients, 364 of them with sepsis, whose data were collected from the First Affiliated Hospital of Anhui Medical University and partner hospitals from April to July 2022. External validation was conducted using the MIMIC-III database (over 60,000 patients from 2001 to 2012) and the eICU Collaborative Research Database (139,000 patients from 2014 to 2015). Multiple machine learning models, together with SHapley Additive exPlanations (SHAP) analysis, were applied to identify the main risk factors for accurate sepsis prediction. For data processing, Multiple Imputation was used to fill in missing data, and the Synthetic Minority Over-sampling Technique (SMOTE) was used to balance the classes.
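With 364 sepsis cases among 2,385 patients, roughly 15% of the cohort belongs to the minority class, which is the situation SMOTE is meant to address. The following is a minimal pure-Python sketch of the core SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours). It is not the authors' implementation; the function name `smote`, its parameters, and the toy rows are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Hypothetical minority-class rows (two numeric features); values are made up.
sepsis_rows = [[118.0, 88.0], [124.0, 92.0], [131.0, 85.0], [110.0, 95.0]]
new_rows = smote(sepsis_rows, n_new=6, k=2)
```

Because every synthetic row lies on a segment between two real minority rows, it stays inside the range spanned by the observed minority data. Oversampling of this kind is normally applied to the training split only, so that validation data remain untouched.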

Results

Eighteen diagnostic features were used in the predictive model for early sepsis. The Random Forest model performed best among all the models, with an Area Under the Curve (AUC) of 87% and an F1-score (F1) of 77%. Moreover, the interpretation from the SHAP analysis was generally consistent with current clinical practice.
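For readers less familiar with the two reported metrics, the sketch below shows how AUC and F1 can be computed from labels and predicted scores. It is an illustration in plain Python, not the study's evaluation code; the labels, scores, and the 0.5 decision threshold are made-up assumptions.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical labels (1 = sepsis) and model scores; values are made up.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

auc = roc_auc(y_true, scores)  # 14 of 15 positive/negative pairs ranked correctly
f1 = f1_score(y_true, y_pred)  # tp=2, fp=1, fn=1
```

AUC depends only on how the scores rank patients, while F1 depends on the chosen threshold, which is why the two are usually reported together for imbalanced data.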

Conclusion

The study revealed the relationship between these 18 clinical features and diagnostic outcomes. The results indicate that patients whose Systolic Blood Pressure, Albumin, and Heart Rate values exceed certain thresholds have a high likelihood of developing sepsis.
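SHAP relates features to outcomes by attributing each prediction to the input features using Shapley values. As a rough illustration of that idea only, the sketch below computes exact Shapley values by enumerating feature subsets for a three-feature toy function. The function `toy_model`, its coefficients, and the inputs are hypothetical and carry no clinical meaning; SHAP tooling uses far more efficient algorithms than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in `subset` take the patient's value, the rest the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in features]
        return model(mixed)

    phi = [0.0] * n
    for i in features:
        rest = [j for j in features if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(rest, size):
                without_i = set(subset)
                phi[i] += weight * (value(without_i | {i}) - value(without_i))
    return phi


def toy_model(v):
    # Hypothetical score: one additive feature plus an interaction of two others.
    return 2.0 * v[0] + 1.0 * v[1] * v[2]


phi = shapley_values(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
```

The additive feature receives its full effect (2.0), the two interacting features split their joint effect equally (0.5 each), and the attributions sum to the difference between the prediction and the baseline prediction.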

Source journal: Preventive Medicine Reports (Medicine: Public Health, Environmental and Occupational Health)
CiteScore: 3.90 | Self-citation rate: 0.00% | Articles published: 353