Development of decision tree classification algorithms in predicting mortality of COVID-19 patients.

International Journal of Emergency Medicine (IF 2.0; JCR Q2, Emergency Medicine). Published 2024-09-27. DOI: 10.1186/s12245-024-00681-7

Zahra Mohammadi-Pirouz, Karimollah Hajian-Tilaki, Mahmoud Sadeghi Haddat-Zavareh, Abazar Amoozadeh, Shabnam Bahrami


Citations: 0

Abstract



Introduction: Accurately predicting COVID-19 mortality risk from its influencing factors is crucial for guiding effective public policies that relieve the strain on the healthcare system. This study therefore assessed the efficacy of three decision tree algorithms (CART, C5.0, and CHAID) in predicting COVID-19 mortality risk and compared their performance with that of logistic regression.
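The three tree algorithms differ mainly in how they choose splits. The abstract does not state the authors' software or settings, so the following is only a minimal sketch of the textbook criteria, assuming a binary outcome (death vs. survival) and a single binary split: CART's decrease in Gini impurity, the entropy-based gain ratio used by C4.5/C5.0, and the Pearson chi-square statistic on which CHAID's splitting tests are built. Real CHAID also merges categories and uses Bonferroni-adjusted p-values, and C5.0 adds pruning and optional boosting; none of that is shown. The function names and example counts are hypothetical.

```python
import math

def gini(pos: int, neg: int) -> float:
    """Gini impurity of a node holding `pos` deaths and `neg` survivors (CART)."""
    n = pos + neg
    if n == 0:
        return 0.0
    p = pos / n
    return 1.0 - p ** 2 - (1.0 - p) ** 2

def entropy(pos: int, neg: int) -> float:
    """Two-class Shannon entropy in bits (basis of the C4.5/C5.0 gain ratio)."""
    n = pos + neg
    return -sum((k / n) * math.log2(k / n) for k in (pos, neg) if k)

def split_scores(left, right):
    """Score one binary split; `left` and `right` are (deaths, survivors) counts."""
    (a, b), (c, d) = left, right
    n = a + b + c + d
    w_left, w_right = (a + b) / n, (c + d) / n
    # CART: decrease in weighted Gini impurity from parent to children.
    gini_gain = gini(a + c, b + d) - w_left * gini(a, b) - w_right * gini(c, d)
    # C5.0-style: information gain normalised by the entropy of the split itself.
    info_gain = entropy(a + c, b + d) - w_left * entropy(a, b) - w_right * entropy(c, d)
    split_info = entropy(a + b, c + d)
    gain_ratio = info_gain / split_info if split_info else 0.0
    # CHAID-style: Pearson chi-square statistic of the 2x2 (branch x outcome) table.
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return {"gini_gain": gini_gain, "gain_ratio": gain_ratio, "chi2": chi2}
```

With made-up counts, `split_scores((30, 20), (20, 930))` (a hypothetical binary predictor splitting 1000 patients) gives a positive Gini decrease, a gain ratio between 0 and 1, and a chi-square far above 3.84, the 5% critical value with one degree of freedom. A split whose two branches have identical mortality scores zero on all three criteria.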

Methods: This retrospective cohort study examined 5080 COVID-19 cases in Babol, a city in northern Iran, confirmed positive by PCR between March 2020 and March 2022. To check the validity of the findings, the data were randomly divided into an 80% training set and a 20% test set. The prediction models, including logistic regression and the decision tree algorithms, were trained on the training set and tested on the test set. Their performance on the test samples was assessed using measures such as the ROC curve, sensitivity, specificity, and the area under the ROC curve (AUC).
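The evaluation protocol in the Methods can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`random_split`, `confusion_metrics`, `auc`), the seed, and the encoding 1 = death / 0 = survival are hypothetical. The split is a plain random 80/20 shuffle, as the abstract describes; whether the authors stratified by outcome is not stated. AUC is computed as the probability that a randomly chosen death receives a higher risk score than a randomly chosen survivor (the Mann-Whitney formulation), which equals the area under the ROC curve.

```python
import random

def random_split(rows, test_fraction=0.2, seed=42):
    """Shuffle `rows` and return (train, test), e.g. an 80% / 20% split."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def confusion_metrics(y_true, y_pred):
    """Threshold-based metrics for binary labels (1 = death, 0 = survival)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # recall for deaths
    specificity = tn / (tn + fp) if tn + fp else 0.0   # recall for survivors
    precision = tp / (tp + fp) if tp + fp else 0.0     # positive predictive value
    accuracy = (tp + tn) / len(pairs)
    f_score = (2 * precision * sensitivity / (precision + sensitivity)
               if precision + sensitivity else 0.0)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f_score": f_score}

def auc(y_true, scores):
    """ROC AUC = P(score of a random death > score of a random survivor); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For the 5080-patient cohort, `random_split(range(5080))` yields 4064 training rows and 1016 test rows. The pairwise AUC loop is quadratic in the number of test patients, which is acceptable at this scale; a rank-based formula would be preferable for much larger test sets.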

Results: The mortality rate among hospitalized COVID-19 patients was 7.7%. Cross-validation showed that CHAID outperformed the other decision tree algorithms and logistic regression in specificity and precision, but not in sensitivity, when predicting COVID-19 mortality risk. CHAID achieved a specificity, precision, accuracy, and F-score of 0.98, 0.70, 0.95, and 0.52, respectively. All models identified factors such as ICU hospitalization, intubation, age, kidney disease, blood urea nitrogen (BUN), C-reactive protein (CRP), white blood cell count (WBC), neutrophil-to-lymphocyte ratio (NLR), oxygen saturation (O2 sat), and hemoglobin as influencing the mortality of COVID-19 patients.
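The reported CHAID figures can be cross-checked with a little arithmetic. Because the F-score is the harmonic mean of precision P and sensitivity R, F = 2PR / (P + R), solving for R gives R = F * P / (2P - F). The sketch below applies this to the rounded values in the abstract; it is an illustrative back-calculation, not a figure reported by the authors, and it assumes the F-score is the standard F1. It also computes the accuracy of a trivial rule that predicts survival for every patient, assuming the test set shares the cohort's 7.7% mortality rate (the abstract does not report test-set prevalence). The function name is hypothetical.

```python
def sensitivity_from_precision_and_f1(precision: float, f1: float) -> float:
    """Invert F1 = 2PR / (P + R) for the recall (sensitivity) R."""
    return f1 * precision / (2 * precision - f1)

# CHAID precision and F-score as reported in the abstract (rounded to 2 decimals).
implied_sensitivity = sensitivity_from_precision_and_f1(0.70, 0.52)

# Hypothetical baseline: predict "survived" for everyone, assuming the test set
# has the same 7.7% mortality as the whole cohort.
baseline_accuracy = 1 - 0.077

print(f"implied CHAID sensitivity ~ {implied_sensitivity:.2f}")   # -> ~ 0.41
print(f"all-survive baseline accuracy = {baseline_accuracy:.3f}")  # -> 0.923
```

The implied sensitivity of about 0.41 (roughly 0.41 to 0.42 once the rounding of the inputs is allowed for) is consistent with the statement that CHAID did not lead on sensitivity. Likewise, with 7.7% mortality an all-survive rule is already about 92% accurate, which is why sensitivity, precision, and the F-score are more informative than accuracy alone for this outcome.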

Conclusions: CART and C5.0 performed better in sensitivity, whereas CHAID outperformed the other decision tree algorithms in specificity, precision, and accuracy and showed a slight improvement over logistic regression in predicting COVID-19 mortality risk in the study population.


Source journal: International Journal of Emergency Medicine (Emergency Medicine)

CiteScore: 4.60

Self-citation rate: 0.00%

Review time: 13 weeks

Journal description: The journal aims to bring to light clinical advances and research developments from around the world and thus help the specialty move forward. It is directed at physicians and medical personnel training or working in emergency medicine. Medical students interested in a career in emergency medicine will also benefit, particularly trainees in countries where the specialty is still in its infancy. Coverage includes interesting clinical cases, the latest evidence-based practice, and research developments in emergency medicine, including emergency pediatrics.