Comparing different machine learning techniques for predicting COVID-19 severity
Yibai Xiong, Yan Ma, Lianguo Ruan, Dan Li, Cheng Lu, Luqi Huang
Infectious Diseases of Poverty, 11(1): 19. Published 2022-02-17. Journal Article.
DOI: 10.1186/s40249-022-00946-4 (https://doi.org/10.1186/s40249-022-00946-4)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8851750/pdf/
Impact factor: 4.8. JCR: Q1 (Infectious Diseases).
Citations: 24
Abstract
Background: Coronavirus disease 2019 (COVID-19) continues to spread globally. Machine learning techniques have been used in disease diagnosis and in predicting treatment outcomes, with favorable performance. The present study aims to predict COVID-19 severity at admission using different machine learning techniques, including random forest (RF), support vector machine (SVM), and logistic regression (LR). The importance of individual features to COVID-19 severity was also identified.
Methods: A retrospective study was conducted at JinYinTan Hospital from January 26 to March 28, 2020. Eighty-six demographic, clinical, and laboratory features were screened using the LassoCV method, Spearman's rank correlation, expert opinion, and literature evaluation. RF, SVM, and LR models were built to predict severe COVID-19, and their performance was compared using the area under the curve (AUC). Additionally, the importance of features to COVID-19 severity was analyzed using the best-performing model.
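The AUC comparison described in the Methods can be illustrated with a minimal sketch. It assumes that AUC refers to the area under the ROC curve, computed here in its rank-based (Mann-Whitney) form. The function name `roc_auc`, the model names, and all labels and scores are hypothetical toy values, not data or code from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the ROC AUC as the fraction of (positive, negative) pairs
    in which the positive case receives the higher score.
    Tied scores count as half a win (Mann-Whitney U formulation)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (1 = severe, 0 = non-severe); not from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
model_scores = {
    "model_a": [0.9, 0.8, 0.7, 0.2, 0.3, 0.1, 0.4, 0.6],
    "model_b": [0.9, 0.4, 0.7, 0.2, 0.5, 0.1, 0.6, 0.3],
}

# Score every candidate model and keep the one with the highest AUC.
aucs = {name: roc_auc(y_true, s) for name, s in model_scores.items()}
best_model = max(aucs, key=aucs.get)
print(aucs, best_model)
```

The pairwise form is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients; a library routine would normally be used on larger data.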
Results: A total of 287 patients were enrolled, of whom 36.6% were severe cases and 63.4% were non-severe cases. The median age was 60.0 years (interquartile range: 49.0-68.0 years). Three models were established using 23 features: 1 clinical feature, 1 chest computed tomography (CT) feature, and 21 laboratory features. Among the three models, RF yielded the best overall performance, with the highest AUC (0.970) compared with SVM (0.948) and LR (0.928). RF achieved a sensitivity of 96.7%, a specificity of 69.5%, and an accuracy of 84.5%. SVM achieved a sensitivity of 93.9%, a specificity of 79.0%, and an accuracy of 88.5%. LR achieved a sensitivity of 92.3%, a specificity of 72.3%, and an accuracy of 85.2%. Additionally, chest CT had the highest importance to illness severity, followed by neutrophil-to-lymphocyte ratio, lactate dehydrogenase, and D-dimer.
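For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, and accuracy follow from confusion-matrix counts under their standard definitions. The function name `classification_metrics` and the counts are hypothetical and chosen only for easy arithmetic; they are not the study's confusion matrix, which the abstract does not report.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute standard binary classification metrics from confusion counts.

    tp/fn: positive-class cases predicted correctly / missed.
    tn/fp: negative-class cases predicted correctly / falsely flagged.
    """
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate
        "specificity": tn / (tn + fp),            # true negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=45, fn=5, tn=40, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because accuracy is a prevalence-weighted blend of sensitivity and specificity, a model can rank first on AUC or sensitivity while another has higher accuracy, as with RF and SVM in the results above.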
Conclusions: Our results indicate that RF could be a useful predictive tool for identifying patients with severe COVID-19, which may facilitate effective care and help optimize the use of resources.
About the journal:
Infectious Diseases of Poverty is a peer-reviewed, open access journal that focuses on essential public health questions related to infectious diseases of poverty. It covers a wide range of topics and methods, including the biology of pathogens and vectors, diagnosis and detection, treatment and case management, epidemiology and modeling, zoonotic hosts and animal reservoirs, control strategies and implementation, new technologies, and their application.
The journal also explores the impact of transdisciplinary or multisectoral approaches on health systems, ecohealth, environmental management, and innovative technologies. It aims to provide a platform for the exchange of research and ideas that can contribute to the improvement of public health in resource-limited settings.
In summary, Infectious Diseases of Poverty aims to address the urgent challenges posed by infectious diseases in impoverished populations. By publishing high-quality research in various areas, the journal seeks to advance our understanding of these diseases and contribute to the development of effective strategies for prevention, diagnosis, and treatment.