Support Vector Machine Outperforms Other Machine Learning Models in Early Diagnosis of Dengue Using Routine Clinical Data
Ariba Qaiser, Sobia Manzoor, Asraf Hussain Hashmi, Hasnain Javed, Anam Zafar, Javed Ashraf
Advances in Virology, published 2024-10-14. DOI: 10.1155/2024/5588127
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11493476/pdf/
Abstract
Background: There is a dire need to establish active dengue surveillance that continuously detects cases and circulating serotypes and determines the disease burden of dengue fever (DF) in the country and region. Predicting dengue PCR results with machine learning (ML) models represents a significant advance in pre-emptive healthcare. This study outlines the full process of data preprocessing, model selection, and the underlying mechanism of each algorithm used to predict dengue PCR outcomes accurately.
Methods: We analyzed data from 300 patients with suspected dengue in Islamabad and Rawalpindi, Pakistan, collected from August to October 2023. Dengue virus (DENV) was detected using NS1 antigen ELISA, IgM and IgG antibody tests, and serotype-specific real-time polymerase chain reaction (RT-PCR). Representative PCR-positive samples were Sanger sequenced to confirm the circulation of the different dengue serotypes. Demographic information, serological test results, and hematological parameters served as inputs to the ML models, and the dengue PCR result was the output to be predicted. The models compared were logistic regression, XGBoost, LightGBM, random forest, support vector machine (SVM), and CatBoost.
Results: Of the 300 patients, 184 (61.33%) were PCR positive. Of these PCR-positive cases, 9 (4.89%), 171 (92.93%), and 4 (2.17%) were infected with serotypes 1, 2, and 3, respectively. The infected patients comprised 147 (79.89%) males and 37 (20.11%) females, with a mean age of 33 ± 16 years. The mean platelet count was 75,447/µL, the mean leukocyte count was 4189.02/µL, and the mean hematocrit was 46.05%. SVM was the best-performing ML model for predicting RT-PCR results, with 71.4% accuracy, 97.4% recall, and 71.6% precision; hyperparameter tuning raised its recall to 100%.
Conclusion: Our study documents three dengue serotypes circulating in the capital territory of Pakistan and shows that SVM outperformed the other models tested. SVM could therefore serve as a valuable clinical tool to aid the rapid diagnosis of DF.
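The Results quote accuracy, recall, and precision, with PCR positivity as the positive class. The abstract does not say how these metrics were computed, so the following is a minimal sketch that assumes the standard binary definitions: accuracy = (TP + TN) / N, recall = TP / (TP + FN), and precision = TP / (TP + FP). The helper names (`ConfusionCounts`, `confusion_counts`, `accuracy`, `recall`, `precision`) are hypothetical and not taken from the study. The example labels use only the cohort split reported above (184 PCR positive, 116 PCR negative) and do not reproduce any of the study's models.

```python
"""Minimal sketch (hypothetical helpers, not the study's code): standard binary
classification metrics with PCR positive (label 1) as the positive class."""
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # predicted positive, PCR positive
    fp: int  # predicted positive, PCR negative
    tn: int  # predicted negative, PCR negative
    fn: int  # predicted negative, PCR positive


def confusion_counts(y_true, y_pred):
    """Tally a 2x2 confusion matrix from parallel sequences of 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if p == 1:
            if t == 1:
                tp += 1
            else:
                fp += 1
        elif t == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def accuracy(c: ConfusionCounts) -> float:
    # Share of all patients classified correctly.
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def recall(c: ConfusionCounts) -> float:
    # Sensitivity: share of PCR-positive patients the model flags as positive.
    return c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0


def precision(c: ConfusionCounts) -> float:
    # Positive predictive value: share of flagged patients who are PCR positive.
    return c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0


# Illustration only: a trivial rule that labels all 300 patients positive,
# using the cohort's reported 184 PCR-positive / 116 PCR-negative split.
y_true = [1] * 184 + [0] * 116
y_pred = [1] * 300
c = confusion_counts(y_true, y_pred)
print(f"accuracy={accuracy(c):.4f} recall={recall(c):.4f} precision={precision(c):.4f}")
# prints: accuracy=0.6133 recall=1.0000 precision=0.6133
```

With 61.33% of the cohort PCR positive, this all-positive rule already reaches 100% recall at 61.33% precision and accuracy. For that reason, recall is usually read together with precision and accuracy rather than on its own.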