Support Vector Machine Outperforms Other Machine Learning Models in Early Diagnosis of Dengue Using Routine Clinical Data.

Advances in Virology (IF 1.1; JCR Q4, Virology); Pub Date: 2024-10-14; eCollection Date: 2024-01-01; DOI: 10.1155/2024/5588127

Ariba Qaiser, Sobia Manzoor, Asraf Hussain Hashmi, Hasnain Javed, Anam Zafar, Javed Ashraf



Abstract

Background: There is a dire need to establish active dengue surveillance to continuously detect cases, identify circulating serotypes, and determine the disease burden of dengue fever (DF) in the country and region. Predicting dengue PCR results using machine learning (ML) models represents a significant advancement in pre-emptive healthcare measures. This study outlines the process of data preprocessing and model selection, and the underlying mechanisms of each algorithm employed to predict dengue PCR outcomes.
Methods: We analyzed data from 300 suspected dengue patients in Islamabad and Rawalpindi, Pakistan, from August to October 2023. NS1 antigen ELISA, IgM and IgG antibody tests, and serotype-specific real-time polymerase chain reaction (RT-PCR) were used to detect the dengue virus (DENV). Representative PCR-positive samples were sequenced by Sanger sequencing to confirm the circulation of various dengue serotypes. Demographic information, serological test results, and hematological parameters were used as inputs to the ML models, with the dengue PCR result serving as the output to be predicted. The models used were logistic regression, XGBoost, LightGBM, random forest, support vector machine (SVM), and CatBoost.
Results: Of the 300 patients, 184 (61.33%) were PCR positive. Among the PCR-positive cases, 9 (4.89%), 171 (92.93%), and 4 (2.17%) were infected with serotypes 1, 2, and 3, respectively. A total of 147 (79.89%) males and 37 (20.11%) females were infected, with a mean age of 33 ± 16 years. In addition, the mean platelet count, leukocyte count, and hematocrit were 75,447, 4,189.02, and 46.05%, respectively. The SVM was the best-performing ML model for predicting RT-PCR results, with 71.4% accuracy, 97.4% recall, and 71.6% precision. Hyperparameter tuning improved the recall to 100%.
Conclusion: Our study documents three circulating serotypes in the capital territory of Pakistan and highlights that the SVM outperformed other models, potentially serving as a valuable tool in clinical settings to aid in the rapid diagnosis of DF.
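
The proportions and metric names reported above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the helper names `pct` and `classification_metrics` are illustrative assumptions, and the only data used are the counts stated in the abstract. As a point of reference for reading accuracy, recall, and precision together, it also evaluates a hypothetical rule that labels every patient in the full cohort as PCR positive; the abstract does not state which subset of patients the reported SVM metrics were computed on, so this reference rule is not directly comparable to them.

```python
def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


def classification_metrics(tp, fp, fn, tn):
    """Standard definitions of accuracy, recall, and precision
    from the four cells of a binary confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Counts taken directly from the abstract.
n_patients = 300
n_pcr_positive = 184
serotype_counts = {1: 9, 2: 171, 3: 4}
sex_counts = {"male": 147, "female": 37}

# Recompute the reported percentages.
positive_rate = pct(n_pcr_positive, n_patients)
serotype_pct = {s: pct(c, n_pcr_positive) for s, c in serotype_counts.items()}
sex_pct = {k: pct(c, n_pcr_positive) for k, c in sex_counts.items()}

# Hypothetical reference rule (an assumption for illustration only):
# predict "PCR positive" for all 300 patients in the cohort.
n_pcr_negative = n_patients - n_pcr_positive
baseline = classification_metrics(
    tp=n_pcr_positive, fp=n_pcr_negative, fn=0, tn=0
)

print(positive_rate, serotype_pct, sex_pct)
print({name: round(value, 3) for name, value in baseline.items()})
```

Recall measures the share of true PCR positives that are flagged, while precision measures the share of flagged patients who are truly positive, so a rule can reach 100% recall simply by flagging everyone. This is why recall is best read alongside precision and accuracy.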
