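Comparison of Discriminant Analysis and Support Vector Machine on Mixed Categorical and Continuous Independent Variables for COVID-19 Patients Data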

Scientific Journal of Informatics | Pub Date: 2024-02-29 | DOI: 10.15294/sji.v11i1.48565

Husnul Aris Haikal, A. Wigena, Kusman Sadik, Efriwati Efriwati

Citations: 0

Abstract

Purpose: Numerous factors can affect the duration of COVID-19 recovery. One approach involves the use of natural herbal medication. This study seeks to identify the variables that influence the duration of COVID-19 recovery and to compare discriminant analysis and support vector machine models using COVID-19 patient data from West Sumatra.

Methods: Two data mining methods, Discriminant Analysis and Support Vector Machine with three kernel types (linear, polynomial, and radial basis function), were used to classify COVID-19 recovery time. The study used 428 data points, with 75% allocated to training and 25% to testing. Independent variables were selected by computing each variable's information value (IV), which gauges its influence on the dependent variable. To address class imbalance, three data resampling techniques were applied: undersampling, oversampling, and SMOTE. The two methods were compared on balanced accuracy.

Result: Discriminant Analysis with SMOTE achieved a balanced accuracy of 66.50%, outperforming the linear-kernel Support Vector Machine with SMOTE, which reached a balanced accuracy of 63.20% on this dataset.

Novelty: This study compares Discriminant Analysis and SVM on data with mixed categorical and continuous independent variables. It also explores undersampling, oversampling, and SMOTE for handling imbalanced data, with variable selection based on information value.
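
The abstract names two quantities without defining them: the information value (IV) used to select variables, and the balanced accuracy used to compare models. The sketch below shows one common way to compute each in plain Python. It assumes a binary 0/1 target and a categorical predictor, and it adds a small smoothing constant to avoid taking the logarithm of zero; neither assumption comes from the abstract, and the study's actual IV formula and class coding may differ. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def information_value(feature, target, smoothing=0.5):
    """Information value of a categorical feature for a binary 0/1 target.

    IV = sum over categories of
         (p_event - p_non_event) * ln(p_event / p_non_event),
    where p_event is the share of all events (target == 1) that fall in the
    category, and p_non_event is the same share for target == 0.
    `smoothing` is added to every count so empty cells do not give log(0);
    this is an assumption of the sketch, not taken from the abstract.
    """
    events = Counter(f for f, t in zip(feature, target) if t == 1)
    non_events = Counter(f for f, t in zip(feature, target) if t == 0)
    categories = set(events) | set(non_events)
    total_events = sum(events.values()) + smoothing * len(categories)
    total_non_events = sum(non_events.values()) + smoothing * len(categories)

    iv = 0.0
    for c in categories:
        p_event = (events[c] + smoothing) / total_events
        p_non_event = (non_events[c] + smoothing) / total_non_events
        iv += (p_event - p_non_event) * math.log(p_event / p_non_event)
    return iv


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so every class weighs the same."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Toy data, made up for illustration only (not the study's data).
slow_recovery = [0, 0, 0, 1, 1, 1, 0, 1]
herbal = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
unrelated = ["a", "b", "a", "a", "b", "a", "b", "b"]

print("IV(herbal)    =", round(information_value(herbal, slow_recovery), 4))
print("IV(unrelated) =", round(information_value(unrelated, slow_recovery), 4))
print("balanced acc. =", balanced_accuracy([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0]))
```

A predictor whose categories are spread identically across both classes gets an IV of zero, while one whose categories separate the classes gets a larger value, which is why IV can serve as a screening score. Balanced accuracy averages recall over classes, so a model that always predicts the majority class scores only 50% on a two-class problem regardless of how imbalanced the data are.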

Source journal: Scientific Journal of Informatics

Self-citation rate: 0.00%

Review time: 24 weeks