Applying data mining techniques to classify patients with suspected hepatitis C virus infection

Intelligent medicine | IF 4.4 | Q1, Computer Science, Interdisciplinary Applications | Pub Date: 2022-11-01 | DOI: 10.1016/j.imed.2021.12.003

Reza Safdari, Amir Deghatipour, Marsa Gholamzadeh, Keivan Maghooli

Intelligent medicine, volume 2, issue 4, pages 193-198. Journal article. Full text: https://www.sciencedirect.com/science/article/pii/S266710262200002X

Citations: 0

Abstract


Applying data mining techniques to classify patients with suspected hepatitis C virus infection

Background

Hepatitis C virus (HCV) infection is highly prevalent worldwide, and disease progression can cause irreversible harm, from severe liver damage to death. Prediction models built with machine learning techniques can therefore be beneficial. This study was conducted to classify patients with suspected HCV infection using different classification models.

Methods

The study used a dataset from the University of California, Irvine (UCI) Machine Learning Repository. Because the HCV dataset was imbalanced, the synthetic minority oversampling technique (SMOTE) was applied to balance it. After cleaning, the dataset was divided into training and test data to develop six classification models: support vector machine (SVM), Gaussian Naïve Bayes (NB), decision tree (DT), random forest (RF), logistic regression (LR), and K-nearest neighbors (KNN). The classifiers were developed in the Python programming language. Receiver operating characteristic (ROC) curve analysis and other metrics were used to evaluate the performance of the proposed models.
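The oversampling step described above can be illustrated with a minimal sketch of the SMOTE idea, written with the Python standard library only. It is not the study's code: the function name `smote`, its parameters, and the toy data are assumptions made for illustration. Each synthetic row is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic minority samples (illustrative SMOTE sketch).

    Each new sample is interpolated between a randomly chosen minority
    point and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical values, not taken from the HCV dataset).
majority = [[float(x), float(x % 3)] for x in range(10)]
minority = [[20.0, 1.0], [21.0, 2.0], [22.0, 1.5]]

# Add enough synthetic rows for the minority class to match the majority.
new_rows = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))
```

Because every synthetic row is a convex combination of two real minority rows, it always stays inside the region spanned by the minority class. A maintained implementation is available in the imbalanced-learn package; the abstract does not say which implementation the authors used.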

Results

After the models were evaluated with different metrics, the RF classifier performed best among the six methods, with an accuracy of 97.29%. The area under the curve (AUC) values for the LR, KNN, DT, SVM, Gaussian NB, and RF models were 0.921, 0.963, 0.953, 0.972, 0.896, and 0.998, respectively, with RF showing the best predictive performance.
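For readers unfamiliar with the two metrics reported above, the following sketch shows one way accuracy and AUC can be computed from true labels and predicted scores. It uses only the Python standard library; the function names, the 0.5 threshold, and the toy labels and scores are assumptions for illustration and are unrelated to the study's data or code. AUC is computed here through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of correct predictions after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical labels and classifier scores (not study results).
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

print(round(roc_auc(y_true, y_score), 3))   # 0.889
print(round(accuracy(y_true, y_score), 3))  # 0.833
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two can order the same set of models differently.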

Conclusion

Various machine learning techniques were used in this study to classify healthy and unhealthy patients. Additionally, the developed models might be able to identify the stage of HCV based on the data they were trained on.


Source journal

Intelligent medicine: Surgery, Radiology and Imaging, Artificial Intelligence, Biomedical Engineering

CiteScore

5.20

Self-citation rate

0.00%

Articles published