Applying data mining techniques to classify patients with suspected hepatitis C virus infection

Intelligent Medicine | Published: 2022-11-01 | DOI: 10.1016/j.imed.2021.12.003 | IF 4.4 (JCR Q1, Computer Science, Interdisciplinary Applications)
Reza Safdari, Amir Deghatipour, Marsa Gholamzadeh, Keivan Maghooli
Intelligent Medicine, Volume 2, Issue 4, Pages 193-198
Article page: https://www.sciencedirect.com/science/article/pii/S266710262200002X
Citations: 0

Background

Hepatitis C virus (HCV) infection is highly prevalent worldwide, and disease progression can cause irreversible, severe liver damage or even death. Developing prediction models with machine learning techniques is therefore worthwhile. This study classified patients with suspected HCV infection using several classification models.

Methods

The study used a dataset from the University of California, Irvine (UCI) Machine Learning Repository. Because the HCV dataset was imbalanced, the synthetic minority oversampling technique (SMOTE) was applied to balance it. After cleaning, the dataset was divided into training and test data for developing six classification models: support vector machine (SVM), Gaussian Naïve Bayes (NB), decision tree (DT), random forest (RF), logistic regression (LR), and K-nearest neighbors (KNN). The classifiers were developed in Python. Receiver operating characteristic (ROC) curve analysis and other metrics were used to evaluate model performance.
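The SMOTE step can be illustrated with a minimal Python sketch (Python being the language the authors report using). This is not the authors' code: the function name `smote`, the neighbour count `k`, the seed, and the toy data are illustrative assumptions. It shows only the core idea of SMOTE, which creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority-class neighbours.

```python
import math
import random


def _euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=0):
    """Illustrative SMOTE sketch (hypothetical helper, not the paper's code).

    minority: list of feature vectors (lists of floats) from the minority class.
    Returns n_synthetic new vectors, each lying on the segment between a
    randomly chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: _euclidean(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


# Hypothetical toy data: three minority-class records with two features each.
toy_minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
print(smote(toy_minority, n_synthetic=4, k=2))
```

In practice, the imbalanced-learn package provides a `SMOTE` class with a `fit_resample` method, and oversampling is usually applied to the training split only so that synthetic points do not leak into the test set. The abstract does not state which implementation or ordering the authors used.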

Results

Across the evaluation metrics, the RF classifier performed best of the six methods, with an accuracy of 97.29%. The area under the ROC curve (AUC) for the LR, KNN, DT, SVM, Gaussian NB, and RF models was 0.921, 0.963, 0.953, 0.972, 0.896, and 0.998, respectively, with RF showing the best predictive performance.
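Each AUC above summarizes a model's ROC curve in a single number. As a hedged sketch, a binary AUC can be computed directly from predicted scores using its rank-statistic interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy inputs are illustrative assumptions; the abstract does not say whether the reported AUCs were binary or averaged over several classes, so this covers only the binary case.

```python
def roc_auc(labels, scores):
    """Binary ROC AUC via pairwise comparison (hypothetical helper).

    labels: sequence of 0/1 values (1 = positive class, e.g. infected).
    scores: model scores or probabilities for the positive class.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy example with hypothetical scores: 3 of the 4 positive/negative pairs
# are ranked correctly, so the AUC is 0.75.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a sort-based rank computation gives the same value more efficiently on large test sets.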

Conclusion

This study applied several machine learning techniques to classify patients as healthy or unhealthy. The developed models may also be able to identify the stage of HCV infection, depending on the data they are trained on.

Source journal: Intelligent Medicine
Subject areas: Surgery; Radiology and Imaging; Artificial Intelligence; Biomedical Engineering
CiteScore: 5.20
Self-citation rate: 0.00%
Articles published: 19