Performance Evaluation of Machine Learning Models for Multiple Chronic Disease Diagnosis Using Symptom Data

Kulvinder Singh, Sanjeev Dhawan, Deepanshu Mehla
Automatic Control and Computer Sciences, vol. 58, no. 2, pp. 195-208 (JCR category: Automation & Control Systems, Q4; IF 0.6)
Pub Date: 2024-05-06
DOI: 10.3103/S0146411624700093
URL: https://link.springer.com/article/10.3103/S0146411624700093
Citation count: 0

Performance Evaluation of Machine Learning Models for Multiple Chronic Disease Diagnosis Using Symptom Data

Timely and accurate diagnosis is essential to preventing and treating any illness. Machine learning (ML) is increasingly used in medical science to diagnose a wide range of diseases from the symptoms patients experience. The main objective of this research is a comparative analysis of ML models that predict diseases from symptoms. The dataset, obtained from Kaggle, covers 41 diseases, with symptoms recorded in 17 columns together with their weights. In other words, there are 17 symptom columns as independent variables (symptoms differ from patient to patient, with some overlap) and 1 target variable (the disease). The data are preprocessed to make them suitable for the various ML approaches, and three scaling techniques are then applied for normalization: standard scaling, min-max scaling, and principal component analysis (PCA). The study evaluates several ML models: LGB classifier, KNN, random forest (RF), CatBoost, support vector machine (SVM), XGBoost, and a hybrid model combining two existing approaches (SVM and XGBoost). Each scaling technique was assessed using root mean squared error (RMSE), cross-validation score, R2 score, mean squared error, and accuracy. Random forest, LGB classifier, and XGBoost outperformed the other models in accuracy, R2 score, and RMSE, achieving scores of 98, 96, and 2.08%, respectively. RF also required less computation time than the alternatives, particularly under standard scaling, where it took only 0.129 s.
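
The abstract names two of its scaling steps (standard and min-max scaling) and several evaluation metrics (accuracy, mean squared error, RMSE, R2 score) without giving formulas. The sketch below shows the textbook definitions of these quantities in plain Python. It is an illustration under assumptions, not the authors' code: the function names are hypothetical, and it assumes the disease labels are integer-encoded so that error-based metrics such as RMSE and R2 can be computed on a classification target, which the abstract implies but does not state.

```python
import math
from statistics import mean, pstdev


def standard_scale(column):
    """Shift a feature column to mean 0 and scale it to unit population std."""
    mu, sigma = mean(column), pstdev(column)
    # A constant column has zero spread; map it to zeros instead of dividing by 0.
    return [0.0 if sigma == 0 else (x - mu) / sigma for x in column]


def min_max_scale(column):
    """Rescale a feature column linearly onto the interval [0, 1]."""
    lo, hi = min(column), max(column)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in column]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error between integer-encoded labels (assumed encoding)."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: one column of symptom weights and four encoded labels.
weights = [1, 3, 5, 7]
y_true = [0, 1, 2, 3]
y_pred = [0, 1, 2, 2]

print(min_max_scale(weights))
print(standard_scale(weights))
print(accuracy(y_true, y_pred), rmse(y_true, y_pred), r2_score(y_true, y_pred))
```

Note that RMSE and R2 computed this way depend on the arbitrary integer assigned to each disease, so they are best read alongside accuracy rather than in place of it.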

Source journal
Automatic Control and Computer Sciences (Automation & Control Systems)
CiteScore: 1.70
Self-citation rate: 22.20%
Number of articles published: 47
Journal description: Automatic Control and Computer Sciences is a peer-reviewed journal that publishes articles on:
• Control systems, cyber-physical systems, real-time systems, robotics, smart sensors, embedded intelligence
• Network information technologies, information security, statistical methods of data processing, distributed artificial intelligence, complex systems modeling, knowledge representation, processing and management
• Signal and image processing, machine learning, machine perception, computer vision