Comparing the Performance of Machine Learning Models in Predicting the Risk of Chronic Kidney Disease
Authors: Sina Moosavi Kashani, Sanaz Zargar Balaye Jame
Journal: Journal of Archives in Military Medicine
Publication date: 2024-02-18
Publication type: Journal Article
DOI: 10.5812/jamm-140885 (https://doi.org/10.5812/jamm-140885)
Citation count: 0
Abstract
Background: Chronic kidney disease (CKD) is a significant health burden worldwide, affecting approximately 10% to 15% of the global population. As one of the leading non-communicable diseases, CKD is a major cause of morbidity and mortality. Early identification is crucial for limiting its adverse effects and improving health outcomes for individuals with CKD.

Objectives: This study aimed to evaluate and compare the effectiveness of several machine learning models in predicting the occurrence of CKD.

Methods: Data were collected from a sample of 400 patients and analyzed following the Cross-Industry Standard Process for Data Mining (CRISP-DM). Missing values were imputed with the mode, and outliers were addressed using the interquartile range (IQR) method. CatBoost (CB), random forest (RF), and artificial neural network (ANN) models were used to predict CKD. The models were evaluated using the receiver operating characteristic (ROC) curve and the area under the curve (AUC).

Results: Analysis of the 400 patient records showed that variables such as serum creatinine, packed cell volume, specific gravity, and hemoglobin were the most influential in predicting CKD. The CB and RF models outperformed the ANN in predicting the disease. Ten critical predictors were identified for accurate disease prediction.

Conclusions: The ensemble models in this study (CB and RF) were fast and also more accurate than the ANN. These findings suggest that ensemble models may be an effective tool for improving predictive performance in similar studies.
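
The preprocessing and evaluation steps named in the Methods can be illustrated with a minimal sketch in standard-library Python. This is not the authors' code. The function names (`impute_mode`, `clip_iqr`, `roc_auc`), the toy values, and the two score lists are hypothetical. The abstract does not say whether outliers were removed or capped, so clipping to the conventional 1.5 × IQR fences is an assumption made here for illustration.

```python
from statistics import mode, quantiles


def impute_mode(values):
    """Replace missing entries (None) with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    fill = mode(observed)
    return [fill if v is None else v for v in values]


def clip_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: outliers are capped at the fences rather than dropped;
    the abstract only says the IQR method was used.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not taken from the study).
specific_gravity = [1.020, None, 1.010, 1.020, 1.025]
serum_creatinine = [0.8, 1.0, 1.2, 1.1, 0.9, 15.0]

imputed = impute_mode(specific_gravity)
clipped = clip_iqr(serum_creatinine)

# Hypothetical predicted probabilities from two unnamed models.
labels = [1, 1, 1, 0, 0]
scores_model_a = [0.9, 0.8, 0.4, 0.5, 0.1]
scores_model_b = [0.9, 0.8, 0.7, 0.5, 0.1]

auc_a = roc_auc(labels, scores_model_a)
auc_b = roc_auc(labels, scores_model_b)
print(imputed, clipped, auc_a, auc_b)
```

In practice, a comparison like the one described would more likely rely on library implementations of the models and of the AUC. The pairwise definition is used here only because it makes explicit what the AUC measures: how often a model ranks a CKD case above a non-CKD case.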