XGBoost–Based Analysis of the Early–Stage Diabetes Risk Dataset
Jayson A. Sabejon, Jeyhozaphat B. Rejas, Gernel S. Lumacad, Reymund L. Zarate, Edwin Anthony D. Mendez, Frances Marie Lynn O. Tinoy
Published in: 2023 International Conference in Advances in Power, Signal, and Information Technology (APSIT)
Publication date: 2023-06-09
DOI: https://doi.org/10.1109/APSIT58554.2023.10201658
Citations: 0
Abstract
Diabetes is a metabolic condition caused by either insufficient insulin production by the pancreas or ineffective use of insulin by the body. It is among the most prevalent diseases without a known cure; however, survival can be improved with timely detection. This study discusses the use of an ensemble learning method, the extreme gradient boosting (XGBoost) algorithm, to analyze the early-stage diabetes risk dataset. First, a predictive model is formulated using the XGBoost algorithm to classify a case as positive or negative for diabetes. Second, a feature importance analysis is performed to measure the relative importance of each input feature in the dataset. Lastly, an XGBoost decision tree structure is generated to illustrate the conditions that lead to a negative or positive diabetes case. Experimental results showed that the formulated predictive model (accuracy = 0.9903, kappa coefficient = 0.9797, F-score = 0.990) outperformed the methods discussed in previous literature. The feature importance analysis revealed that the ‘age’ variable has the highest relative score for early-stage diabetes risk prediction. This result confirms previous findings that age often influences diabetes, since increased insulin resistance and impaired pancreatic islet function are associated with aging. In the latter part of this paper, the XGBoost decision tree model provides 13 different decision rules for early-stage diabetes risk prediction.
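The abstract reports three evaluation metrics: accuracy, the kappa coefficient, and the F-score. The sketch below shows how these are commonly computed from a binary confusion matrix. It assumes the kappa coefficient is Cohen's kappa and the F-score is the F1 score for the positive class, which the abstract does not state explicitly. The `Confusion` class, the function names, and the counts in the example are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts (hypothetical helper, not from the paper)."""
    tp: int  # positive cases predicted positive
    fp: int  # negative cases predicted positive
    fn: int  # positive cases predicted negative
    tn: int  # negative cases predicted negative

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def accuracy(c: Confusion) -> float:
    """Fraction of all cases classified correctly."""
    return (c.tp + c.tn) / c.total


def cohen_kappa(c: Confusion) -> float:
    """Agreement between predictions and labels, corrected for chance."""
    n = c.total
    observed = (c.tp + c.tn) / n
    # Chance agreement: product of predicted and actual marginals per class.
    expected = (
        (c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn)
    ) / (n * n)
    if expected == 1.0:
        # Degenerate case: only one class appears in both labels and predictions.
        return 0.0
    return (observed - expected) / (1.0 - expected)


def f_score(c: Confusion) -> float:
    """F1 score for the positive class (harmonic mean of precision and recall)."""
    denominator = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Made-up counts for illustration only; these are not the paper's results.
    example = Confusion(tp=40, fp=10, fn=5, tn=45)
    print(f"accuracy = {accuracy(example):.4f}")
    print(f"kappa    = {cohen_kappa(example):.4f}")
    print(f"f-score  = {f_score(example):.4f}")
```

Reporting kappa alongside accuracy is useful when the classes are imbalanced, because kappa discounts the agreement a classifier would reach by chance given the class proportions.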