Application of Logistic Regression Model in Prediction of Early Diabetes Across United States
I. Olufemi, C. Obunadike, A. Adefabi, D. Abimbola
International Journal of Scientific and Management Research, published 2023-01-01
DOI: 10.37502/ijsmr.2023.6502 (https://doi.org/10.37502/ijsmr.2023.6502)
Citations: 2
Abstract
This study presents a case study on predicting early diabetes in the United States using a logistic regression model. After assessing the ability of a machine learning algorithm (the binomial logistic model) to predict diabetes, we also studied the important features associated with the disease. We predict the test data from the important variables and evaluate predictive performance using the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC). From the correlation coefficient analysis, we deduce that, of the 16 PIE variables, only “Itching” and “Delayed healing” were statistically insignificant with respect to the target variable (class), with values of 83% and 33%, respectively, while “Alopecia” and “Gender/Sex” were negatively correlated with the target variable. In addition, Lasso regularization was used to penalize the logistic regression model; the predictor “sudden_weight_loss” did not appear to be statistically significant in the model, and the predictors “Polyuria” and “Polydipsia” contributed most to the prediction of class “Positive”, based on their parameter values and odds ratios. Since the 95% confidence interval for the model's AUC falls between 93% and 99%, the fitted model appears to predict diabetes status accurately.
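The quantities named in the abstract (odds ratios, AUC, and a confidence interval for the AUC) can be illustrated with a minimal Python sketch. It rests on several assumptions: the abstract does not say how the interval was computed, so the percentile bootstrap used here is only one common choice; the function names `odds_ratio`, `auc`, and `bootstrap_auc_ci` are hypothetical; and the labels and scores are toy values, not the study's data.

```python
import math
import random


def odds_ratio(coefficient):
    """Convert a logistic regression coefficient (log-odds) to an odds ratio."""
    return math.exp(coefficient)


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def bootstrap_auc_ci(labels, scores, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    while len(estimates) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contained a single class; AUC is undefined
        estimates.append(auc(ys, [scores[i] for i in idx]))
    estimates.sort()
    alpha = (1.0 - level) / 2.0
    lower = estimates[int(alpha * (n_resamples - 1))]
    upper = estimates[int((1.0 - alpha) * (n_resamples - 1))]
    return lower, upper


# Hypothetical labels (1 = "Positive") and predicted probabilities,
# for illustration only; these are not values from the study.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
toy_scores = [0.92, 0.81, 0.77, 0.40, 0.35, 0.22, 0.15, 0.55, 0.66, 0.08]

point_estimate = auc(toy_labels, toy_scores)
ci_low, ci_high = bootstrap_auc_ci(toy_labels, toy_scores)
```

In this sketch a coefficient of zero corresponds to an odds ratio of 1 (no association), and larger positive coefficients, such as those the abstract reports for Polyuria and Polydipsia, correspond to odds ratios above 1. The pairwise AUC loop is quadratic in sample size, which is acceptable for small test sets but would normally be replaced by a rank-based computation.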