Authors: M. Bhattacharya, D. Datta
Published in: 2023 4th International Conference for Emerging Technology (INCET), 2023-05-26
DOI: 10.1109/INCET57972.2023.10170270
Diabetes Prediction using Logistic Regression and Rule Extraction from Decision Tree and Random Forest Classifiers
This work focuses on extracting rules from a decision tree classifier to predict whether a patient has diabetes. The machine learning classifiers considered here rely on features such as glucose, blood pressure, insulin, skin thickness, body mass index (BMI), diabetes pedigree function, and age. Decision trees are easily interpretable classifiers, but their predictive accuracy is comparatively low. Random forests, which are ensembles of trees, achieve higher predictive accuracy but are regarded as black-box models. We develop an algorithm that extracts decision rules from the corresponding tree in a human-readable format (IF antecedent, THEN consequent). We also provide a logistic regression model and the tree structure of a random forest model for classifying the diabetic condition. Experimental results on 768 samples from women in the Pima Indians diabetes dataset show that the proposed rule extraction methodology outperforms similar recently developed methods in terms of human comprehensibility and limits the number of antecedents in the retained rules, while preserving the same level of accuracy. The performance of all classifier models is measured with recall, precision, accuracy, and F1-score, computed from the confusion matrix.
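To make the IF-THEN rule format and the confusion-matrix metrics concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes a hand-built binary tree with threshold splits; the `Leaf` and `Split` classes, the `toy_tree` thresholds, and the function names are hypothetical and chosen only for illustration. Each root-to-leaf path becomes one rule whose antecedent is the conjunction of the split conditions along that path.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    label: str  # class predicted at this leaf


@dataclass
class Split:
    feature: str      # feature tested at this node
    threshold: float  # go left if feature <= threshold, otherwise right
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def extract_rules(node: Node, conditions: Tuple[str, ...] = ()) -> List[str]:
    """Return one IF-THEN rule per root-to-leaf path of the tree."""
    if isinstance(node, Leaf):
        antecedent = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {antecedent} THEN {node.label}"]
    left_cond = f"{node.feature} <= {node.threshold}"
    right_cond = f"{node.feature} > {node.threshold}"
    return (extract_rules(node.left, conditions + (left_cond,))
            + extract_rules(node.right, conditions + (right_cond,)))


def metrics_from_confusion(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy tree: the thresholds are illustrative, not taken from the paper.
toy_tree = Split(
    "glucose", 127.5,
    left=Leaf("non-diabetic"),
    right=Split("bmi", 29.9,
                left=Leaf("non-diabetic"),
                right=Leaf("diabetic")),
)

if __name__ == "__main__":
    for rule in extract_rules(toy_tree):
        print(rule)
    # Illustrative counts only, not results reported in the paper.
    print(metrics_from_confusion(tp=40, fp=10, fn=10, tn=40))
```

In this sketch the number of antecedents in a rule equals the depth of its leaf, so limiting tree depth is one simple way to keep rules short. Whether the paper limits antecedents this way is not stated in the abstract.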