Aditya Gupta, I. S. Rajput, Gunjan, Vibha Jain, Soni Chaurasia
Concurrency and Computation: Practice and Experience, published 2022-07-27. DOI: 10.1002/cpe.7123 (https://doi.org/10.1002/cpe.7123)
NSGA‐II‐XGB: Meta‐heuristic feature selection with XGBoost framework for diabetes prediction
Diabetes is one of the leading causes of death in the modern world, and early diagnosis is the most promising way to improve patients' chances of survival. With the ever‐growing technology of the current era, machine learning‐based algorithms are opening doors in the healthcare industry by delivering efficient decision support services in real time. However, the high dimensionality of data obtained from multiple sources increases computation time and significantly affects the models' efficiency in classifying the results. Feature selection improves learning performance and reduces computational cost by selecting subsets of features and eliminating unnecessary and irrelevant ones. This article develops a hybrid machine learning model based on the non‐dominated sorting genetic algorithm II (NSGA‐II) and ensemble learning for the efficient classification of diabetes. The proposed work applies data preprocessing techniques, such as missing‐data handling and normalization, prior to model training. NSGA‐II is then used to select the most salient features in the diabetes dataset. Finally, an ensemble learning‐based extreme gradient boosting (XGBoost) model is trained on the features selected by NSGA‐II to classify patients as diabetic or non‐diabetic. The methodology is experimentally validated on a hybridized dataset comprising 23 features and 1288 instances of male and female patients between the ages of 21 and 65. For performance evaluation, the statistical results are compared with several state‐of‐the‐art decision‐making models in the domain. Experimental findings show that the proposed NSGA‐II‐XGB approach gives better classification results, with an average accuracy of 98.86%. Furthermore, the results for specificity (88.6%), sensitivity (96.36%), and F‐score (97.84%) also support the utility of the proposed methodology in the early diagnosis of diabetes.
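The non‐dominated sorting step that gives NSGA‐II its name can be illustrated with a minimal Python sketch. The abstract does not state which objectives were optimized, so this sketch assumes two objectives to minimize: a classification error and the number of selected features. The names `dominates` and `non_dominated_sort`, the feature masks, and the error values are hypothetical and are not taken from the paper. In a full pipeline the error would come from a trained classifier such as XGBoost, and NSGA‐II would add crowding distance, crossover, and mutation, which are omitted here.

```python
from typing import List, Sequence, Tuple

# Assumed objectives, both minimized: (classification error, feature count).
Objectives = Tuple[float, int]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated_sort(points: Sequence[Objectives]) -> List[List[int]]:
    """Group candidate indices into Pareto fronts; front 0 holds the best trade-offs.

    This peels off one front at a time, which is simpler (and slower) than the
    bookkeeping used in NSGA-II itself, but produces the same fronts.
    """
    remaining = set(range(len(points)))
    fronts: List[List[int]] = []
    while remaining:
        # A candidate belongs to the current front if nothing left dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Toy data for illustration only; these values are not from the paper.
masks = [
    (1, 1, 1, 1),  # keep all four features
    (1, 0, 1, 0),
    (0, 0, 1, 0),
    (1, 1, 0, 0),
]
errors = [0.05, 0.05, 0.12, 0.20]  # hypothetical validation error for each mask

candidates = [(err, sum(mask)) for err, mask in zip(errors, masks)]
fronts = non_dominated_sort(candidates)
print(fronts)
```

With these toy values, the second and third masks form the first front: one has the lowest error with fewer features than the full set, and the other uses the fewest features. Neither is better on both objectives, which is the trade‐off a multi‐objective search keeps instead of collapsing to a single score.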