NSGA‐II‐XGB: Meta‐heuristic feature selection with XGBoost framework for diabetes prediction

Concurrency and Computation: Practice and Experience. Pub Date: 2022-07-27. DOI: 10.1002/cpe.7123

Aditya Gupta, I. S. Rajput, Gunjan, Vibha Jain, Soni Chaurasia


Citations: 4

Abstract

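Diabetes is one of the most prevalent causes of death in the modern world, and early diagnosis is the most promising way to improve patients' chances of survival. Machine learning algorithms are opening new possibilities in healthcare by providing efficient, real-time decision support. However, the high dimensionality of data gathered from multiple sources increases computation time and significantly reduces a model's efficiency in classifying the results. Feature selection improves learning performance and lowers computational cost by selecting a subset of features and discarding unnecessary and irrelevant ones. This article develops a hybrid machine learning model that combines the non-dominated sorting genetic algorithm (NSGA‐II) with ensemble learning for efficient diabetes classification. Before model training, the data are preprocessed with techniques such as missing-data handling and normalization. NSGA‐II is then applied to the diabetes dataset to select the most prominent and salient features. Finally, an ensemble-learning-based extreme gradient boosting (XGBoost) model is trained on the features selected by NSGA‐II to classify patients as diabetic or non-diabetic. The methodology is validated experimentally on a hybridized dataset of 23 features and 1288 instances, covering male and female patients aged 21 to 65. For performance evaluation, the statistical results are compared with several state-of-the-art decision-making models in this domain. The experiments show that the proposed NSGA‐II‐XGB approach achieves better classification results, with an average accuracy of 98.86%. Its specificity (88.6%), sensitivity (96.36%), and F-score (97.84%) further support the utility of the method for the early diagnosis of diabetes.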
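The abstract does not say how NSGA‐II is coupled to the classifier. One common wrapper formulation, assumed here rather than taken from the paper, encodes each candidate feature subset as a binary mask and lets NSGA‐II minimise two objectives at once: the validation error of a model trained on the kept features, and the number of kept features. The sketch below implements NSGA‐II's fast non-dominated sorting, crowding distance, binary tournament selection, and elitist survival in plain Python. `evaluate_error` is a hypothetical callback that, in the paper's setting, would train and validate XGBoost on the masked columns; `toy_error` is an invented stand-in so the code runs without any ML library, and the population size, generation count, uniform crossover, and bit-flip mutation rate are illustrative choices.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Mask = Tuple[int, ...]            # 1 = feature kept, 0 = feature dropped
Objectives = Tuple[float, float]  # (validation error, number of kept features), both minimised


def dominates(a: Objectives, b: Objectives) -> bool:
    """a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(objs: Sequence[Objectives]) -> List[List[int]]:
    """Fast non-dominated sorting: Pareto fronts as lists of indices, best front first."""
    beats: List[List[int]] = [[] for _ in objs]  # beats[i]: solutions that i dominates
    beaten_by = [0] * len(objs)                   # beaten_by[i]: how many solutions dominate i
    fronts: List[List[int]] = [[]]
    for i, oi in enumerate(objs):
        for j, oj in enumerate(objs):
            if dominates(oi, oj):
                beats[i].append(j)
            elif dominates(oj, oi):
                beaten_by[i] += 1
        if beaten_by[i] == 0:
            fronts[0].append(i)
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in beats[i]:
                beaten_by[j] -= 1
                if beaten_by[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
    return fronts[:-1]  # the loop always stops after appending an empty front


def crowding_distance(front: List[int], objs: Sequence[Objectives]) -> Dict[int, float]:
    """Larger value = more isolated point on its front; boundary points get infinity."""
    dist = {i: 0.0 for i in front}
    for m in range(2):
        ordered = sorted(front, key=lambda i: objs[i][m])
        lo, hi = objs[ordered[0]][m], objs[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi > lo:
            for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
                dist[cur] += (objs[nxt][m] - objs[prev][m]) / (hi - lo)
    return dist


def nsga2_feature_selection(n_features: int, evaluate_error: Callable[[Mask], float],
                            pop_size: int = 20, generations: int = 30, seed: int = 0):
    rng = random.Random(seed)
    cache: Dict[Mask, Objectives] = {}  # model training is costly: score each mask once
    def objectives(mask: Mask) -> Objectives:
        if mask not in cache:
            cache[mask] = (evaluate_error(mask), float(sum(mask)))
        return cache[mask]
    def repair(bits: List[int]) -> Mask:
        if not any(bits):  # an empty subset cannot train a classifier
            bits[rng.randrange(n_features)] = 1
        return tuple(bits)
    def rank(pop: List[Mask]):
        objs = [objectives(m) for m in pop]
        order: Dict[int, Tuple[int, float]] = {}  # index -> (front rank, -crowding distance)
        for r, front in enumerate(non_dominated_sort(objs)):
            dist = crowding_distance(front, objs)
            for i in front:
                order[i] = (r, -dist[i])  # lower rank first, then less crowded
        return objs, order
    def tournament(pop: List[Mask], order: Dict[int, Tuple[int, float]]) -> Mask:
        a, b = rng.randrange(len(pop)), rng.randrange(len(pop))  # binary tournament
        return pop[a] if order[a] <= order[b] else pop[b]
    def child(p1: Mask, p2: Mask) -> Mask:
        bits = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]  # uniform crossover
        return repair([1 - g if rng.random() < 1 / n_features else g for g in bits])  # bit flip
    pop = [repair([rng.randint(0, 1) for _ in range(n_features)]) for _ in range(pop_size)]
    for _ in range(generations):
        _, order = rank(pop)
        kids = [child(tournament(pop, order), tournament(pop, order)) for _ in range(pop_size)]
        merged = pop + kids  # elitism: parents and offspring compete for survival
        _, order = rank(merged)
        pop = [merged[i] for i in sorted(order, key=lambda i: order[i])[:pop_size]]
    objs, order = rank(pop)
    front = {pop[i]: objs[i] for i in order if order[i][0] == 0}  # dict drops duplicate masks
    return sorted(front.items(), key=lambda kv: kv[1])  # (mask, (error, n_kept)) pairs


INFORMATIVE = {0, 3, 5}  # hypothetical toy problem: of 8 features, only these carry signal


def toy_error(mask: Mask) -> float:
    # Stand-in for "train XGBoost on the kept columns and return validation error".
    hits = sum(mask[i] for i in INFORMATIVE)
    return round(0.35 - 0.1 * hits + 0.01 * (sum(mask) - hits), 4)


if __name__ == "__main__":
    for mask, (err, k) in nsga2_feature_selection(8, toy_error):
        print(mask, err, int(k))
```

Sorting the merged parents and offspring by (front rank, negative crowding distance) and keeping the first `pop_size` entries matches NSGA‐II's front-by-front filling with crowding-based truncation of the last front. The function returns a whole Pareto front of (mask, (error, subset size)) pairs rather than one subset, so a final feature set still has to be chosen from it, for example the smallest mask whose error is within a tolerance of the best.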


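The abstract reports specificity, sensitivity, and F-score. All three, plus accuracy, follow from the binary confusion matrix of a diabetic (positive) versus non-diabetic (negative) classifier. Below is a minimal sketch of the standard definitions, assuming labels are encoded as 1 = diabetic and 0 = non-diabetic; the `ConfusionMatrix` class and the toy labels are invented for illustration and are not the paper's code or data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix; treating 'diabetic' as the positive class is an assumption."""
    tp: int  # diabetic patients predicted diabetic
    fn: int  # diabetic patients predicted non-diabetic
    tn: int  # non-diabetic patients predicted non-diabetic
    fp: int  # non-diabetic patients predicted diabetic

    @classmethod
    def from_labels(cls, y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1):
        pairs = list(zip(y_true, y_pred))
        return cls(
            tp=sum(t == positive and p == positive for t, p in pairs),
            fn=sum(t == positive and p != positive for t, p in pairs),
            tn=sum(t != positive and p != positive for t, p in pairs),
            fp=sum(t != positive and p == positive for t, p in pairs),
        )

    @staticmethod
    def _ratio(num: int, den: int) -> float:
        return num / den if den else 0.0  # convention: an empty denominator scores 0

    @property
    def sensitivity(self) -> float:  # recall, true-positive rate
        return self._ratio(self.tp, self.tp + self.fn)

    @property
    def specificity(self) -> float:  # true-negative rate
        return self._ratio(self.tn, self.tn + self.fp)

    @property
    def precision(self) -> float:
        return self._ratio(self.tp, self.tp + self.fp)

    @property
    def f_score(self) -> float:  # harmonic mean of precision and sensitivity
        return self._ratio(2 * self.tp, 2 * self.tp + self.fp + self.fn)

    @property
    def accuracy(self) -> float:
        return self._ratio(self.tp + self.tn, self.tp + self.fn + self.tn + self.fp)


if __name__ == "__main__":
    # Invented labels for illustration only (1 = diabetic, 0 = non-diabetic).
    y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
    cm = ConfusionMatrix.from_labels(y_true, y_pred)
    print(cm)
    for name in ("sensitivity", "specificity", "precision", "f_score", "accuracy"):
        print(f"{name}: {getattr(cm, name):.4f}")
```

Writing the F-score as 2TP / (2TP + FP + FN) is algebraically the same as the harmonic mean of precision and sensitivity, but it avoids a division by zero when the model predicts no positives at all.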
