Feature importance and model performance for prediabetes prediction: A comparative study

IF 3.7; Zone 3 (Multidisciplinary Journals); JCR Q1 (Multidisciplinary Sciences); Journal of King Saud University - Science; Pub Date: 2024-12-01; DOI: 10.1016/j.jksus.2024.103583
Saeed Awad M Alqahtani, Hussah M Alobaid, Jamilah Alshammari, Safa A Alqarzae, Sheka Yagub Aloyouni, Ahood A. Al-Eidan, Salwa Alhamad, Abeer Almiman, Fadwa M Alkhulaifi, Suliman Alomar
Journal of King Saud University - Science, Volume 36, Issue 11, Article 103583. Source: https://www.sciencedirect.com/science/article/pii/S1018364724004956

Abstract

Objectives

Prediabetes is a significant health condition that elevates the risk of developing type 2 diabetes and other associated complications. This study aims to (1) explore the potential of machine learning models to improve the prediction of prediabetes, (2) compare the performance of various machine learning models with traditional regression methods, and (3) identify the most influential demographic, socioeconomic, and health-related factors associated with prediabetes.

Methods

This study used data from the 2021 Behavioral Risk Factor Surveillance System (BRFSS) and applied comprehensive data preprocessing. Logistic regression analysis was conducted to assess associations between features and prediabetes risk. Feature importance was quantified using Adjusted Mutual Information values. Multiple machine learning models, including Random Forest, K Nearest Neighbors (KNN), Extreme Gradient Boosting (XGBoost), Neural Network, and Logistic Regression, were used for prediction. The best-performing model was selected and validated through cross-validation to ensure robustness.
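
The abstract does not say how the Adjusted Mutual Information (AMI) values were computed, so the following is only a minimal sketch of one common definition: observed mutual information corrected by its expected value under random permutation and normalised by the mean of the two entropies. The function name `adjusted_mutual_info`, the feature names, and the toy data are hypothetical illustrations. They are not BRFSS variables or results from the study.

```python
import math
from collections import Counter


def _log_factorial(k):
    """log(k!) via the log-gamma function, stable for large counts."""
    return math.lgamma(k + 1)


def _entropy(counts, n):
    """Shannon entropy in nats from category counts."""
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def adjusted_mutual_info(x, y):
    """AMI between two categorical sequences (arithmetic-mean normalisation)."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    a = Counter(x)              # counts per feature level
    b = Counter(y)              # counts per outcome class
    joint = Counter(zip(x, y))  # contingency table cells

    # Observed mutual information.
    mi = sum(
        (nij / n) * math.log(n * nij / (a[u] * b[v]))
        for (u, v), nij in joint.items()
    )

    # Expected mutual information under the hypergeometric (permutation) model.
    emi = 0.0
    for ai in a.values():
        for bj in b.values():
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                log_p = (
                    _log_factorial(ai) + _log_factorial(bj)
                    + _log_factorial(n - ai) + _log_factorial(n - bj)
                    - _log_factorial(n) - _log_factorial(nij)
                    - _log_factorial(ai - nij) - _log_factorial(bj - nij)
                    - _log_factorial(n - ai - bj + nij)
                )
                emi += (nij / n) * math.log(n * nij / (ai * bj)) * math.exp(log_p)

    h_mean = 0.5 * (_entropy(a.values(), n) + _entropy(b.values(), n))
    denom = h_mean - emi
    # Convention chosen for this sketch: 0.0 when the score is undefined.
    return 0.0 if abs(denom) < 1e-12 else (mi - emi) / denom


# Toy data (invented for illustration): 1 = condition present, 0 = absent.
outcome = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "high_cholesterol": [1, 1, 1, 0, 0, 0, 0, 1],
    "unrelated_flag": [0, 1, 0, 1, 0, 1, 0, 1],
}

# Rank features by AMI with the outcome, highest first.
ranking = sorted(
    ((name, adjusted_mutual_info(values, outcome)) for name, values in features.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: AMI = {score:.3f}")
```

Because AMI subtracts the agreement expected by chance, a feature that is independent of the outcome scores at or below zero rather than slightly above it, which makes scores comparable across features with different numbers of categories.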

Results

Significant associations were observed between prediabetes and key predictors such as cholesterol levels, BMI categories, hypertension status, age groups, and income categories. Among the models tested, Random Forest demonstrated the highest accuracy and robustness, outperforming traditional regression models.

Conclusions

This study highlights the potential of machine learning to enhance prediabetes prediction and underscores the importance of identifying high-risk individuals for early intervention. The findings contribute to population health strategies by integrating advanced analytical methods with public health data.