Machine learning-driven risk assessment of coronary heart disease: Analysis of NHANES data from 1999 to 2018.

Jin Lu, Haochang Hu, Jiaming Xiu, Yanfang Yang, Qifeng Zhu, Hanyi Dai, Xianbao Liu, Jian'an Wang
中南大学学报(医学版) [Journal of Central South University (Medical Sciences)], 2024, 49(8): 1175-1186. Published 2024-08-28. DOI: 10.11817/j.issn.1672-7347.2024.240394. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11628228/pdf/

Abstract

Objectives: The high incidence of coronary artery heart disease (CHD) poses a significant burden and challenge to public health systems globally. Effective prevention and early diagnosis of CHD have become key strategies to alleviate this burden. This study aims to explore the application of advanced machine learning techniques to enhance the accuracy of early screening and risk assessment for CHD.

Methods: A total of 49 490 study subjects from the National Health and Nutrition Examination Survey (NHANES) database, spanning 1999 to 2018, were included. The dataset was randomly divided into training (70%) and testing (30%) sets. The outcome (dependent) variable was whether subjects had been informed of a CHD diagnosis, which divided them into a CHD group and a non-CHD group. Based on a review of the literature on CHD risk factors, 68 independent variables were included. The characteristics of the study subjects were described and compared between the CHD and non-CHD groups. Two machine learning algorithms, random forest (randomForest_4.7-1.1) and XGBoost (xgboost_1.7.7.1), were used for variable selection. The top 10 variables identified by each algorithm were analyzed together, and those recognized by both algorithms were selected. A generalized linear model was used to analyze the relationships between variables and CHD, and classical logistic regression was used to construct the CHD risk prediction model. The model's ability to distinguish between CHD and non-CHD individuals was assessed using the area under the receiver operating characteristic curve (AUC); calibration was assessed with the Hosmer-Lemeshow goodness-of-fit test to evaluate the consistency between predicted values and actual CHD proportions; and decision curve analysis was applied to evaluate the clinical benefit of the model's risk prediction. Finally, a nomogram was constructed to present the risk scoring of the final model visually.
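The consensus step in the variable selection can be illustrated with a minimal Python sketch. This is not the authors' code (the abstract cites the R packages randomForest and xgboost); the function names `top_k` and `consensus_variables` and all importance scores below are hypothetical, chosen only to show how variables ranked highly by both algorithms could be intersected.

```python
def top_k(importances, k=10):
    """Return the k variable names with the largest importance scores."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]


def consensus_variables(imp_a, imp_b, k=10):
    """Variables in the top k of both rankings, in the order of imp_a's ranking."""
    top_b = set(top_k(imp_b, k))
    return [name for name in top_k(imp_a, k) if name in top_b]


# Made-up importance scores for illustration; the abstract does not report them.
rf_importance = {"age": 0.30, "hba1c": 0.12, "creatinine": 0.10, "bmi": 0.09, "sbp": 0.05}
xgb_importance = {"age": 0.40, "creatinine": 0.15, "sbp": 0.11, "hba1c": 0.08, "bmi": 0.02}

# k=3 keeps the toy example small; the study used the top 10 of 68 variables.
selected = consensus_variables(rf_importance, xgb_importance, k=3)
print(selected)
```

With real data, the two dictionaries would hold the importance measures exported from each fitted model on the training set, and `k` would be 10.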

Results: The mean age of the overall population was (49.53±18.31) years, with males comprising 51.8%. Compared to the non-CHD group, the CHD group was older [(69.05±11.32) years vs (48.67±18.07) years, P<0.001], had a higher proportion of females (67.1% vs 47.4%, P<0.001), and exhibited statistically significant differences in classical cardiovascular risk factors such as body mass index, systolic blood pressure, diastolic blood pressure, and smoking (all P<0.001). There were also statistically significant differences in non-classical cardiovascular factors, such as energy intake, vitamin E, vitamin K, calcium, phosphorus, magnesium, zinc, copper, sodium, potassium, and selenium (all P<0.05). Six key variables most associated with CHD occurrence were ultimately identified. The resulting CHD risk prediction model was: logit(p) = -7.783 + 0.074×age + 0.003×creatinine - 0.003×platelets + 0.257×glycated hemoglobin + 0.003×uric acid + 0.101×coefficient of variation of red cell distribution width. The model demonstrated excellent discriminative ability in predicting CHD, with an AUC of 0.841 and an accuracy of 0.712. Calibration curves indicated good consistency between predicted probabilities and actual values in both the training and testing sets, suggesting that the model is stable and reliable. Decision curve analysis suggested that the model provided net benefits across a range of threshold probabilities, supporting its potential application in clinical decision-making.
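To show how the reported equation turns measurements into a predicted risk, the following Python sketch applies the inverse logit, p = 1 / (1 + exp(-logit(p))). The coefficients are copied from the equation above; the variable keys, the function names, and the example subject are hypothetical. The abstract does not state measurement units, so inputs are assumed to be in whatever units the authors used, and the example values carry no clinical meaning.

```python
import math

# Coefficients copied from the logit equation reported in the Results.
INTERCEPT = -7.783
COEFFICIENTS = {
    "age": 0.074,
    "creatinine": 0.003,
    "platelets": -0.003,
    "glycated_hemoglobin": 0.257,
    "uric_acid": 0.003,
    "rdw_cv": 0.101,  # coefficient of variation of red cell distribution width
}


def chd_logit(values):
    """Linear predictor: intercept plus the sum of coefficient * value."""
    return INTERCEPT + sum(COEFFICIENTS[name] * values[name] for name in COEFFICIENTS)


def chd_probability(values):
    """Predicted probability via the logistic (inverse logit) function."""
    return 1.0 / (1.0 + math.exp(-chd_logit(values)))


# Hypothetical subject: values and their units are illustrative assumptions only.
example = {
    "age": 60,
    "creatinine": 80,
    "platelets": 250,
    "glycated_hemoglobin": 5.5,
    "uric_acid": 330,
    "rdw_cv": 13,
}

print(f"logit = {chd_logit(example):.4f}, p = {chd_probability(example):.3f}")
```

Because every coefficient except that of platelets is positive, the predicted probability rises with age, creatinine, glycated hemoglobin, uric acid, and red cell distribution width, and falls as the platelet count increases.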

Conclusions: This study identified potential risk factors for CHD using machine learning techniques and developed a concise and practical clinical prediction model. Further prospective clinical cohort studies are needed to validate its potential for clinical application, so that it can support effective cardiovascular disease prevention and intervention strategies in real-world healthcare settings.
