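Development and validation of a prediction model for coronary heart disease risk in depressed patients aged 20 years and older using machine learning algorithms.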

Frontiers in Cardiovascular Medicine, vol. 11, article 1504957 · Pub date: 2025-01-09 · eCollection date: 2024-01-01 · DOI: 10.3389/fcvm.2024.1504957
IF 3 · Zone 3 (Medicine) · JCR Q2 (CARDIAC & CARDIOVASCULAR SYSTEMS)
Yicheng Wang, Chuan-Yang Wu, Hui-Xian Fu, Jian-Cheng Zhang
Citations: 0

Abstract

Background: Depression is increasingly recognized as an important risk factor for coronary heart disease (CHD). Currently, no predictive model has been specifically designed to evaluate CHD risk in individuals with depression. We aimed to develop a machine learning (ML) model that analyzes risk factors and predicts the probability of CHD in individuals with depression.

Methods: This study used data from the National Health and Nutrition Examination Survey (NHANES) from 2007 to 2018, comprising 2,085 individuals who had previously been diagnosed with depression. The population was randomly divided into a training set and a validation set in an 8:2 ratio. Univariate and multivariate logistic regression analyses were used to identify independent risk factors for CHD in individuals with depression. Eight machine learning algorithms were applied to the training set to construct the models: logistic regression (LR), random forest (RF), gradient boosting machine (GBM), support vector machine (SVM), extreme gradient boosting (XGBoost), classification and regression tree (CART), k-nearest neighbors (KNN), and neural network (NNET). The validation set was used to evaluate and compare the performance of the eight models, with the aim of identifying the most effective algorithm for predicting CHD risk in individuals with depression. The evaluation metrics were the area under the receiver operating characteristic (ROC) curve, calibration curves, Brier scores, decision curve analysis (DCA), and the precision-recall (PR) curve. The models were also internally validated using the bootstrap method.
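The Methods describe a random 8:2 split and two discrimination metrics (the area under the ROC curve and the area under the PR curve). A minimal standard-library Python sketch of those computations follows, assuming 0/1 outcome labels and predicted probabilities. It is not the authors' code (the abstract does not name the software used); the function names, the seed, and the toy labels and scores are hypothetical. The PR area is computed here as average precision, one common summary; the study may have used a different estimator.

```python
import random
from typing import List, Sequence, Tuple


def split_80_20(n: int, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Randomly assign row indices 0..n-1 to a training (80%) and validation (20%) set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded so the split is reproducible
    cut = round(0.8 * n)
    return idx[:cut], idx[cut:]


def roc_auc(y: Sequence[int], p: Sequence[float]) -> float:
    """ROC AUC: probability that a random case scores above a random non-case.

    Ties count as 1/2. Assumes both classes are present.
    """
    pos = [s for s, t in zip(p, y) if t == 1]
    neg = [s for s, t in zip(p, y) if t == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y: Sequence[int], p: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Cases are ranked by descending score; ties are broken by input order.
    Assumes at least one positive case.
    """
    order = sorted(range(len(p)), key=lambda i: p[i], reverse=True)
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y[i] == 1:
            tp += 1
            total += tp / rank  # precision at each recall step
    return total / sum(y)


if __name__ == "__main__":
    train_idx, val_idx = split_80_20(2085)
    print(len(train_idx), len(val_idx))  # 1668 417
    y = [0, 0, 1, 1]            # toy outcomes (hypothetical)
    p = [0.1, 0.4, 0.35, 0.8]   # toy predicted probabilities (hypothetical)
    print(roc_auc(y, p))                       # 0.75
    print(round(average_precision(y, p), 3))   # 0.833
```

The pairwise AUC loop is quadratic in sample size, which is fine for a cohort of about 2,000 but would normally be replaced by a rank-based formula for larger data.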

Results: Univariate and multivariate logistic regression analyses identified age, chest pain status, history of myocardial infarction, serum triglyceride level, and education level as independent predictors of CHD risk. Of the eight machine learning models, the random forest model performed best, with an area under the ROC curve (AUC) of 0.987 and an area under the PR curve of 0.848 in the training set. In the validation set, the random forest model achieved an AUC of 0.996 and an area under the PR curve of 0.960, demonstrating excellent discriminative ability. Calibration curves indicated close agreement between observed and predicted probabilities, with low Brier scores of 0.026 and 0.021 for the training set and validation set, respectively, reinforcing the model's predictive accuracy. DCA curves confirmed the net benefit of the random forest model across… Furthermore, after internal validation by the bootstrap method, the AUC of the random forest model was 0.928, indicating good discriminative ability and supporting the model's use for clinical assessment of CHD risk in people with depression.
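The Brier score and decision curve analysis reported in the Results have simple closed forms. The Brier score is the mean squared difference between predicted probability and observed 0/1 outcome (0 is perfect), and the net benefit of a model at threshold probability p_t is TP/n - (FP/n) · p_t / (1 - p_t), compared against "treat all" and "treat none" (net benefit 0). The sketch below assumes binary labels and predicted probabilities; the function names and toy data are hypothetical and do not reproduce the study's results. Because the Brier score also depends on outcome prevalence, it is usually read together with the calibration curve.

```python
from typing import List, Sequence, Tuple


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def net_benefit(y: Sequence[int], p: Sequence[float], pt: float) -> float:
    """Net benefit of treating everyone whose predicted risk is at least pt."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= pt and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= pt and yi == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * pt / (1 - pt)


def net_benefit_treat_all(y: Sequence[int], pt: float) -> float:
    """Reference strategy that treats everyone; 'treat none' always scores 0."""
    prevalence = sum(y) / len(y)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


def decision_curve(
    y: Sequence[int], p: Sequence[float], thresholds: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """(threshold, model net benefit, treat-all net benefit) for each threshold."""
    return [(t, net_benefit(y, p, t), net_benefit_treat_all(y, t)) for t in thresholds]


if __name__ == "__main__":
    y = [0, 0, 1, 1]            # toy outcomes (hypothetical)
    p = [0.1, 0.4, 0.35, 0.8]   # toy predicted probabilities (hypothetical)
    print(round(brier_score(y, p), 6))  # 0.158125
    for t, nb_model, nb_all in decision_curve(y, p, [0.1, 0.3, 0.5]):
        print(t, round(nb_model, 3), round(nb_all, 3))
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both reference strategies; plotting the three columns against the threshold gives the decision curve.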

Conclusion: The random forest algorithm showed the best predictive performance and may help clinicians assess the probability of CHD in this population.

Source journal
Frontiers in Cardiovascular Medicine
Subject area: Medicine / Cardiology and Cardiovascular Medicine
CiteScore: 3.80
Self-citation rate: 11.10%
Articles published: 3529
Review time: 14 weeks
Journal description: Frontiers? Which frontiers? Where exactly are the frontiers of cardiovascular medicine? And who should be defining these frontiers? At Frontiers in Cardiovascular Medicine we believe it is worth being curious to foresee and explore beyond the current frontiers. In other words, we would like, through the articles published by our community journal Frontiers in Cardiovascular Medicine, to anticipate the future of cardiovascular medicine, and thus better prevent cardiovascular disorders and improve therapeutic options and outcomes of our patients.