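Development and validation of a prediction model for coronary heart disease risk in depressed patients aged 20 years and older using machine learning algorithms.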

Frontiers in Cardiovascular Medicine (IF 3.0; Zone 3, Medicine; JCR Q2, Cardiac & Cardiovascular Systems). Pub Date: 2025-01-09. eCollection Date: 2024-01-01. DOI: 10.3389/fcvm.2024.1504957

Yicheng Wang, Chuan-Yang Wu, Hui-Xian Fu, Jian-Cheng Zhang

Volume 11, Article 1504957. Publication type: Journal Article. PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11754242/pdf/

Citations: 0

Abstract


Development and validation of a prediction model for coronary heart disease risk in depressed patients aged 20 years and older using machine learning algorithms.

Background: Depression is being increasingly acknowledged as an important risk factor contributing to coronary heart disease (CHD). Currently, there is no predictive model specifically designed to evaluate the risk of coronary heart disease among individuals with depression. We aim to develop a machine learning (ML) model that will analyze risk factors and forecast the probability of coronary heart disease in individuals suffering from depression.

Methods: This study used data from the National Health and Nutrition Examination Survey (NHANES) 2007-2018, comprising 2,085 individuals who had previously been diagnosed with depression. The population was randomly divided into a training set and a validation set at an 8:2 ratio. Univariate and multivariate logistic regression analyses were used to identify independent risk factors for coronary heart disease in individuals with depression. Eight machine learning algorithms were applied to the training set to construct models: logistic regression (LR), random forest (RF), gradient boosting machine (GBM), support vector machine (SVM), extreme gradient boosting (XGBoost), classification and regression tree (CART), k-nearest neighbors (KNN), and neural network (NNET). The validation set was used to evaluate and compare the eight models, with the aim of identifying the most effective algorithm for predicting coronary heart disease risk in individuals with depression. The evaluation metrics were the area under the receiver operating characteristic (ROC) curve, calibration curves, Brier scores, decision curve analysis (DCA), and the precision-recall (PR) curve. The models were also internally validated using the bootstrap method.
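The split and two of the metrics named in the Methods can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' code: the abstract does not say which software was used, and the function names `split_8_2`, `roc_auc`, and `brier_score`, the fixed seed, and the toy labels and probabilities are all assumptions made for the example.

```python
import random


def split_8_2(records, seed=0):
    """Shuffle a copy of the records and split them 80% / 20%."""
    rng = random.Random(seed)          # fixed seed is an assumption, for repeatability
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# 2,085 placeholder record IDs stand in for the study population.
train, valid = split_8_2(list(range(2085)))

# Toy outcomes and predicted probabilities (hypothetical values).
toy_labels = [0, 0, 1, 1]
toy_probs = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_probs)
toy_brier = brier_score(toy_labels, toy_probs)
```

With 2,085 records, the 8:2 ratio yields 1,668 training and 417 validation cases. A lower Brier score means predicted probabilities sit closer to the observed outcomes, while an AUC closer to 1 means cases are ranked above non-cases more consistently.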

Results: Univariate and multivariate logistic regression analyses identified age, chest pain status, history of myocardial infarction, serum triglyceride levels, and education level as independent predictors of coronary heart disease risk. Of the eight machine learning models, the random forest model performed best, with an area under the ROC curve (AUC) of 0.987 and a PR-curve AUC of 0.848 in the training set. In the validation set, the random forest model achieved an AUC of 0.996 and a PR-curve AUC of 0.960, demonstrating excellent discriminative ability. Calibration curves indicated close agreement between observed and predicted probabilities, with low Brier scores of 0.026 and 0.021 for the training set and validation set, respectively, supporting the model's predictive accuracy. DCA curves confirmed a net benefit for the random forest model. After internal validation by the bootstrap method, the AUC of the random forest model was 0.928, indicating good discriminative ability and suggesting that the model is useful for clinical assessment of coronary heart disease risk in people with depression.
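The abstract does not state which bootstrap procedure produced the internally validated AUC of 0.928. As one common variant, the sketch below resamples cases with replacement and recomputes the AUC on each resample, reporting the mean and an approximate 95% percentile interval. It is an illustrative assumption, not the authors' method; `bootstrap_auc`, the number of resamples, the seeds, and the synthetic scores are all hypothetical.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_boot=200, seed=0):
    """Resample cases with replacement and summarise the resampled AUCs.

    Returns (mean AUC, lower bound, upper bound) of an approximate
    95% percentile interval. Resamples with only one class are redrawn.
    """
    if len(set(labels)) < 2:
        raise ValueError("both classes are required")
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:            # keep only resamples containing both classes
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    mean_auc = sum(aucs) / n_boot
    lower = aucs[int(0.025 * n_boot)]
    upper = aucs[int(0.975 * n_boot) - 1]
    return mean_auc, lower, upper


# Synthetic data: cases (label 1) score higher on average than non-cases.
data_rng = random.Random(42)
labels = [i % 2 for i in range(60)]
scores = [y + data_rng.gauss(0.0, 0.5) for y in labels]
mean_auc, lower, upper = bootstrap_auc(labels, scores)
```

Other bootstrap schemes, such as optimism correction, refit the model on every resample and can give a different estimate, so the interval here describes only sampling variability of the AUC for a fixed set of predictions.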

Conclusion: The random forest algorithm exhibited the best predictive performance, potentially aiding clinicians in assessing the risk probabilities of coronary heart disease within this population.


Source journal

Frontiers in Cardiovascular Medicine (Medicine: Cardiology and Cardiovascular Medicine)

CiteScore: 3.80

Self-citation rate: 11.10%

Articles published: 3529

Review time: 14 weeks

About the journal: Frontiers? Which frontiers? Where exactly are the frontiers of cardiovascular medicine? And who should be defining these frontiers? At Frontiers in Cardiovascular Medicine we believe it is worth being curious to foresee and explore beyond the current frontiers. In other words, we would like, through the articles published by our community journal Frontiers in Cardiovascular Medicine, to anticipate the future of cardiovascular medicine, and thus better prevent cardiovascular disorders and improve therapeutic options and outcomes of our patients.