A comparative study of model-centric and data-centric approaches in the development of cardiovascular disease risk prediction models in the UK Biobank.

European Heart Journal. Digital Health | IF 3.9 | Q1, Cardiac & Cardiovascular Systems | Pub Date: 2023-08-01 | DOI: 10.1093/ehjdh/ztad033
Mohammad Mamouei, Thomas Fisher, Shishir Rao, Yikuan Li, Gholamreza Salimi-Khorshidi, Kazem Rahimi

Abstract

Aims: A diverse set of factors influence cardiovascular diseases (CVDs), but a systematic investigation of the interplay between these determinants and the contribution of each to CVD incidence prediction is largely missing from the literature. In this study, we leverage one of the most comprehensive biobanks worldwide, the UK Biobank, to investigate the contribution of different risk factor categories to more accurate incidence predictions in the overall population, by sex, different age groups, and ethnicity.

Methods and results: The investigated categories include the history of medical events, behavioural factors, socioeconomic factors, environmental factors, and measurements. We included data from a cohort of 405 257 participants aged 37-73 years and trained various machine learning and deep learning models on different subsets of risk factors to predict CVD incidence. Each of the models was trained on the complete set of predictors and subsets where each category was excluded. The results were benchmarked against QRISK3. The findings highlight that (i) leveraging a more comprehensive medical history substantially improves model performance. Relative to QRISK3, the best performing models improved the discrimination by 3.78% and improved precision by 1.80%. (ii) Both model- and data-centric approaches are necessary to improve predictive performance. The benefits of using a comprehensive history of diseases were far more pronounced when a neural sequence model, BEHRT, was used. This highlights the importance of the temporality of medical events that existing clinical risk models fail to capture. (iii) Besides the history of diseases, socioeconomic factors and measurements had small but significant independent contributions to the predictive performance.
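The exclusion design described above (each model trained once on all predictors and once per omitted category) can be sketched as follows. This is a minimal illustration, not the authors' code: the five category names come from the abstract, but every feature name is a hypothetical placeholder, and the pairwise AUROC helper is an assumption about how discrimination might be scored, since the abstract does not name the metric.

```python
from itertools import chain

# Category names follow the abstract; the feature names are hypothetical.
CATEGORIES = {
    "medical_history": ["prior_diagnoses", "medications"],
    "behavioural": ["smoking_status", "physical_activity"],
    "socioeconomic": ["deprivation_index", "education"],
    "environmental": ["air_pollution", "noise_exposure"],
    "measurements": ["systolic_bp", "bmi"],
}


def ablation_feature_sets(categories):
    """Return the full predictor set plus one subset per excluded category."""
    sets = {"all": list(chain.from_iterable(categories.values()))}
    for excluded in categories:
        sets[f"without_{excluded}"] = [
            feature
            for name, features in categories.items()
            if name != excluded
            for feature in features
        ]
    return sets


def auroc(labels, scores):
    """Pairwise AUROC: share of case/non-case pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


feature_sets = ablation_feature_sets(CATEGORIES)
for name, features in feature_sets.items():
    print(name, len(features))
```

Under this sketch, a model would be fitted on each entry of `feature_sets` and scored on held-out data; the drop in score from `all` to `without_<category>` gives a rough estimate of that category's independent contribution.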

Conclusion: These findings emphasize the need for considering broad determinants and novel modelling approaches to enhance CVD incidence prediction.
