Validation of risk prediction models applied to longitudinal electronic health record data for the prediction of major cardiovascular events in the presence of data shifts.

European heart journal. Digital health (IF 4.4; Q1, Cardiac & Cardiovascular Systems). Pub Date: 2022-10-21. eCollection Date: 2022-12-01. DOI: 10.1093/ehjdh/ztac061

Yikuan Li, Gholamreza Salimi-Khorshidi, Shishir Rao, Dexter Canoy, Abdelaali Hassaine, Thomas Lukasiewicz, Kazem Rahimi, Mohammad Mamouei

Volume 3, Issue 4, Pages 535-547. Publication type: Journal Article. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9779795/pdf/

Citations: 0

Abstract

Aims: Deep learning has dominated predictive modelling across many fields, but in medicine it has met with a mixed reception. In clinical practice, simple statistical models and risk scores continue to inform cardiovascular disease risk prediction. This is due in part to a knowledge gap about how deep learning models perform in practice when they are subject to dynamic data shifts, a key criterion that common internal validation procedures do not address. We evaluated the performance of a novel deep learning model, BEHRT, under data shifts and compared it with several ML-based and established risk models.

Methods and results: Using linked electronic health records of 1.1 million patients across England aged at least 35 years between 1985 and 2015, we replicated three established statistical models for predicting 5-year risk of incident heart failure, stroke, and coronary heart disease. The results were compared with a widely accepted machine learning model (random forests) and a novel deep learning model (BEHRT). In addition to internal validation, we investigated how data shifts affect model discrimination and calibration. To this end, we tested the models on cohorts from (i) distinct geographical regions and (ii) different time periods. In internal validation, the deep learning models substantially outperformed the best statistical models by 6%, 8%, and 11% in heart failure, stroke, and coronary heart disease, respectively, in terms of the area under the receiver operating characteristic curve.
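The two quantities tracked in the validation, discrimination and calibration, can be illustrated with a minimal sketch. The function names and the toy cohort below are hypothetical and are not taken from the paper; the observed/expected ratio is used here only as one simple, assumed summary of calibration.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Discrimination: probability that a randomly chosen case receives a
    higher predicted risk than a randomly chosen non-case (ties count 0.5)."""
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def observed_expected_ratio(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Calibration-in-the-large: observed events divided by the sum of
    predicted risks. A value of 1.0 means the average risk is well matched."""
    expected = sum(y_score)
    if expected <= 0:
        raise ValueError("sum of predicted risks must be positive")
    return sum(y_true) / expected


# Hypothetical toy cohort: outcome labels and predicted 5-year risks.
labels = [0, 0, 1, 1]
risks = [0.10, 0.40, 0.35, 0.80]
print(auroc(labels, risks))                    # 0.75
print(observed_expected_ratio(labels, risks))  # above 1: risk is under-predicted
```

Running the same two functions on an internal test split and on a cohort from another region or period gives a like-for-like view of how each metric changes under a shift.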

Conclusion: The performance of all models declined as a result of data shifts; despite this, the deep learning models maintained the best performance in all risk prediction tasks. Updating the model with the latest information can improve discrimination, but if the prior distribution changes, the model may remain miscalibrated.
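The point that a change in the prior distribution affects calibration separately from discrimination can be made concrete with a small sketch. It uses the standard odds-rescaling correction for a pure prevalence shift, which is an illustrative technique and not a method described in the abstract. The function name is hypothetical, and the sketch assumes that only the outcome prevalence differs between cohorts while the distribution of scores within cases and within non-cases stays the same.

```python
def adjust_for_prior_shift(p: float, old_prev: float, new_prev: float) -> float:
    """Rescale a predicted risk p from a cohort with outcome prevalence
    old_prev to a cohort with prevalence new_prev.

    Assumption (illustrative only): pure prior shift, so the odds are
    multiplied by the ratio of new to old prevalence odds.
    """
    for name, value in (("p", p), ("old_prev", old_prev), ("new_prev", new_prev)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    ratio = (new_prev / (1.0 - new_prev)) / (old_prev / (1.0 - old_prev))
    odds = p / (1.0 - p) * ratio
    return odds / (1.0 + odds)


# Hypothetical risks from a model developed where 10% of patients had the event,
# applied to a cohort where only 5% do.
old_risks = [0.05, 0.10, 0.30, 0.60]
new_risks = [adjust_for_prior_shift(r, old_prev=0.10, new_prev=0.05) for r in old_risks]

# The mapping is strictly increasing, so the ranking of patients (and hence
# the AUROC) is unchanged; only the absolute risk levels move.
print(new_risks)
print(sorted(new_risks) == new_risks)
```

Because the adjustment preserves ranking, a model can keep its discrimination under a prevalence change and still report absolute risks that are too high or too low until it is recalibrated.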

Source journal

European heart journal. Digital health

CiteScore

5.00

Self-citation rate

0.00%

Number of articles published