Artificial intelligence methods applied to longitudinal data from electronic health records for prediction of cancer: a scoping review.

IF 3.4 | Zone 3, Medicine | Q1, HEALTH CARE SCIENCES & SERVICES | BMC Medical Research Methodology | Pub Date: 2025-01-28 | DOI: 10.1186/s12874-025-02473-w

Victoria Moglia, Owen Johnson, Gordon Cook, Marc de Kamps, Lesley Smith

BMC Medical Research Methodology, volume 25, issue 1, article 24. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11773903/pdf/

Citations: 0

Abstract

Background: Early detection and diagnosis of cancer are vital to improving outcomes for patients. Artificial intelligence (AI) models have shown promise in the early detection and diagnosis of cancer, but there is limited evidence on methods that fully exploit the longitudinal data stored within electronic health records (EHRs). This review aims to summarise the methods currently used to predict cancer from longitudinal data and to provide recommendations on how such models should be developed.

Methods: The review was conducted following PRISMA-ScR guidance. Six databases (MEDLINE, EMBASE, Web of Science, IEEE Xplore, PubMed and SCOPUS) were searched for relevant records published before 2 February 2024. Search terms related to the concepts "artificial intelligence", "prediction", "health records", "longitudinal", and "cancer". Data were extracted relating to several areas of the articles: (1) publication details, (2) study characteristics, (3) input data, (4) model characteristics, (5) reproducibility, and (6) quality assessment using the PROBAST tool. Models were evaluated against a framework for terminology relating to reporting of cancer detection and risk prediction models.

Results: Of 653 records screened, 33 were included in the review; 10 predicted risk of cancer, 18 performed either cancer detection or early detection, 4 predicted recurrence, and 1 predicted metastasis. The most common cancers predicted in the studies were colorectal (n = 9) and pancreatic cancer (n = 9). Sixteen studies used feature engineering to represent temporal data, with the most common features representing trends. Eighteen studies used deep learning models that take a direct sequential input, most commonly recurrent neural networks, but also convolutional neural networks and transformers. Prediction windows and lead times varied greatly between studies, even for models predicting the same cancer. High risk of bias was found in 90% of the studies. This risk was often introduced by inappropriate study design (n = 26) and inadequate sample size (n = 26).

Conclusion: This review highlights the breadth of approaches to cancer prediction from longitudinal data. We identify areas where reporting of methods could be improved, particularly regarding where in a patient's trajectory the model is applied. The review shows opportunities for further work, including comparison of these approaches and their applications in other cancers.
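
As a purely illustrative aside on the feature-engineering approach noted in the Results, the sketch below shows one way a trend feature might be derived from repeated measurements in a patient record. It is not taken from any of the reviewed studies: the function name `trend_slope`, the choice of a least-squares slope as the trend summary, and the haemoglobin readings are all assumptions made for the example.

```python
from datetime import date


def trend_slope(observations):
    """Return the least-squares slope (value units per day) of (date, value) pairs.

    Hypothetical helper: a slope is only one of many possible trend features.
    Returns 0.0 when fewer than two distinct time points are available.
    """
    if len(observations) < 2:
        return 0.0
    first_day = min(d for d, _ in observations)
    xs = [(d - first_day).days for d, _ in observations]  # days since first reading
    ys = [value for _, value in observations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # all readings taken on the same day
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Invented haemoglobin readings (g/L) for a single hypothetical patient.
readings = [
    (date(2023, 1, 1), 140.0),
    (date(2023, 1, 11), 135.0),
    (date(2023, 1, 21), 130.0),
]
slope = trend_slope(readings)
print(f"Haemoglobin trend: {slope:.2f} g/L per day")
```

A feature of this kind collapses a variable-length series into a single number that a conventional classifier can consume, whereas the sequential deep learning models described in the Results take the ordered measurements directly as input.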

Source journal

BMC Medical Research Methodology (Medicine: Health Care)

CiteScore: 6.50

Self-citation rate: 2.50%

Articles published: 298

Review time: 3-8 weeks

About the journal: BMC Medical Research Methodology is an open access journal publishing original peer-reviewed research articles in methodological approaches to healthcare research. Articles on the methodology of epidemiological research, clinical trials and meta-analysis/systematic review are particularly encouraged, as are empirical studies of the associations between choice of methodology and study outcomes. BMC Medical Research Methodology does not aim to publish articles describing scientific methods or techniques: these should be directed to the BMC journal covering the relevant biomedical subject area.