Artificial intelligence methods applied to longitudinal data from electronic health records for prediction of cancer: a scoping review.

BMC Medical Research Methodology (IF 3.4; Zone 3, Medicine; Q1, Health Care Sciences & Services). Pub Date: 2025-01-28. DOI: 10.1186/s12874-025-02473-w
Victoria Moglia, Owen Johnson, Gordon Cook, Marc de Kamps, Lesley Smith
BMC Medical Research Methodology 25(1):24. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11773903/pdf/

Abstract

Background: Early detection and diagnosis of cancer are vital to improving outcomes for patients. Artificial intelligence (AI) models have shown promise in the early detection and diagnosis of cancer, but there is limited evidence on methods that fully exploit the longitudinal data stored within electronic health records (EHRs). This review aims to summarise the methods currently used to predict cancer from longitudinal data and to provide recommendations on how such models should be developed.

Methods: The review was conducted following PRISMA-ScR guidance. Six databases (MEDLINE, EMBASE, Web of Science, IEEE Xplore, PubMed and SCOPUS) were searched for relevant records published before 2 February 2024. Search terms covered the concepts "artificial intelligence", "prediction", "health records", "longitudinal", and "cancer". Data were extracted relating to several areas of the articles: (1) publication details, (2) study characteristics, (3) input data, (4) model characteristics, (5) reproducibility, and (6) quality assessment using the PROBAST tool. Models were evaluated against a framework for terminology relating to reporting of cancer detection and risk prediction models.

Results: Of 653 records screened, 33 were included in the review; 10 predicted risk of cancer, 18 performed either cancer detection or early detection, 4 predicted recurrence, and 1 predicted metastasis. The most common cancers predicted in the studies were colorectal (n = 9) and pancreatic cancer (n = 9). Sixteen studies used feature engineering to represent temporal data, with the most common features representing trends. Eighteen used deep learning models that take a direct sequential input, most commonly recurrent neural networks, but also convolutional neural networks and transformers. Prediction windows and lead times varied greatly between studies, even for models predicting the same cancer. High risk of bias was found in 90% of the studies. This risk most often arose from inappropriate study design (n = 26) and sample size (n = 26).

Conclusion: This review highlights the breadth of approaches to cancer prediction from longitudinal data. We identify areas where reporting of methods could be improved, particularly regarding where in a patient's trajectory the model is applied. The review shows opportunities for further work, including comparison of these approaches and their applications in other cancers.
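
As an illustration of the "trend" features and lead times mentioned in the Results, the sketch below computes a least-squares slope of one patient's repeated test results inside an observation window that ends a fixed lead time before an index date. It is a minimal example under stated assumptions, not the method of any study in the review: the function name `trend_slope`, its parameters (`lead_time_days`, `window_days`), and the sample values are all hypothetical.

```python
from datetime import date


def trend_slope(measurements, index_date, lead_time_days=0, window_days=365):
    """Least-squares slope (value units per day) inside the observation window.

    measurements: list of (date, float) pairs for one patient and one test.
    The window ends `lead_time_days` before `index_date` and spans
    `window_days` (all names here are hypothetical, for illustration only).
    Returns None when the window holds fewer than two distinct dates.
    """
    window_end = index_date.toordinal() - lead_time_days
    window_start = window_end - window_days
    # Keep only results recorded inside the observation window.
    points = [(d.toordinal(), v) for d, v in measurements
              if window_start <= d.toordinal() <= window_end]
    if len(points) < 2:
        return None
    mean_t = sum(t for t, _ in points) / len(points)
    mean_v = sum(v for _, v in points) / len(points)
    denom = sum((t - mean_t) ** 2 for t, _ in points)
    if denom == 0:  # all results share one date, so no trend is defined
        return None
    numer = sum((t - mean_t) * (v - mean_v) for t, v in points)
    return numer / denom


# Hypothetical repeated test results for one patient.
results = [
    (date(2023, 1, 1), 140.0),
    (date(2023, 1, 11), 138.0),
    (date(2023, 1, 21), 136.0),
    (date(2023, 2, 20), 120.0),  # falls inside the lead time, so it is excluded
]
slope = trend_slope(results, index_date=date(2023, 3, 1), lead_time_days=30)
```

Stating the window length and lead time explicitly, as the parameters do here, is one way to make clear where in a patient's trajectory a feature is computed.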

Source journal: BMC Medical Research Methodology (Medicine: Health Care)
CiteScore: 6.50
Self-citation rate: 2.50%
Articles published: 298
Review time: 3-8 weeks
About the journal: BMC Medical Research Methodology is an open access journal publishing original peer-reviewed research articles in methodological approaches to healthcare research. Articles on the methodology of epidemiological research, clinical trials and meta-analysis/systematic review are particularly encouraged, as are empirical studies of the associations between choice of methodology and study outcomes. BMC Medical Research Methodology does not aim to publish articles describing scientific methods or techniques: these should be directed to the BMC journal covering the relevant biomedical subject area.