A Systematic Review of Machine Learning-based Prognostic Models for Acute Pancreatitis: Towards Improving Methods and Reporting Quality

medRxiv - Gastroenterology | Pub Date: 2024-06-27 | DOI: 10.1101/2024.06.26.24309389

Brian Critelli, Amier Hassan, Ila Lahooti, Lydia Noh, Jun Sung Park, Kathleen Tong, Ali Lahooti, Nathan Matzko, Jan Niklas Adams, Lukas Liss, Justin Quion, David Restrepo, Melica Nikahd, Stacey Culp, Adam Lacy-Hulbert, Cate Speake, James Buxbaum, Jason Bischof, Cemal Yazici, Anna Evans-Phillips, Sophie Terp, Alexandra Weissman, Darwin Conwell, Philip Hart, Mitchell Ramsey, Somashekar Krishna, Samuel Han, Erica Park, Raj Shah, Venkata Akshintala, John A Windsor, Nikhil K Mull, Georgios Papachristou, Leo Anthony Celi, Peter Lee


Abstract

Background: An accurate prognostic tool is essential to aid clinical decision making (e.g., patient triage) and to advance personalized medicine. However, such a prognostic tool is lacking for acute pancreatitis (AP). Machine learning (ML) techniques are increasingly being used to develop high-performing prognostic models in AP, yet their methodologic and reporting quality has received little attention. High-quality reporting and study methodology are critical to model validity, reproducibility, and clinical implementation. In collaboration with content experts in ML methodology, we performed a systematic review critically appraising the quality of methodology and reporting of recently published ML AP prognostic models.

Methods: Using a validated search strategy, we identified ML AP studies from the databases MEDLINE, PubMed, and EMBASE published between January 2021 and December 2023. Eligibility criteria included all retrospective or prospective studies that developed or validated new or existing ML models in patients with AP that predicted an outcome following an episode of AP. Meta-analysis was considered if there was homogeneity in the study design and in the type of outcome predicted. For risk of bias (ROB) assessment, we used the Prediction Model Risk of Bias Assessment Tool (PROBAST). Quality of reporting was assessed using the Transparent Reporting of a Multivariable Prediction Model of Individual Prognosis or Diagnosis – Artificial Intelligence (TRIPOD+AI) statement, which defines standards for 27 items that should be reported in publications using ML prognostic models.

Results: The search strategy identified 6480 publications, of which 30 met the eligibility criteria. Studies originated from China (22), the U.S. (4), and other countries (4). All 30 studies developed a new ML model and none sought to validate an existing ML model, producing a total of 39 new ML models. AP severity (23/39) and mortality (6/39) were the most common outcomes predicted. The mean area under the curve (AUC) across all models and endpoints was 0.91 (SD 0.08). The ROB was high for at least one domain in all 39 models, particularly for the analysis domain (37/39 models). Steps were not taken to minimize over-optimistic model performance in 27/39 models. Due to heterogeneity in the study design and in how the outcomes were defined and determined, meta-analysis was not performed. Studies reported on only 15/27 items from the TRIPOD+AI standards, with only 7/30 justifying sample size and 13/30 assessing data quality. Other reporting deficiencies included omissions regarding human-AI interaction (28/30), handling of low-quality or incomplete data in practice (27/30), sharing of analytical code (25/30), study protocols (25/30), and reporting of source data (19/30).

Discussion: There are significant deficiencies in the methodology and reporting of recently published ML-based prognostic models in AP patients. These undermine the validity, reproducibility, and implementation of these prognostic models despite their promise of superior predictive accuracy.

Funding: none

Registration: Research Registry (reviewregistry1727)
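To illustrate what "over-optimistic model performance" can mean in practice, the following is a minimal sketch, not taken from the review or from any of the included studies. It assumes purely synthetic data in which the single feature carries no information about the outcome, and it uses a 1-nearest-neighbour classifier as a stand-in for a model that memorizes its training set. The helper names `auc` and `predict_1nn` are hypothetical. The apparent AUC, computed on the same data used for fitting, looks perfect, while the AUC on held-out data stays near chance.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def predict_1nn(train_x, train_y, x):
    """Return the label of the training point closest to x (1-nearest neighbour)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

rng = random.Random(0)
# Synthetic assumption: one noise feature, labels drawn independently of it.
xs = [rng.random() for _ in range(400)]
ys = [rng.randint(0, 1) for _ in range(400)]
train_x, train_y = xs[:200], ys[:200]
test_x, test_y = xs[200:], ys[200:]

# Apparent performance: evaluated on the data the model was fitted to.
apparent_auc = auc([predict_1nn(train_x, train_y, x) for x in train_x], train_y)
# Held-out performance: evaluated on data the model has never seen.
heldout_auc = auc([predict_1nn(train_x, train_y, x) for x in test_x], test_y)

print(f"apparent AUC: {apparent_auc:.2f}, held-out AUC: {heldout_auc:.2f}")
```

The gap between the two numbers is the optimism that internal validation (a held-out split as here, cross-validation, or bootstrapping) is meant to expose. A reported AUC without such a step cannot be read as an estimate of performance in new patients.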


