A Systematic Review of Machine Learning-based Prognostic Models for Acute Pancreatitis: Towards Improving Methods and Reporting Quality

Brian Critelli, Amier Hassan, Ila Lahooti, Lydia Noh, Jun Sung Park, Kathleen Tong, Ali Lahooti, Nathan Matzko, Jan Niklas Adams, Lukas Liss, Justin Quion, David Restrepo, Melica Nikahd, Stacey Culp, Adam Lacy-Hulbert, Cate Speake, James Buxbaum, Jason Bischof, Cemal Yazici, Anna Evans-Phillips, Sophie Terp, Alexandra Weissman, Darwin Conwell, Philip Hart, Mitchell Ramsey, Somashekar Krishna, Samuel Han, Erica Park, Raj Shah, Venkata Akshintala, John A Windsor, Nikhil K Mull, Georgios Papachristou, Leo Anthony Celi, Peter Lee
medRxiv - Gastroenterology, 2024-06-27. DOI: 10.1101/2024.06.26.24309389

Abstract

Background: An accurate prognostic tool is essential to aid clinical decision making (e.g., patient triage) and to advance personalized medicine. However, such a prognostic tool is lacking for acute pancreatitis (AP). Machine learning (ML) techniques are increasingly being used to develop high-performing prognostic models in AP, but their methodologic and reporting quality has received little attention. High-quality reporting and study methodology are critical to model validity, reproducibility, and clinical implementation. In collaboration with content experts in ML methodology, we performed a systematic review critically appraising the methodology and reporting quality of recently published ML AP prognostic models.

Methods: Using a validated search strategy, we searched MEDLINE, PubMed, and EMBASE for ML AP studies published between January 2021 and December 2023. Eligible studies were retrospective or prospective studies that developed or validated a new or existing ML model in patients with AP to predict an outcome following an episode of AP. Meta-analysis was considered if study design and the type of outcome predicted were homogeneous. Risk of bias (ROB) was assessed with the Prediction Model Risk of Bias Assessment Tool (PROBAST). Reporting quality was assessed against the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis – Artificial Intelligence (TRIPOD+AI) statement, which defines 27 items that should be reported in publications of ML prognostic models.

Results: The search strategy identified 6480 publications, of which 30 met the eligibility criteria. Studies originated from China (22), the U.S. (4), and other countries (4). All 30 studies developed a new ML model and none sought to validate an existing ML model, yielding a total of 39 new ML models. AP severity (23/39) and mortality (6/39) were the most commonly predicted outcomes. The mean area under the curve (AUC) across all models and endpoints was 0.91 (SD 0.08). ROB was high in at least one domain for all 39 models, most often in the analysis domain (37/39 models). No steps were taken to minimize over-optimistic estimates of model performance in 27/39 models. Because study design and the definition and ascertainment of outcomes were heterogeneous, meta-analysis was not performed. Studies reported on only 15/27 TRIPOD+AI items; only 7/30 justified their sample size and only 13/30 assessed data quality. Other reporting deficiencies included omissions regarding human-AI interaction (28/30), handling of low-quality or incomplete data in practice (27/30), sharing of analytic code (25/30), study protocols (25/30), and source data (19/30).

Discussion: There are significant deficiencies in the methodology and reporting of recently published ML-based prognostic models in AP patients. These deficiencies undermine the validity, reproducibility, and implementation of these prognostic models despite their promise of superior predictive accuracy.

Funding: none

Registration: Research Registry (reviewregistry1727)
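
The abstract reports that no steps were taken to minimize over-optimistic performance in 27/39 models. PROBAST's analysis domain asks whether overfitting and optimism in model performance were accounted for. As an illustration only, here is a sketch of one widely used internal-validation step: bootstrap optimism correction of the AUC, written in dependency-free Python. It is not the method of this review or of any reviewed study. The function names, and the toy model `fit_best_feature` (which simply picks the single feature with the highest training AUC), are hypothetical choices made to keep the example short. Labels are assumed to be coded 0/1.

```python
import random
from statistics import mean


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC; tied scores share their average rank.
    Assumes labels are coded 0/1."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # 1-based average rank for the tied run i..j
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def fit_best_feature(rows, labels):
    """Hypothetical toy 'model': choose the feature with the highest training
    AUC. This data-driven selection step is what makes it overfit."""
    return max(range(len(rows[0])), key=lambda j: auc([r[j] for r in rows], labels))


def predict(feature_index, rows):
    """Score each row by the selected feature's raw value."""
    return [r[feature_index] for r in rows]


def optimism_corrected_auc(rows, labels, n_boot=200, seed=0):
    """Return (apparent AUC, bootstrap optimism-corrected AUC)."""
    rng = random.Random(seed)
    model = fit_best_feature(rows, labels)  # raises if only one class present
    apparent = auc(predict(model, rows), labels)
    n = len(rows)
    optimisms = []
    while len(optimisms) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        b_rows = [rows[i] for i in idx]
        b_labels = [labels[i] for i in idx]
        if len(set(b_labels)) < 2:
            continue  # skip resamples that lack one outcome class
        b_model = fit_best_feature(b_rows, b_labels)  # refit the whole procedure
        boot_perf = auc(predict(b_model, b_rows), b_labels)  # on its own sample
        test_perf = auc(predict(b_model, rows), labels)  # on the original data
        optimisms.append(boot_perf - test_perf)
    return apparent, apparent - mean(optimisms)


if __name__ == "__main__":
    # Pure-noise data: any AUC above 0.5 here is optimism, not signal.
    data_rng = random.Random(42)
    labels = [data_rng.randint(0, 1) for _ in range(100)]
    rows = [[data_rng.random() for _ in range(20)] for _ in range(100)]
    apparent, corrected = optimism_corrected_auc(rows, labels)
    print(f"apparent AUC {apparent:.2f}, optimism-corrected AUC {corrected:.2f}")
```

On pure-noise features, the apparent AUC of the selected feature lands above 0.5 by chance. The bootstrap-corrected estimate is then pulled back toward 0.5. In a real study, the resampling loop would need to wrap the entire modelling pipeline (variable selection, tuning, and fitting). Any step left outside the loop escapes the correction.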