Risk-Specific Training Cohorts to Address Class Imbalance in Surgical Risk Prediction.

JAMA Surgery. IF 14.9; Zone 1 (Medicine); Q1 (Surgery). Pub Date: 2024-12-01. DOI: 10.1001/jamasurg.2024.4299

Jeremy A Balch, Matthew M Ruppert, Ziyuan Guan, Timothy R Buchanan, Kenneth L Abbott, Benjamin Shickel, Azra Bihorac, Muxuan Liang, Gilbert R Upchurch, Christopher J Tignanelli, Tyler J Loftus

JAMA Surgery, pages 1424-1431. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11465118/pdf/


Abstract

Importance: Machine learning tools are increasingly deployed for risk prediction and clinical decision support in surgery. Class imbalance adversely impacts predictive performance, especially for low-incidence complications.

Objective: To evaluate risk-prediction model performance when trained on risk-specific cohorts.

Design, setting, and participants: This cross-sectional study, performed from February 2024 to July 2024, deployed a deep learning model that generated risk scores for common postoperative complications. A total of 109 445 inpatient operations performed at 2 University of Florida Health hospitals from June 1, 2014, to May 5, 2021, were examined.

Exposures: The model was trained de novo on separate cohorts for high-risk, medium-risk, and low-risk Current Procedural Terminology (CPT) codes, defined empirically by the incidence of 5 postoperative complications: (1) in-hospital mortality; (2) prolonged intensive care unit (ICU) stay (≥48 hours); (3) prolonged mechanical ventilation (≥48 hours); (4) sepsis; and (5) acute kidney injury (AKI). Low-risk and high-risk cutoffs for complications were defined by the lower-third and upper-third prevalence in the dataset, except for mortality, for which the cutoffs were set at 1% or less and greater than 3%, respectively.
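The cohort definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `assign_risk_tiers`, the procedure-code labels, and all rates below are hypothetical, and computing the thirds over per-code rates (rather than over operations) is an assumption, since the abstract does not say how the thirds were computed.

```python
from statistics import quantiles


def assign_risk_tiers(incidence_by_code, low_cut=None, high_cut=None):
    """Label each procedure code 'low', 'medium', or 'high' risk.

    incidence_by_code maps a procedure code to the observed complication
    rate (0..1) among operations with that code. If no cutoffs are given,
    the tertile boundaries of the per-code rates are used (an assumption).
    """
    if low_cut is None or high_cut is None:
        # Two cut points that split the observed rates into thirds.
        low_cut, high_cut = quantiles(
            list(incidence_by_code.values()), n=3, method="inclusive"
        )
    tiers = {}
    for code, rate in incidence_by_code.items():
        if rate <= low_cut:
            tiers[code] = "low"
        elif rate > high_cut:
            tiers[code] = "high"
        else:
            tiers[code] = "medium"
    return tiers


# Hypothetical per-code AKI rates (made-up numbers, not study data).
aki_rates = {"A": 0.01, "B": 0.02, "C": 0.04, "D": 0.06, "E": 0.10, "F": 0.20}
aki_tiers = assign_risk_tiers(aki_rates)

# Mortality uses the fixed cutoffs stated in the abstract:
# 1% or less is low risk, greater than 3% is high risk.
mortality_rates = {"A": 0.004, "B": 0.01, "C": 0.03, "D": 0.05}
mortality_tiers = assign_risk_tiers(mortality_rates, low_cut=0.01, high_cut=0.03)
```

Each tier would then define the subset of operations used to train one risk-specific model.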

Main outcomes and measures: Model performance metrics were assessed for each risk-specific cohort alongside the baseline model. Metrics included area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), F1 scores, and accuracy for each model.
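To make the four metrics concrete, here is a small pure-Python sketch on a toy imbalanced sample (2 events in 10 cases). The labels, scores, and 0.5 threshold are made up for illustration. AUPRC is estimated here as average precision, one common estimator, which may differ from the one used in the study.

```python
def auroc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / sum(y_true)


def f1_and_accuracy(y_true, scores, threshold=0.5):
    """F1 score and accuracy after thresholding the risk scores."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, y_true))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, y_true))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, y_true))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, y_true))
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return f1, (tp + tn) / len(y_true)


# Toy data: 8 non-events and 2 events (hypothetical, not study data).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.6, 0.25, 0.1, 0.7, 0.4]

model_auroc = auroc(y_true, scores)
model_auprc = average_precision(y_true, scores)
model_f1, model_acc = f1_and_accuracy(y_true, scores)

# A trivial classifier that never predicts an event.
trivial_f1, trivial_acc = f1_and_accuracy(y_true, [0.0] * len(y_true))
```

On this toy sample the never-predict-an-event classifier reaches the same accuracy as the scored model while its F1 is zero, which illustrates why accuracy alone is a weak metric under class imbalance.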

Results: A total of 109 445 inpatient operations were examined among patients treated at 2 University of Florida Health hospitals in Gainesville (77 921 procedures [71.2%]) and Jacksonville (31 524 procedures [28.8%]). Median (IQR) patient age was 58 (43-68) years, and median (IQR) Charlson Comorbidity Index score was 2 (0-4). Of the 109 445 operations, 55 646 (50.8%) were performed in male patients, and 66 495 (60.8%) were nonemergent inpatient operations. Training on the high-risk cohort had a variable impact on AUROC, but significantly improved AUPRC (as assessed by nonoverlapping 95% confidence intervals) for predicting mortality (0.53; 95% CI, 0.43-0.64), AKI (0.61; 95% CI, 0.58-0.65), and prolonged ICU stay (0.91; 95% CI, 0.89-0.92). It also significantly improved F1 score for mortality (0.42; 95% CI, 0.36-0.49), prolonged mechanical ventilation (0.55; 95% CI, 0.52-0.58), sepsis (0.46; 95% CI, 0.43-0.49), and AKI (0.57; 95% CI, 0.54-0.59). After controlling for baseline model performance on high-risk cohorts, AUPRC increased significantly for in-hospital mortality only (0.53; 95% CI, 0.42-0.65 vs 0.29; 95% CI, 0.21-0.40).
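The significance criterion used here (nonoverlapping 95% confidence intervals) and the reported site shares can be checked directly from the numbers in the abstract. The sketch below is only a worked check; the helper name `intervals_overlap` is hypothetical. Nonoverlap is a conservative criterion: intervals that overlap do not by themselves rule out a significant difference.

```python
def intervals_overlap(ci_a, ci_b):
    """Return True if two closed intervals (low, high) share any point."""
    (low_a, high_a), (low_b, high_b) = ci_a, ci_b
    return low_a <= high_b and low_b <= high_a


# AUPRC for in-hospital mortality on the high-risk cohort (from the abstract).
high_risk_model_ci = (0.42, 0.65)  # point estimate 0.53
baseline_model_ci = (0.21, 0.40)   # point estimate 0.29

# The lower bound 0.42 exceeds the baseline upper bound 0.40,
# so the intervals do not overlap.
mortality_gain_significant = not intervals_overlap(
    high_risk_model_ci, baseline_model_ci
)

# Site shares reported in the abstract.
total_ops = 109_445
gainesville, jacksonville = 77_921, 31_524
gainesville_pct = round(100 * gainesville / total_ops, 1)
jacksonville_pct = round(100 * jacksonville / total_ops, 1)
```

The two site counts sum to the stated total, and the rounded shares match the reported 71.2% and 28.8%.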

Conclusion and relevance: In this cross-sectional study, training separate models using a priori knowledge of procedure-specific risk classes improved performance on standard evaluation metrics, especially for low-prevalence complications like in-hospital mortality. Used cautiously, this approach may represent an optimal training strategy for surgical risk-prediction models.


Source journal: JAMA Surgery (Surgery)

CiteScore: 20.80
Self-citation rate: 3.60%
Articles published: 400

About the journal: JAMA Surgery, an international peer-reviewed journal established in 1920, is the official publication of the Association of VA Surgeons, the Pacific Coast Surgical Association, and the Surgical Outcomes Club. It is a proud member of the JAMA Network, a consortium of peer-reviewed general medical and specialty publications.
