Machine Learning Did Not Outperform Conventional Competing Risk Modeling to Predict Revision Arthroplasty.

IF 4.2, Zone 2 (Medicine), Q1 ORTHOPEDICS, Clinical Orthopaedics and Related Research®, Pub Date: 2024-08-01, Epub Date: 2024-03-12, DOI: 10.1097/CORR.0000000000003018

Jacobien H F Oosterhoff, Anne A H de Hond, Rinne M Peters, Liza N van Steenbergen, Juliette C Sorel, Wierd P Zijlstra, Rudolf W Poolman, David Ring, Paul C Jutte, Gino M M J Kerkhoffs, Hein Putter, Ewout W Steyerberg, Job N Doornberg


Citations: 0

Abstract




Background: Estimating the risk of revision after arthroplasty could inform patient and surgeon decision-making. However, there is a lack of well-performing prediction models assisting in this task, which may be due to current conventional modeling approaches such as traditional survivorship estimators (such as Kaplan-Meier) or competing risk estimators. Recent advances in machine learning survival analysis might improve decision support tools in this setting. Therefore, this study aimed to assess the performance of machine learning compared with that of conventional modeling to predict revision after arthroplasty.
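The difference between a Kaplan-Meier estimate and a competing-risk estimate mentioned in the Background can be made concrete with a small sketch. The code below is an illustration only, not the authors' implementation: the function name `cumulative_incidence`, the status coding (0 = censored, 1 = revision, 2 = competing event such as death), and the five-patient toy data are all assumptions introduced here. It computes the nonparametric (Aalen-Johansen type) cumulative incidence of revision next to the naive 1 minus Kaplan-Meier estimate that treats competing events as censoring.

```python
import numpy as np

def cumulative_incidence(time, status, cause=1):
    """Return (t, CIF, naive 1-KM) at each distinct event time.

    status coding is an assumption of this sketch:
    0 = censored, `cause` = event of interest, any other value = competing event.
    """
    time = np.asarray(time, dtype=float)
    status = np.asarray(status, dtype=int)
    surv = 1.0  # all-cause event-free probability just before the current time
    km = 1.0    # Kaplan-Meier survival when competing events are treated as censored
    cif = 0.0   # cumulative incidence of the event of interest
    rows = []
    for t in np.unique(time[status != 0]):
        at_risk = np.sum(time >= t)
        d_cause = np.sum((time == t) & (status == cause))
        d_any = np.sum((time == t) & (status != 0))
        # Only patients still free of ANY event can experience the event of interest.
        cif += surv * d_cause / at_risk
        surv *= 1.0 - d_any / at_risk
        km *= 1.0 - d_cause / at_risk
        rows.append((float(t), float(cif), float(1.0 - km)))
    return rows

# Toy data (invented for illustration): 1 = revision, 2 = death, 0 = censored.
for t, cif, naive in cumulative_incidence([1, 2, 3, 4, 5], [1, 2, 1, 0, 1]):
    print(f"t={t:.0f}  CIF={cif:.3f}  1-KM={naive:.3f}")
```

In this toy example the naive estimate is never below the cumulative incidence, because a patient who dies can no longer undergo revision; this is the usual argument for preferring competing-risk estimators when the competing event is common.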

Question/purpose: Does machine learning perform better than traditional regression models for estimating the risk of revision for patients undergoing hip or knee arthroplasty?

Methods: Eleven datasets from published studies from the Dutch Arthroplasty Register reporting on factors associated with revision or survival after partial or total knee and hip arthroplasty between 2018 and 2022 were included in our study. The 11 datasets were observational registry studies, with a sample size ranging from 3038 to 218,214 procedures. We developed a set of time-to-event models for each dataset, leading to 11 comparisons. A set of predictors (factors associated with revision surgery) was identified based on the variables that were selected in the included studies. We assessed the predictive performance of two state-of-the-art statistical time-to-event models for 1-, 2-, and 3-year follow-up: a Fine and Gray model (which models the cumulative incidence of revision) and a cause-specific Cox model (which models the hazard of revision). These were compared with a machine-learning approach (a random survival forest model, which is a decision tree-based machine-learning algorithm for time-to-event analysis). Performance was assessed according to discriminative ability (time-dependent area under the receiver operating characteristic curve), calibration (slope and intercept), and overall prediction error (scaled Brier score). Discrimination, quantified as the area under the receiver operating characteristic curve, measures the model's ability to distinguish patients who achieved the outcomes from those who did not and ranges from 0.5 to 1.0, with 1.0 indicating the highest discrimination score and 0.5 the lowest. Calibration compares the predicted with the observed probabilities; a perfect calibration plot has an intercept of 0 and a slope of 1. The Brier score calculates a composite of discrimination and calibration, with 0 indicating perfect prediction and 1 the poorest. A scaled version of the Brier score, 1 - (model Brier score/null model Brier score), can be interpreted as the amount of overall prediction error.
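The discrimination and Brier metrics described in the Methods can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the study's code: it treats the outcome as a fully observed binary indicator at a fixed horizon (no censoring and no competing events), whereas the study used time-dependent versions of these measures. The function names `auc`, `brier`, and `scaled_brier` and the four-patient toy data are hypothetical. The null model is assumed to predict the observed event proportion for every patient.

```python
import numpy as np

def auc(y, p):
    """Probability that a random event case has a higher predicted risk
    than a random non-event case (ties count as 0.5)."""
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size))

def brier(y, p):
    """Mean squared difference between predicted risk and observed outcome."""
    return float(np.mean((np.asarray(p, dtype=float) - np.asarray(y, dtype=float)) ** 2))

def scaled_brier(y, p):
    """1 - (model Brier score / null model Brier score), as defined in the Methods."""
    y = np.asarray(y, dtype=float)
    null = brier(y, np.full_like(y, y.mean()))  # null model: event proportion for everyone
    return 1.0 - brier(y, p) / null

# Toy data (invented for illustration): 1 = revised by the horizon, 0 = not revised.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(round(auc(y, p), 4), round(brier(y, p), 4), round(scaled_brier(y, p), 4))
```

Under this formula, a scaled Brier score of 1 corresponds to perfect predictions and 0 to predictions no better than the null model, which makes values easier to compare across datasets with different revision rates.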

Results: Using machine learning survivorship analysis, we found no differences between the competing risks estimator and traditional regression models for patients undergoing arthroplasty in terms of discriminative ability (patients who received a revision compared with those who did not). We found no consistent differences between the validated performance (time-dependent area under the receiver operating characteristic curve) of different modeling approaches because the differences in these values ranged between -0.04 and 0.03 across the 11 datasets (the time-dependent area under the receiver operating characteristic curve of the models across 11 datasets ranged from 0.52 to 0.68). In addition, the calibration metrics and scaled Brier scores produced comparable estimates, showing no advantage of machine learning over traditional regression models.

Conclusion: Machine learning did not outperform traditional regression models.

Clinical relevance: Neither machine learning modeling nor traditional regression methods were sufficiently accurate to offer prognostic information when predicting revision arthroplasty. The benefit of these modeling approaches may be limited in this context.


Source journal: Clinical Orthopaedics and Related Research® (Medicine, Surgery)

CiteScore: 7.00

Self-citation rate: 11.90%

Articles published: 722

Review time: 2.5 months

About the journal: Clinical Orthopaedics and Related Research® is a leading peer-reviewed journal devoted to the dissemination of new and important orthopaedic knowledge. CORR® brings readers the latest clinical and basic research, along with columns, commentaries, and interviews with authors.