Nick Assink, Maria P Gonzalez-Perrino, Raul Santana-Trejo, Job N Doornberg, Harm Hoekstra, Joep Kraeima, Frank F A IJpma
Title: Development of Machine Learning-based Algorithms to Predict the 2- and 5-year Risk of TKA After Tibial Plateau Fracture Treatment
Journal: Clinical Orthopaedics and Related Research® (impact factor 4.2; JCR Q1, Orthopedics)
Published: 2025-03-12 (Journal Article)
DOI: 10.1097/CORR.0000000000003442 (https://doi.org/10.1097/CORR.0000000000003442)
Level of evidence: Level III, therapeutic study.
Citations: 0
Abstract
Background: When faced with a severe intraarticular injury like a tibial plateau fracture, patients count on surgeons to estimate prognosis accurately. Unfortunately, few tools are available that enable precise, personalized prognosis estimation tailored to each patient's unique circumstances, including their individual and fracture-specific characteristics. In this study, we developed and validated a clinical prediction model using machine-learning algorithms for the 2- and 5-year risk of TKA after tibial plateau fractures.
Questions/purposes: Can machine learning-based probability calculators estimate the 2- and 5-year risk of conversion to TKA in patients with a tibial plateau fracture?
Methods: A multicenter, cross-sectional study was performed in six hospitals in patients treated for a tibial plateau fracture between 2003 and 2019. In total, 2057 patients were eligible for inclusion and were sent an informed consent form and a questionnaire asking whether they had undergone conversion to TKA. For 56% (1160 of 2057), status of conversion to TKA was accounted for at a minimum of 2 years, and 53% (1082 of 2057) were accounted for at a minimum of 5 years. The mean follow-up among responders was 6 ± 4 years after injury. An analysis of nonresponders found that responders were slightly older than nonresponders (53 ± 16 years versus 51 ± 17 years; p = 0.001), they were more often women (68% [788 of 1160] versus 58% [523 of 897]; p = 0.001), they were treated nonoperatively less often (30% [346 of 1160] versus 43% [387 of 897]; p = 0.001), and they had larger fracture gaps (6.4 ± 6.3 mm versus 4.2 ± 5.2 mm; p < 0.001) and step-offs (6.3 ± 5.7 mm versus 4.5 ± 4.7 mm; p < 0.001). AO Foundation/Orthopaedic Trauma Association (AO/OTA) fracture classification did not differ between nonresponders and responders (B1 11% versus 15%, B2 16% versus 19%, B3 45% versus 39%, C2 6% versus 8%, C3 22% versus 17%; p = 0.26). A total of 70% (814 of 1160) of patients were treated with open reduction and internal fixation, whereas 30% (346 of 1160) of patients were treated nonoperatively with a cast. Most fractures (80% [930 of 1160]) were AO/OTA type B fractures, and 20% (230 of 1160) were type C. Of these patients, 7% (79 of 1160) and 10% (109 of 1082) underwent conversion to a TKA at 2- and 5-year follow-up, respectively. Patient characteristics were retrieved from electronic patient records, and imaging data were shared with the initiating center, where fracture characteristics were determined. Features derived from follow-up questionnaires, electronic patient records, and radiographic assessments were eligible for development of the prediction model.
The first step consisted of data cleaning and included simple type formatting and standardization of numerical columns. Subsequent feature selection consisted of a review of the published evidence and expert opinion. This was followed by bivariate analysis of the identified features. The features for the models included: age, gender, BMI, AO/OTA fracture classification, fracture displacement (gap, step-off), medial proximal tibial alignment, and posterior proximal tibial alignment. The data set was used to train three models: logistic regression, random forest, and XGBoost. Logistic regression models linear relationships between the features and the log-odds of the outcome, random forest handles nonlinear complexities with decision trees, and XGBoost excels with sequential error correction and regularization. The models were tested using a sixfold, leave-one-center-out validation approach: each model was trained on data from five of the six medical centers and validated against the remaining center, which was left out of training. Performance was assessed by the area under the receiver operating characteristic curve (AUC), which measures a model's ability to distinguish between classes. AUC varies between 0 and 1, with values closer to 1 indicating better performance. To ensure robust and reliable results, we used bootstrapping as a resampling technique. In addition, calibration curves were plotted, and calibration was assessed with the calibration slope and intercept. The calibration plot compares the estimated probabilities with the observed probabilities for the primary outcome. Calibration slope evaluates alignment between predicted probabilities and observed outcomes (1 = perfect, < 1 = overfit, > 1 = underfit). Calibration intercept indicates bias (0 = perfect, negative = underestimation, positive = overestimation). Last, the Brier score, measuring the mean squared error of predicted probabilities (0 = perfect), was calculated.
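The validation scheme and two of the metrics described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' code: the function names (`leave_one_center_out`, `auc`, `brier_score`, `bootstrap_auc_interval`) and the percentile-style bootstrap interval are hypothetical choices, and the abstract does not say how the bootstrap was actually applied.

```python
import random
from collections import defaultdict


def leave_one_center_out(centers):
    """Yield (held_out_center, train_idx, test_idx), one fold per center."""
    by_center = defaultdict(list)
    for i, c in enumerate(centers):
        by_center[c].append(i)
    for held_out in sorted(by_center):
        test_idx = by_center[held_out]
        train_idx = sorted(
            i for c, idx in by_center.items() if c != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


def auc(y_true, y_prob):
    """Chance that a random case with the outcome is ranked above a random
    case without it; ties count as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def bootstrap_auc_interval(y_true, y_prob, n_boot=1000, alpha=0.05, seed=0):
    """Percentile interval for the AUC (assumed variant of the bootstrap)."""
    auc(y_true, y_prob)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [y_prob[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

In use, each fold from `leave_one_center_out` would train a model on `train_idx` and score the held-out center on `test_idx`; splitting by center rather than at random means every test set comes from a hospital the model has never seen.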
Results: There were no differences among the models in terms of sensitivity and specificity; the AUCs for each overlapped broadly and ranged from 0.76 to 0.83. Calibration was best in logistic regression for both the 2- and 5-year models, with slopes of 0.82 (random forest 0.60, XGBoost 0.26) and 0.95 (random forest 0.85, XGBoost 0.48), respectively, and intercepts of 0.01 for both (random forest 0.01 to 0.02; XGBoost 0.05 to 0.07). The Brier score was similar between models, ranging from 0.06 to 0.09. Given that its performance metrics were highest, we chose the logistic regression algorithm as the final prediction model. The web application providing the prediction tool is freely available and can be accessed through: https://3dtrauma.shinyapps.io/tka_prediction/.
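For readers unfamiliar with the calibration figures reported above, the sketch below shows one common way a calibration slope and intercept can be obtained: a two-parameter logistic regression of the observed outcome on the logit of the predicted probability. It is an assumption-laden illustration, not the authors' procedure. The function name `calibration_slope_intercept` is hypothetical, the slope and intercept are fitted jointly here (some authors instead estimate the intercept with the slope fixed at 1), and the abstract does not state which variant was used.

```python
import math


def logit(p, eps=1e-12):
    """Log-odds of p, clipped away from 0 and 1 to keep the log finite."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def calibration_slope_intercept(y_true, y_prob, n_iter=50, tol=1e-10):
    """Fit P(y = 1) = sigmoid(a + b * logit(p)) by Newton-Raphson.

    Returns (slope b, intercept a). Assumes the outcomes are not perfectly
    separated by the predictions, otherwise the fit does not converge.
    """
    x = [logit(p) for p in y_prob]
    a, b = 0.0, 1.0  # start at the values of a perfectly calibrated model
    for _ in range(n_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g_a += yi - mu            # gradient with respect to the intercept
            g_b += (yi - mu) * xi     # gradient with respect to the slope
            h_aa += w                 # entries of the 2x2 information matrix
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        if det <= 0:
            raise ValueError("information matrix is singular; cannot fit")
        # Newton step: solve the 2x2 system explicitly
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return b, a
```

As a quick check of the idea: if predictions of 0.1 and 0.9 correspond to observed event rates of 0.3 and 0.7, the fitted slope is logit(0.7) / logit(0.9), well below 1, which matches the reading of a slope under 1 as predictions that are too extreme.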
Conclusion: In this study, a personalized risk assessment tool was developed to support clinical decision-making and patient counseling. Our findings demonstrate that machine-learning algorithms, particularly logistic regression, can provide accurate and reliable predictions of TKA conversion at 2 and 5 years after a tibial plateau fracture. The tool gives surgeons who perform fracture surgery a prognostic aid that can be used quickly and easily with patients in the clinic or emergency department once it complies with medical device regulations. External validation is needed to assess performance in other institutions and countries; to account for patient and surgeon preferences, resources, and cultures; and to further strengthen its clinical applicability.
About the journal:
Clinical Orthopaedics and Related Research® is a leading peer-reviewed journal devoted to the dissemination of new and important orthopaedic knowledge.
CORR® brings readers the latest clinical and basic research, along with columns, commentaries, and interviews with authors.