ACS NSQIP Surgical Risk Calculator Accuracy When Operative Risk is Represented by the Principal CPT® code Versus Many Codes
Mark E Cohen, Yaoming Liu, Bruce L Hall, Clifford Y Ko
Annals of Surgery, published 2025-02-07. DOI: 10.1097/SLA.0000000000006661
Abstract
Objective: To determine whether ACS NSQIP risk calculator (RC) accuracy can be improved by incorporating CPT codes beyond the principal code.
Background: Because of technical limitations, past and current RC algorithms have relied only on the principal CPT code, represented as a logit score, to adjust for procedure-related risk. This study evaluated RC performance using a new machine learning (ML) algorithm capable of incorporating a variable number of high-cardinality categorical variables (here, multiple CPT codes).
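To make the contrast in the Background concrete, the sketch below shows two ways a case's CPT codes might enter a model: collapsed into a single logit score for the principal code (as in the XGB setup) or kept as raw categories, one slot per code (as in the CATB setup). The abstract does not describe how the logit score is actually estimated or how the codes are laid out as features. The smoothed log-odds encoding, the `prior_weight` pseudo-count, the `"NONE"` padding value, the toy codes, and all function names are therefore hypothetical illustrations, not the RC's method.

```python
import math
from collections import defaultdict

# Cap on recorded codes per case; the abstract reports up to 21.
MAX_CODES = 21


def cpt_logit_scores(records, prior_weight=50.0):
    """Map each principal CPT code to a smoothed log-odds of the outcome.

    ASSUMPTION: shrunken empirical log-odds is an illustrative stand-in;
    the abstract does not say how the RC's logit score is estimated.
    records: list of (principal_cpt, outcome) pairs with outcome in {0, 1}.
    prior_weight: hypothetical pseudo-count pulling rare codes toward the
    overall event rate.
    """
    overall = sum(y for _, y in records) / len(records)
    if not 0.0 < overall < 1.0:
        raise ValueError("need at least one event and one non-event")
    events, counts = defaultdict(int), defaultdict(int)
    for code, y in records:
        events[code] += y
        counts[code] += 1
    scores = {}
    for code, n in counts.items():
        # The shrunken rate lies strictly inside (0, 1), so the log-odds is finite.
        rate = (events[code] + prior_weight * overall) / (n + prior_weight)
        scores[code] = math.log(rate / (1.0 - rate))
    return scores


def principal_only_features(codes, scores, default):
    """XGB-style input: only the principal (first) code, as one number.

    default: score used for a principal code unseen in training.
    """
    return [scores.get(codes[0], default)]


def multi_code_features(codes, pad="NONE"):
    """CATB-style input (hypothetical layout): one categorical slot per code."""
    slots = list(codes[:MAX_CODES])
    return slots + [pad] * (MAX_CODES - len(slots))


if __name__ == "__main__":
    # Toy data with made-up codes "A" and "B", not NSQIP data.
    train = [("A", 1)] * 30 + [("A", 0)] * 70 + [("B", 1)] * 5 + [("B", 0)] * 95
    scores = cpt_logit_scores(train)
    case = ["A", "B", "C"]  # principal code first, then two additional codes
    print(principal_only_features(case, scores, default=0.0))
    print(multi_code_features(case)[:4])  # ['A', 'B', 'C', 'NONE']
```

The first representation discards every code after the principal one, which is the limitation the study targets. In practice the multi-code slots would be handed to a learner that treats categorical columns natively (CatBoost accepts them through its `cat_features` parameter) rather than being converted to numbers by hand.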
Methods: ACS NSQIP data from 5,020,713 patients from 2016-2020 were used. Predictive accuracy for 13 outcomes was compared between two versions of the RC, each using the standard predictors plus one of two representations of procedure-related risk: a logit score associated with the principal CPT code (extreme gradient boosting ML, XGB) or up to 21 CPT codes in native categorical form (CatBoost ML, CATB). Cases were split 80% for training and 20% for validation. Discrimination (area under the receiver operating characteristic curve and area under the precision-recall curve) and calibration (Hosmer-Lemeshow statistics) were assessed on the entire validation dataset and on the subset of validation patients who had at least 1 CPT code recorded beyond the principal code.
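The discrimination and calibration measures named in the Methods can all be computed from a list of observed outcomes and predicted probabilities. The pure-Python sketch below uses one standard formulation of each, assumed for illustration: area under the ROC curve (AUROC) via the rank-based Mann-Whitney statistic, area under the precision-recall curve (AUPRC) estimated as average precision (one of several common estimators), and the Hosmer-Lemeshow statistic over groups of roughly equal size formed from sorted predictions (ten groups by default). The abstract does not state which estimators or grouping scheme the authors used.

```python
def auroc(y_true, y_score):
    """Area under the ROC curve via the Mann-Whitney rank statistic (ties get mid-ranks)."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        mid_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both outcome classes")
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def average_precision(y_true, y_score):
    """AUPRC estimated as average precision: sum of (recall step) x precision."""
    order = sorted(zip(y_score, y_true), key=lambda t: -t[0])
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive case")
    tp = fp = 0
    ap, prev_recall, i = 0.0, 0.0, 0
    while i < len(order):
        threshold = order[i][0]
        # Consume every case tied at this threshold before scoring it.
        while i < len(order) and order[i][0] == threshold:
            if order[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow statistic over groups of sorted predictions.

    Each group contributes (O - E)^2 / (E * (1 - E / N)), where O and E are the
    observed and expected event counts and N is the group size. Probabilities
    must lie strictly between 0 and 1.
    """
    if any(not 0.0 < p < 1.0 for p in y_prob):
        raise ValueError("probabilities must be in (0, 1)")
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / size))
    return stat
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four positive/negative pairs are ranked correctly. A larger Hosmer-Lemeshow value means a larger gap between observed and expected event counts, so when two models are scored on the same validation cases the one with the smaller statistic is the better calibrated.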
Results: There was no consistent discrimination advantage of CATB over XGB. XGB tended to have slightly better calibration than CATB on the complete validation dataset but slightly worse calibration than CATB on the subset of cases (34.8%) with at least one code in addition to the principal CPT code. However, in a subset of patients with 4 or more CPT codes (about 8% of all patients), CATB provided meaningfully more accurate estimates than XGB.
Conclusions: While the current RC, relying on XGB and the principal CPT code, remains a viable approach to routine surgical risk assessment, an advanced version of the RC, based on the CATB algorithm and accommodating multiple CPT codes, may provide more accurate estimates.