Predicting serious postoperative complications and evaluating racial fairness in machine learning algorithms for metabolic and bariatric surgery
Dong-Won Kang Ph.D., Shouhao Zhou Ph.D., Russell Torres Ph.D., Abhinandan Chowdhury Ph.D., Suman Niranjan Ph.D., Ann Rogers M.D., Chan Shen Ph.D.
Surgery for Obesity and Related Diseases, Volume 20, Issue 11, Pages 1056-1064. Publication date: 2024-08-13. DOI: 10.1016/j.soard.2024.08.008
https://www.sciencedirect.com/science/article/pii/S1550728924007263
Abstract
Background
Predicting the risk of complications is critical in metabolic and bariatric surgery (MBS).
Objectives
To develop machine learning (ML) models to predict serious postoperative complications of MBS and evaluate racial fairness of the models.
Setting
Metabolic and Bariatric Surgery Accreditation and Quality Improvement Program (MBSAQIP) national database, United States.
Methods
We developed logistic regression, random forest (RF), gradient-boosted tree (GBT), and XGBoost models using the MBSAQIP Participant Use Data File from 2016 to 2020. To address class imbalance, we randomly undersampled the complication-negative class to match the complication-positive class. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), precision, recall, and F1 score. Fairness across White and non-White patient groups was assessed using equal opportunity difference and disparate impact metrics.
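The undersampling step described above can be illustrated with a minimal sketch. It assumes binary labels (1 = serious complication, 0 = none) and a 1:1 match between classes; the function name `undersample_majority`, the seed, and the toy labels are hypothetical and are not taken from the study.

```python
import random


def undersample_majority(labels, seed=0):
    """Return sorted row indices in which the complication-negative class (0)
    is randomly undersampled to the size of the complication-positive class (1).

    Assumption: a 1:1 match between classes; the abstract does not give
    further detail on the sampling procedure.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if len(neg) <= len(pos):
        # Nothing to undersample: negatives are not the majority.
        return sorted(pos + neg)
    kept_neg = rng.sample(neg, k=len(pos))  # sampling without replacement
    return sorted(pos + kept_neg)


# Toy data (hypothetical): 5 positives among 100 cases.
labels = [1] * 5 + [0] * 95
idx = undersample_majority(labels, seed=42)
balanced = [labels[i] for i in idx]
print(len(balanced), sum(balanced))  # 10 5
```

All positive cases are retained, so only negative cases are discarded. Fixing the seed makes the retained subset reproducible.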
Results
A total of 40,858 patients were included after undersampling the complication-negative class. The XGBoost model had the highest AUROC; however, the difference was not statistically significant. While the F1 score and precision did not vary significantly across models, the RF exhibited better recall than the logistic regression. Surgery type was the most important feature for predicting complications, followed by operative time. The logistic regression model had the best fairness metrics for race.
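For readers less familiar with the performance metrics compared here, the following sketch computes precision, recall, F1, and AUROC from scratch. The toy labels, scores, and the 0.5 decision threshold are hypothetical. AUROC is computed with the pairwise-ranking definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case), which is one standard way to obtain it and not necessarily the routine used in the study.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = complication)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is O(n_pos * n_neg), which is
    fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not study data).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = auroc(y_true, scores)
print(round(precision, 3), round(recall, 3), round(f1, 3), round(auc, 3))
```

Precision, recall, and F1 depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds. This is one reason a model can lead on AUROC while another leads on recall.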
Conclusions
The XGBoost model achieved the highest AUROC, albeit without a statistically significant difference. The RF may be useful when recall is the primary concern. Undersampling of the privileged group may improve the fairness of boosted tree models.
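The two fairness metrics used in this study can be sketched as follows. The conventions here are assumptions, since the abstract does not state them: equal opportunity difference is taken as the true-positive rate of the unprivileged group minus that of the privileged group, and disparate impact as the ratio of predicted-positive rates (unprivileged over privileged). Which group is treated as privileged, the function names, and the toy data are likewise hypothetical.

```python
def true_positive_rate(y_true, y_pred):
    """Share of actual positives that the model predicts as positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_positives:
        raise ValueError("group has no positive cases")
    return sum(preds_for_positives) / len(preds_for_positives)


def selection_rate(y_pred):
    """Share of cases predicted positive."""
    if not y_pred:
        raise ValueError("group is empty")
    return sum(y_pred) / len(y_pred)


def fairness_metrics(y_true, y_pred, privileged):
    """Return (equal opportunity difference, disparate impact).

    Assumed conventions: unprivileged minus privileged for the difference,
    unprivileged over privileged for the ratio. A difference of 0 and a
    ratio of 1 indicate parity under these definitions.
    """
    priv = [(t, p) for t, p, g in zip(y_true, y_pred, privileged) if g]
    unpriv = [(t, p) for t, p, g in zip(y_true, y_pred, privileged) if not g]
    priv_true, priv_pred = [t for t, _ in priv], [p for _, p in priv]
    unpriv_true, unpriv_pred = [t for t, _ in unpriv], [p for _, p in unpriv]

    eod = (true_positive_rate(unpriv_true, unpriv_pred)
           - true_positive_rate(priv_true, priv_pred))
    priv_rate = selection_rate(priv_pred)
    if priv_rate == 0:
        raise ValueError("privileged group has no predicted positives")
    di = selection_rate(unpriv_pred) / priv_rate
    return eod, di


# Toy data (hypothetical): first four cases privileged, last four unprivileged.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
privileged = [True, True, True, True, False, False, False, False]

eod, di = fairness_metrics(y_true, y_pred, privileged)
print(round(eod, 3), round(di, 3))  # -0.5 0.333
```

In this toy case the model detects all true complications in the privileged group but only half in the unprivileged group, which yields a negative difference and a ratio well below 1.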
About the journal:
Surgery for Obesity and Related Diseases (SOARD), the official journal of the American Society for Metabolic and Bariatric Surgery (ASMBS) and the Brazilian Society for Bariatric Surgery, is an international journal devoted to the publication of peer-reviewed manuscripts of the highest quality with objective data regarding techniques for the treatment of severe obesity. Articles document the effects of surgically induced weight loss on the physiological, psychiatric, and social comorbidities of obesity.