Dong-Won Kang Ph.D., Shouhao Zhou Ph.D., Russell Torres Ph.D., Abhinandan Chowdhury Ph.D., Suman Niranjan Ph.D., Ann Rogers M.D., Chan Shen Ph.D.
Surgery for Obesity and Related Diseases, vol. 20, no. 11, pp. 1056-1064. Published 2024-08-13. DOI: 10.1016/j.soard.2024.08.008 (https://www.sciencedirect.com/science/article/pii/S1550728924007263)
Predicting serious postoperative complications and evaluating racial fairness in machine learning algorithms for metabolic and bariatric surgery
Background
Predicting the risk of complications is critical in metabolic and bariatric surgery (MBS).
Objectives
To develop machine learning (ML) models to predict serious postoperative complications of MBS and to evaluate the racial fairness of these models.
Setting
Metabolic and Bariatric Surgery Accreditation and Quality Improvement Program (MBSAQIP) national database, United States.
Methods
We developed logistic regression, random forest (RF), gradient-boosted tree (GBT), and XGBoost models using the MBSAQIP Participant Use Data File from 2016 to 2020. To address the class imbalance, we randomly undersampled the complication-negative class to match the size of the complication-positive class. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), precision, recall, and F1 score. Fairness across White and non-White patient groups was assessed using the equal opportunity difference and disparate impact metrics.
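The undersampling step and the performance metrics named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: records are assumed to be dictionaries with a hypothetical 0/1 `complication` field, the random seed is arbitrary, and the 0.5 decision threshold in the usage example is also an assumption. The AUROC function uses the pairwise (Mann-Whitney) definition, which is quadratic in the number of patients and only suitable for small examples.

```python
import random


def undersample_majority(records, label_key="complication", seed=0):
    """Randomly drop negative records so both classes have equal counts.

    Assumption: records are dicts with a 0/1 outcome under `label_key`
    (a hypothetical field name) and positives are the minority class.
    """
    rng = random.Random(seed)
    positives = [r for r in records if r[label_key] == 1]
    negatives = [r for r in records if r[label_key] == 0]
    # random.sample raises ValueError if there are fewer negatives than positives.
    kept_negatives = rng.sample(negatives, len(positives))
    balanced = positives + kept_negatives
    rng.shuffle(balanced)
    return balanced


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = complication)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a random
    negative (ties count 1/2). O(n_pos * n_neg); fine for small examples."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true outcomes and model risk scores.
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.4, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
print(precision_recall_f1(y_true, y_pred))
print(auroc(y_true, scores))
```

With this toy data, precision, recall, and F1 each come out to 2/3, and the AUROC is 7/9 (about 0.78). Unlike the three threshold-based metrics, AUROC depends only on how the scores rank patients, so it does not change with the threshold.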
Results
A total of 40,858 patients were included after undersampling the complication-negative class. The XGBoost model performed best in terms of AUROC; however, the differences from the other models were not statistically significant. While the F1 score and precision did not differ significantly across models, the RF had better recall than the logistic regression. Surgery type was the most important feature for predicting complications, followed by operative time. The logistic regression model had the best fairness metrics for race.
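The two fairness metrics can be written down directly. The sketch below uses the common definitions (the same ones used, for example, by the AI Fairness 360 toolkit): the equal opportunity difference is the true-positive rate of the unprivileged group minus that of the privileged group (0 means parity), and disparate impact is the ratio of predicted-positive rates, unprivileged over privileged (1 means parity). The abstract does not state which group was coded as privileged; treating White patients as privileged is an assumption here, and the toy data are hypothetical.

```python
def _mean(values):
    """Mean of 0/1 values; NaN for an empty group."""
    return sum(values) / len(values) if values else float("nan")


def equal_opportunity_difference(y_true, y_pred, privileged):
    """TPR(unprivileged) - TPR(privileged). 0 means equal recall across groups;
    negative values mean true complications in the unprivileged group are
    missed more often."""
    def tpr(group_flag):
        hits = [p for t, p, g in zip(y_true, y_pred, privileged)
                if g == group_flag and t == 1]
        return _mean(hits)
    return tpr(False) - tpr(True)


def disparate_impact(y_pred, privileged):
    """P(pred = 1 | unprivileged) / P(pred = 1 | privileged). 1 means parity.
    Raises ZeroDivisionError if the privileged group has no predicted positives."""
    unpriv = [p for p, g in zip(y_pred, privileged) if not g]
    priv = [p for p, g in zip(y_pred, privileged) if g]
    return _mean(unpriv) / _mean(priv)


# Hypothetical toy data: privileged=True marks the group treated as privileged
# (assumed here to be White patients); False marks non-White patients.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
privileged = [True] * 4 + [False] * 4
print(equal_opportunity_difference(y_true, y_pred, privileged))
print(disparate_impact(y_pred, privileged))
```

In this toy example the privileged group's recall is 1.0 and the unprivileged group's is 0.5, giving an equal opportunity difference of -0.5; the predicted-positive rates are 0.75 and 0.25, giving a disparate impact of 1/3. A model with "better fairness metrics" in this sense has values closer to 0 and 1, respectively.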
Conclusions
The XGBoost model achieved the highest AUROC, although the difference was not statistically significant. The RF may be useful when recall is the primary concern. Undersampling of the privileged group may improve the fairness of boosted tree models.
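The abstract does not describe how undersampling the privileged group would be carried out. One possible reading, sketched here purely as a hypothetical, is to stratify by outcome and, within each class, randomly drop privileged-group records until they are no more numerous than the unprivileged records in that class; unprivileged records are never dropped. The field names `white` and `complication` and the fixed seed are assumptions, and this is not presented as the authors' procedure.

```python
import random
from collections import defaultdict


def undersample_privileged(records, group_key="white", label_key="complication", seed=0):
    """Hypothetical group-aware undersampling: within each outcome class,
    randomly keep at most as many privileged records as there are
    unprivileged ones. Unprivileged records are never dropped."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for r in records:
        cells[(r[label_key], bool(r[group_key]))].append(r)
    out = []
    for label in sorted({r[label_key] for r in records}):
        priv = cells[(label, True)]
        unpriv = cells[(label, False)]
        keep = min(len(priv), len(unpriv))
        out.extend(rng.sample(priv, keep))  # random subset of privileged records
        out.extend(unpriv)
    rng.shuffle(out)
    return out


# Hypothetical toy data: 6 privileged records per class versus
# 2 unprivileged positives and 3 unprivileged negatives.
records = (
    [{"white": True, "complication": 1} for _ in range(6)]
    + [{"white": False, "complication": 1} for _ in range(2)]
    + [{"white": True, "complication": 0} for _ in range(6)]
    + [{"white": False, "complication": 0} for _ in range(3)]
)
balanced = undersample_privileged(records)
print(len(balanced), sum(r["white"] for r in balanced))
```

Here the result keeps 2 + 2 positive and 3 + 3 negative records (10 in total, 5 of them privileged). Stratifying by outcome keeps the two groups equally represented within each class, but the overall class ratio can shift, so this step would interact with the class-level undersampling described in the Methods.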
About the journal:
Surgery for Obesity and Related Diseases (SOARD), The Official Journal of the American Society for Metabolic and Bariatric Surgery (ASMBS) and the Brazilian Society for Bariatric Surgery, is an international journal devoted to the publication of peer-reviewed manuscripts of the highest quality with objective data regarding techniques for the treatment of severe obesity. Articles document the effects of surgically induced weight loss on the physiological, psychiatric, and social comorbidities of obesity.