Explainable machine learning model for predicting the risk of significant liver fibrosis in patients with diabetic retinopathy.

IF 3.8 | Zone 3, Medicine | Q2, MEDICAL INFORMATICS | BMC Medical Informatics and Decision Making | Pub Date: 2024-11-11 | DOI: 10.1186/s12911-024-02749-z

Gangfeng Zhu, Na Yang, Qiang Yi, Rui Xu, Liangjian Zheng, Yunlong Zhu, Junyan Li, Jie Che, Cixiang Chen, Zenghong Lu, Li Huang, Yi Xiang, Tianlei Zheng

BMC Medical Informatics and Decision Making, vol. 24 (1), p. 332. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11552118/pdf/

Citations: 0

Abstract



Explainable machine learning model for predicting the risk of significant liver fibrosis in patients with diabetic retinopathy.

Background: Diabetic retinopathy (DR), a common complication of type 2 diabetes, has attracted increasing attention. Recent studies have explored a possible association between retinopathy and significant liver fibrosis. This study aimed to develop a machine learning (ML) model from clinical data to predict the likelihood of significant liver fibrosis in patients with retinopathy, and to interpret that model using the SHapley Additive exPlanations (SHAP) method.

Methods: The study used data from the National Health and Nutrition Examination Survey 2005-2008 cohort. Liver fibrosis was graded (F0-F4) using the Fibrosis-4 index (FIB-4). Retinopathy severity was determined from retinal imaging and classified into four grades. Ten-fold cross-validation was used in modeling the risk of liver fibrosis. Eight ML methods were compared: Extreme Gradient Boosting, Random Forest, multilayer perceptron, Support Vector Machine, Logistic Regression (LR), Naive Bayes, Decision Tree, and k-nearest neighbors. Model performance was assessed using metrics such as the area under the curve (AUC). The SHAP method was used to rank feature importance and to explain the model's predictions.
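As a minimal illustration of the FIB-4 index mentioned in the Methods, the sketch below applies the commonly cited formula, FIB-4 = (age × AST) / (platelet count × √ALT), with age in years, AST and ALT in U/L, and platelets in 10^9/L. The abstract does not state which cutoffs the authors used to map FIB-4 values to grades, so the cutoff is left as a caller-supplied parameter. The function names and the example values are hypothetical.

```python
import math


def fib4(age_years: float, ast_u_l: float, alt_u_l: float,
         platelets_1e9_l: float) -> float:
    """Fibrosis-4 index: (age x AST) / (platelets x sqrt(ALT))."""
    if alt_u_l <= 0 or platelets_1e9_l <= 0:
        raise ValueError("ALT and platelet count must be positive")
    return (age_years * ast_u_l) / (platelets_1e9_l * math.sqrt(alt_u_l))


def exceeds_cutoff(score: float, cutoff: float) -> bool:
    """Flag a score at or above a caller-supplied cutoff.

    The cutoff is an assumption of this sketch; the abstract does not
    report the thresholds used in the study.
    """
    return score >= cutoff


# Hypothetical values, for illustration only:
# age 60 y, AST 40 U/L, ALT 25 U/L, platelets 200 x 10^9/L
example_score = fib4(60, 40, 25, 200)  # 60*40 / (200*5) = 2.4
```

Keeping the cutoff outside `fib4` separates the arithmetic, which is fixed, from the grading rule, which depends on the study.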

Results: The analysis included 5,364 participants, of whom 2,116 (39.45%) had significant liver fibrosis. Participants were randomly divided into a training set (n = 3,754) and a validation set (n = 1,610). Nine variables were selected for inclusion in the ML model. Of the eight models, LR achieved the highest AUC (0.867, 95% CI: 0.855-0.878) and the highest F1 score (0.749, 95% CI: 0.732-0.767). In internal validation, LR again outperformed all other models, with an AUC of 0.850 and an F1 score of 0.736. SHAP importance ranking identified the most influential predictors.
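The two metrics reported above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' code. It assumes that AUC refers to the area under the receiver operating characteristic curve, computed here as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, and that F1 is the harmonic mean of precision and recall. The labels, scores, and the 0.5 decision threshold are made up for the example.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): 1 = significant fibrosis, 0 = none
labels = [1, 1, 1, 0, 0, 0]
risk = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
predicted = [1 if r >= 0.5 else 0 for r in risk]  # assumed 0.5 threshold

auc_value = roc_auc(labels, risk)        # 8 of 9 positive/negative pairs ranked correctly
f1_value = f1_score(labels, predicted)   # precision = recall = 2/3
```

AUC depends only on how the model ranks cases, while F1 depends on the chosen decision threshold, which is one reason the two are usually reported together.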

Conclusion: ML models built from clinical data can identify the risk of significant liver fibrosis in patients with retinopathy, supporting early intervention.

Practice implications: Earlier detection of liver fibrosis risk in patients with retinopathy may improve the outcomes of clinical intervention.


Source journal

BMC Medical Informatics and Decision Making (Medicine: Medical Informatics)

CiteScore: 7.20

Self-citation rate: 5.70%

Articles published: 297

Review time: 1 month

About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.