Intelligent diagnosis of Kawasaki disease from real-world data using interpretable machine learning models.

Hellenic Journal of Cardiology (IF 2.7; Zone 3, Medicine; JCR Q2, Cardiac & Cardiovascular Systems). Pub Date: 2024-08-10. DOI: 10.1016/j.hjc.2024.08.003

Yifan Duan, Ruiqi Wang, Zhilin Huang, Haoran Chen, Mingkun Tang, Jiayin Zhou, Zhengyong Hu, Wanfei Hu, Zhenli Chen, Qing Qian, Haolin Wang


Citations: 0

Abstract

Objective: This study aimed to use real-world electronic medical record data to develop interpretable machine learning models for the diagnosis of Kawasaki disease, and to identify and rank the significant risk factors.

Methods: A comprehensive study was conducted on 4087 pediatric patients at the Children's Hospital of Chongqing, China. The study collected demographic data, physical examination results, and laboratory findings. Statistical analyses were performed using IBM SPSS Statistics, Version 26.0. The optimal feature subset was used to develop intelligent diagnostic prediction models based on the Light Gradient Boosting Machine, Explainable Boosting Machine (EBM), Gradient Boosting Classifier (GBC), Fast Interpretable Greedy-Tree Sums, Decision Tree, AdaBoost Classifier, and Logistic Regression. Model performance was evaluated in three dimensions: discriminative ability via receiver operating characteristic curves, calibration accuracy using calibration curves, and interpretability through SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-Agnostic Explanations).
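Two of the evaluation dimensions named above, discrimination and calibration, can be illustrated with a minimal sketch. It is not the authors' code: the function names `roc_auc` and `calibration_curve`, the toy labels, and the toy probabilities are all assumptions made for illustration, and only the Python standard library is used. The AUC is computed as the fraction of positive/negative pairs that the model ranks correctly, and the calibration curve compares mean predicted probability with the observed positive fraction in equal-width bins.

```python
from typing import List, Tuple


def roc_auc(y_true: List[int], y_score: List[float]) -> float:
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_curve(
    y_true: List[int], y_prob: List[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Per non-empty equal-width bin: (mean predicted prob, observed positive fraction, count)."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    curve = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            frac_pos = sum(y for y, _ in members) / len(members)
            curve.append((mean_pred, frac_pos, len(members)))
    return curve


# Toy labels and predicted probabilities (invented, not study data).
y_true = [0, 0, 1, 1, 0, 1, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.6]

auc = roc_auc(y_true, y_prob)
curve = calibration_curve(y_true, y_prob, n_bins=2)
print(f"AUC = {auc:.3f}")
for mean_pred, frac_pos, count in curve:
    print(f"mean predicted = {mean_pred:.3f}, observed = {frac_pos:.3f}, n = {count}")
```

A well-calibrated model yields bins whose observed fraction is close to the mean predicted probability; a high AUC alone does not guarantee this, which is presumably why the two are reported separately.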

Results: Kawasaki disease was diagnosed in 2971 participants. Analysis was conducted on 31 indicators, including red blood cell distribution width and erythrocyte sedimentation rate. The EBM model outperformed most of the other models, with an area under the curve (AUC) of 0.97, second only to the GBC model. Furthermore, the EBM model exhibited the highest calibration accuracy and remained interpretable without relying on external analytical tools such as SHAP and LIME, thus reducing interpretation biases. Platelet distribution width, total protein, and erythrocyte sedimentation rate were identified by the model as significant predictors for the diagnosis of Kawasaki disease.
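The reason an EBM needs no external explainer can be sketched briefly. An EBM is a generalized additive model: the predicted log-odds is an intercept plus one learned score per feature, so each feature's contribution can be read off directly. The sketch below is an assumption-laden illustration, not the study's model: the bin edges, the scores, the intercept, and the example patient are all invented, carry no clinical meaning, and pairwise interaction terms are omitted. Only the feature names are borrowed from the abstract.

```python
import math
from bisect import bisect_right
from typing import Dict, List, Tuple

# A shape function is (bin_edges, scores); scores[i] applies to the i-th
# interval, so len(scores) == len(bin_edges) + 1.
ShapeFunction = Tuple[List[float], List[float]]

# HYPOTHETICAL values, invented for illustration only (not from the study).
SHAPES: Dict[str, ShapeFunction] = {
    "esr": ([20.0, 40.0], [-0.8, 0.3, 1.1]),
    "total_protein": ([60.0, 70.0], [0.9, 0.1, -0.5]),
}
INTERCEPT = 0.2  # hypothetical baseline log-odds


def contributions(x: Dict[str, float]) -> Dict[str, float]:
    """Look up each feature's additive log-odds contribution."""
    out = {}
    for name, (edges, scores) in SHAPES.items():
        out[name] = scores[bisect_right(edges, x[name])]
    return out


def predict_proba(x: Dict[str, float]) -> float:
    """Sum intercept and per-feature scores, then apply the logistic link."""
    logit = INTERCEPT + sum(contributions(x).values())
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical patient record.
patient = {"esr": 55.0, "total_protein": 58.0}
parts = contributions(patient)
prob = predict_proba(patient)
for name, score in parts.items():
    print(f"{name}: {score:+.2f} log-odds")
print(f"predicted probability = {prob:.3f}")
```

Because the explanation is the model itself, there is no approximation step of the kind SHAP or LIME introduce when they explain a black-box model after the fact.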

Conclusion: This study used diverse machine learning models for early diagnosis of Kawasaki disease. The findings demonstrated that interpretable models such as EBM outperformed traditional machine learning models in terms of both interpretability and performance. Ensuring consistency between predictive models and clinical evidence is crucial for the successful integration of artificial intelligence into real-world clinical practice.




Source journal

Hellenic Journal of Cardiology (Cardiac & Cardiovascular Systems)

CiteScore: 4.90

Self-citation rate: 7.30%

Review time: 56 days

About the journal: The Hellenic Journal of Cardiology (International Edition, ISSN 1109-9666) is the official journal of the Hellenic Society of Cardiology and aims to publish high-quality articles on all aspects of cardiovascular medicine. A primary goal is to publish in each issue a number of original articles related to clinical and basic research. Many of these will be accompanied by invited editorial comments. Hot topics, such as molecular cardiology and innovative cardiac imaging and electrophysiological mapping techniques, will appear frequently in the journal in the form of invited expert articles or special reports. The Editorial Committee also attaches great importance to subjects related to continuing medical education, the implementation of guidelines, and cost-effectiveness in cardiology.