Intelligent diagnosis of Kawasaki disease from real-world data using interpretable machine learning models.
Authors: Yifan Duan, Ruiqi Wang, Zhilin Huang, Haoran Chen, Mingkun Tang, Jiayin Zhou, Zhengyong Hu, Wanfei Hu, Zhenli Chen, Qing Qian, Haolin Wang
Journal: Hellenic Journal of Cardiology (IF 2.7, JCR Q2, Cardiac & Cardiovascular Systems)
Publication date: 2024-08-10
Publication type: Journal Article
DOI: 10.1016/j.hjc.2024.08.003 (https://doi.org/10.1016/j.hjc.2024.08.003)
Citation count: 0
Abstract
Objective: This study aimed to leverage real-world electronic medical record data to develop interpretable machine learning models for diagnosis of Kawasaki disease while also exploring and prioritizing the significant risk factors.
Methods: A comprehensive study was conducted on 4087 pediatric patients at the Children's Hospital of Chongqing, China. The study collected demographic data, physical examination results, and laboratory findings. Statistical analyses were performed using IBM SPSS Statistics, Version 26.0. The optimal feature subset was used to develop intelligent diagnostic prediction models based on the Light Gradient Boosting Machine, Explainable Boosting Machine (EBM), Gradient Boosting Classifier (GBC), Fast Interpretable Greedy-Tree Sums, Decision Tree, AdaBoost Classifier, and Logistic Regression. Model performance was evaluated in three dimensions: discriminative ability via receiver operating characteristic curves, calibration accuracy using calibration curves, and interpretability through SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-Agnostic Explanations).
Results: Kawasaki disease was diagnosed in 2971 of the 4087 participants. Analysis was conducted on 31 indicators, including red blood cell distribution width and erythrocyte sedimentation rate. The EBM model outperformed most of the other models, with an area under the curve of 0.97, second only to the GBC model. Furthermore, the EBM model exhibited the highest calibration accuracy and remained interpretable without relying on external analytical tools such as SHAP and LIME, thus reducing interpretation biases. Platelet distribution width, total protein, and erythrocyte sedimentation rate were identified by the model as significant predictors for the diagnosis of Kawasaki disease.
Conclusion: This study used diverse machine learning models for early diagnosis of Kawasaki disease. The findings demonstrated that interpretable models such as EBM outperformed traditional machine learning models in terms of both interpretability and performance. Ensuring consistency between predictive models and clinical evidence is crucial for the successful integration of artificial intelligence into real-world clinical practice.
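The abstract evaluates models by discriminative ability (area under the ROC curve) and by calibration (calibration curves). As a rough illustration of what those two measurements compute, here is a minimal pure-Python sketch. It is not the authors' pipeline: the function names `roc_auc` and `calibration_bins`, the choice of five equal-width bins, and the ten labels and scores are all hypothetical toy values, not data from the study.

```python
from typing import List, Tuple


def roc_auc(y_true: List[int], y_score: List[float]) -> float:
    """Area under the ROC curve via the pairwise-ranking formulation.

    Equals the fraction of (positive, negative) pairs in which the positive
    case receives the higher score; tied scores count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_bins(
    y_true: List[int], y_score: List[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Points of a calibration curve using equal-width probability bins.

    Returns one (mean predicted probability, observed positive fraction,
    count) tuple per non-empty bin. A well-calibrated model has the first
    two values close to each other in every bin.
    """
    buckets: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, s in zip(y_true, y_score):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        buckets[idx].append((y, s))
    points = []
    for bucket in buckets:
        if bucket:
            mean_pred = sum(s for _, s in bucket) / len(bucket)
            frac_pos = sum(y for y, _ in bucket) / len(bucket)
            points.append((mean_pred, frac_pos, len(bucket)))
    return points


# Hypothetical toy data: 1 = disease present, 0 = absent; scores are
# predicted probabilities from some classifier (not from the study).
labels = [0, 0, 1, 0, 1, 1, 1, 0, 1, 1]
scores = [0.10, 0.30, 0.35, 0.42, 0.62, 0.70, 0.82, 0.55, 0.90, 0.95]

auc = roc_auc(labels, scores)
bins = calibration_bins(labels, scores, n_bins=5)

print(f"AUC = {auc:.3f}")
for mean_pred, frac_pos, count in bins:
    print(f"mean predicted = {mean_pred:.2f}, observed = {frac_pos:.2f}, n = {count}")
```

The pairwise AUC loop is quadratic in the number of cases, which is fine for illustration; on a cohort of several thousand patients a sort-based implementation from an established library would normally be used instead.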
About the journal:
The Hellenic Journal of Cardiology (International Edition, ISSN 1109-9666) is the official journal of the Hellenic Society of Cardiology and aims to publish high-quality articles on all aspects of cardiovascular medicine. A primary goal is to publish in each issue a number of original articles related to clinical and basic research. Many of these will be accompanied by invited editorial comments.
Hot topics, such as molecular cardiology, and innovative cardiac imaging and electrophysiological mapping techniques, will appear frequently in the journal in the form of invited expert articles or special reports. The Editorial Committee also attaches great importance to subjects related to continuing medical education, the implementation of guidelines and cost effectiveness in cardiology.