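Construction of machine learning diagnostic models for cardiovascular pan-disease based on blood routine and biochemical detection data.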

Cardiovascular Diabetology, Volume 23, Issue 1, Article 351. IF 8.5; Zone 1 (Medicine); JCR Q1 (Cardiac & Cardiovascular Systems). Pub Date: 2024-09-28. DOI: 10.1186/s12933-024-02439-0

Zhicheng Wang, Ying Gu, Lindan Huang, Shuai Liu, Qun Chen, Yunyun Yang, Guolin Hong, Wanshan Ning

Citations: 0

Construction of machine learning diagnostic models for cardiovascular pan-disease based on blood routine and biochemical detection data.

Background: Cardiovascular disease, also known as circulation system disease, remains the leading cause of morbidity and mortality worldwide. Traditional methods for diagnosing cardiovascular disease are often expensive and time-consuming. This study therefore aimed to construct machine learning models for diagnosing cardiovascular diseases from easily accessible blood routine and biochemical detection data, and to explore the hematologic features characteristic of cardiovascular diseases, including some metabolic indicators.

Methods: After data preprocessing, blood routine and biochemical detection data from 25,794 healthy people and 32,822 circulation system disease patients were used in this study. Logistic regression, random forest, support vector machine, eXtreme Gradient Boosting (XGBoost), and deep neural network models were constructed. Finally, the SHAP algorithm was used to interpret the models.
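The abstract does not describe how SHAP was configured. As a minimal sketch of the idea behind it, the following computes exact Shapley values for a toy three-feature score by enumerating feature coalitions. The scoring function `toy_risk`, the baseline values, and the sample values are all hypothetical and are not taken from the paper; the SHAP library itself relies on faster, model-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, sample, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    names = list(sample)
    n = len(names)
    values = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = {f: (sample[f] if f in subset else baseline[f]) for f in names}
                with_feature = dict(without)
                with_feature[name] = sample[name]
                total += weight * (predict(with_feature) - predict(without))
        values[name] = total
    return values


def toy_risk(x):
    # Hypothetical score for illustration only, not the paper's model
    return 0.5 * x["K"] - 0.02 * x["ALB"] + 0.01 * x["K"] * x["GLU"]


# Hypothetical reference point and patient values
baseline = {"K": 4.0, "ALB": 45.0, "GLU": 5.0}
sample = {"K": 5.5, "ALB": 35.0, "GLU": 9.0}

phi = shapley_values(toy_risk, sample, baseline)
for feature, contribution in phi.items():
    print(f"{feature}: {contribution:+.4f}")
```

The contributions sum to the difference between the score for the sample and the score for the baseline, which is the property that makes per-feature attributions comparable across features.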

Results: The circulation system disease prediction model constructed with XGBoost performed best (AUC: 0.9921 (0.9911-0.9930); Acc: 0.9618 (0.9588-0.9645); Sn: 0.9690 (0.9655-0.9723); Sp: 0.9526 (0.9477-0.9572); PPV: 0.9631 (0.9592-0.9668); NPV: 0.9600 (0.9556-0.9644); MCC: 0.9224 (0.9165-0.9279); F1 score: 0.9661 (0.9634-0.9686)). Most models for distinguishing among circulation system diseases also performed well; the model distinguishing dilated cardiomyopathy from other circulation system diseases performed best (AUC: 0.9267 (0.8663-0.9752)). SHAP-based interpretation indicated that features from biochemical detection, such as potassium (K), total protein (TP), albumin (ALB), and indirect bilirubin (NBIL), made major contributions to predicting circulation system disease. For the models distinguishing among circulation system diseases, however, red blood cell count (RBC), K, direct bilirubin (DBIL), and glucose (GLU) were the top four features.
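For readers unfamiliar with the abbreviations above, the sketch below shows the standard definitions of accuracy (Acc), sensitivity (Sn), specificity (Sp), PPV, NPV, F1 score, and MCC from confusion-matrix counts, plus a pairwise (Mann-Whitney) estimate of AUC. The counts and scores are hypothetical and are not the paper's data, and the abstract does not state how the intervals in parentheses were obtained.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Threshold-based metrics computed from confusion-matrix counts."""
    return {
        "Acc": (tp + tn) / (tp + fp + tn + fn),
        "Sn": tp / (tp + fn),    # sensitivity (recall)
        "Sp": tn / (tn + fp),    # specificity
        "PPV": tp / (tp + fp),   # positive predictive value (precision)
        "NPV": tn / (tn + fn),   # negative predictive value
        "F1": 2 * tp / (2 * tp + fp + fn),
        "MCC": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


def auc(pos_scores, neg_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts: 100 patients and 100 healthy controls
metrics = binary_metrics(tp=90, fp=20, tn=80, fn=10)
# Hypothetical predicted probabilities for three patients and three controls
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
print(f"AUC: {example_auc:.4f}")
```

MCC is worth reporting alongside accuracy because it uses all four cells of the confusion matrix and so stays informative when the two classes differ in size.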

Conclusions: This study constructed multiple models using 50 features from blood routine and biochemical detection data for the diagnosis of various circulation system diseases. It also explored the hematologic features characteristic of these diseases, including some metabolism-related indicators. This cost-effective work will benefit more people and help diagnose and prevent circulation system diseases.

Source journal: Cardiovascular Diabetology (Medicine: Endocrinology and Metabolism)

CiteScore: 12.30

Self-citation rate: 15.10%

Articles published: 240

Review time: 1 month

About the journal: Cardiovascular Diabetology is a journal that welcomes manuscripts exploring various aspects of the relationship between diabetes, cardiovascular health, and the metabolic syndrome. We invite submissions related to clinical studies, genetic investigations, experimental research, pharmacological studies, epidemiological analyses, and molecular biology research in this field.