A. Andreychenko, A. D. Ermak, D. V. Gavrilov, R. Novitskiy, A. V. Gusev
{"title":"Development and validation of machine learning models to predict unplanned hospitalizations of patients with diabetes within the next 12 months","authors":"A. Andreychenko, A. D. Ermak, D. V. Gavrilov, R. Novitskiy, A. V. Gusev","doi":"10.14341/dm13065","DOIUrl":null,"url":null,"abstract":"BACKGROUND: The incidence of diabetes mellitus (DM) both in the Russian Federation and in the world has been steadily increasing for several decades. Stable population growth and current epidemiological characteristics of DM lead to enormous economic costs and significant social losses throughout the world. The disease often progresses with the development of specific complications, while significantly increasing the likelihood of hospitalization. The creation and inference of a machine learning model for predicting hospitalizations of patients with DM to an inpatient medical facility will make it possible to personalize the provision of medical care and optimize the load on the entire healthcare system.AIM: Development and validation of models for predicting unplanned hospitalizations of patients with diabetes due to the disease itself and its complications using machine learning algorithms and data from real clinical practice.MATERIALS AND METHODS: 170,141 depersonalized electronic health records of 23,742 diabetic patients were included in the study. Anamnestic, constitutional, clinical, instrumental and laboratory data, widely used in routine medical practice, were considered as potential predictors, a total of 33 signs. Logistic regression (LR), gradient boosting methods (LightGBM, XGBoost, CatBoost), decision tree-based methods (RandomForest and ExtraTrees), and a neural network-based algorithm (Multi-layer Perceptron) were compared. External validation was performed on the data of the separate region of Russian Federation.RESULTS: The best results and stability to external validation data were shown by the LightGBM model with an AUC of 0.818 (95% CI 0.802–0.834) in internal testing and 0.802 (95% CI 0.773–0.832) in external validation.CONCLUSION: The metrics of the best model were superior to previously published studies. The results of external validation showed the relative stability of the model to new data from another region, that reflects the possibility of the model’s application in real clinical practice.","PeriodicalId":11327,"journal":{"name":"Diabetes Mellitus","volume":null,"pages":null},"PeriodicalIF":0.7000,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Diabetes Mellitus","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14341/dm13065","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"ENDOCRINOLOGY & METABOLISM","Score":null,"Total":0}
引用次数: 0
Abstract
BACKGROUND: The incidence of diabetes mellitus (DM) both in the Russian Federation and in the world has been steadily increasing for several decades. Stable population growth and current epidemiological characteristics of DM lead to enormous economic costs and significant social losses throughout the world. The disease often progresses with the development of specific complications, while significantly increasing the likelihood of hospitalization. The creation and inference of a machine learning model for predicting hospitalizations of patients with DM to an inpatient medical facility will make it possible to personalize the provision of medical care and optimize the load on the entire healthcare system.AIM: Development and validation of models for predicting unplanned hospitalizations of patients with diabetes due to the disease itself and its complications using machine learning algorithms and data from real clinical practice.MATERIALS AND METHODS: 170,141 depersonalized electronic health records of 23,742 diabetic patients were included in the study. Anamnestic, constitutional, clinical, instrumental and laboratory data, widely used in routine medical practice, were considered as potential predictors, a total of 33 signs. Logistic regression (LR), gradient boosting methods (LightGBM, XGBoost, CatBoost), decision tree-based methods (RandomForest and ExtraTrees), and a neural network-based algorithm (Multi-layer Perceptron) were compared. External validation was performed on the data of the separate region of Russian Federation.RESULTS: The best results and stability to external validation data were shown by the LightGBM model with an AUC of 0.818 (95% CI 0.802–0.834) in internal testing and 0.802 (95% CI 0.773–0.832) in external validation.CONCLUSION: The metrics of the best model were superior to previously published studies. The results of external validation showed the relative stability of the model to new data from another region, that reflects the possibility of the model’s application in real clinical practice.