Development and validation of interpretable machine learning models to predict glomerular filtration rate in chronic kidney disease Colombian patients.
Luis H Rojas, Angela J Pereira-Morales, William Amador, Albert Montenegro, Walberto Buelvas, Víctor de la Espriella
Annals of Clinical Biochemistry, published 2024-09-21. DOI: 10.1177/00045632241285528
Abstract
Background: Machine learning (ML) predictive models have been shown to improve risk prediction and assist medical decision-making; nevertheless, accurate systems for the early identification of future rapid chronic kidney disease (CKD) progressors are lacking in Colombia and, more broadly, in South America.
Objective: The purpose of this study was to develop a series of interpretable machine learning models that predict glomerular filtration rate (GFR) at 6, 9, and 12 months.
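The Objective frames prediction at fixed horizons, and the Results mention interpolated eGFR and creatinine values, but the abstract does not say how the series were interpolated or how targets were aligned with each horizon. As a minimal sketch, assuming time is measured in months and that linear interpolation between the two surrounding measurements is used, the eGFR target for each horizon could be derived from an irregularly sampled series as follows. The function names and example values are hypothetical.

```python
from bisect import bisect_left

# Hypothetical helpers; the abstract does not specify the interpolation method.
# Assumption: times are in months and strictly increasing; interpolation is linear.

HORIZONS_MONTHS = (6, 9, 12)  # prediction horizons named in the Objective


def interpolate(times, values, t):
    """Linearly interpolate a lab series at time t; None outside the observed range."""
    if not times or t < times[0] or t > times[-1]:
        return None  # no extrapolation beyond the first/last measurement
    i = bisect_left(times, t)  # first index with times[i] >= t
    if times[i] == t:
        return values[i]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def horizon_targets(times, egfr, baseline_time):
    """Map each horizon to the (interpolated) eGFR at baseline_time + horizon."""
    return {h: interpolate(times, egfr, baseline_time + h) for h in HORIZONS_MONTHS}


# Hypothetical patient: eGFR (mL/min/1.73 m²) measured at months 0, 4, 10 and 14.
times = [0, 4, 10, 14]
egfr = [55.0, 52.0, 49.0, 47.0]
print(horizon_targets(times, egfr, baseline_time=0))  # {6: 51.0, 9: 49.5, 12: 48.0}
```

Horizons that fall after the last measurement return None, so in this sketch such rows would be dropped rather than extrapolated; that exclusion rule is likewise an assumption, not the authors' stated procedure.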
Study design and setting: More than 29,000 patients with CKD stages 1 to 3b (estimated GFR [eGFR], <60 mL/min/1.73 m²) and an average of 3 years of follow-up data were included. We used the extreme gradient boosting (XGBoost) machine learning algorithm to build three models that predict the next eGFR value. The models were validated internally and externally. In addition, we computed SHapley Additive exPlanations (SHAP) values to provide interpretable global and local explanations of the model predictions.
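XGBoost is a regularised gradient-boosting library that fits an ensemble of decision trees, each new tree correcting the errors of the ensemble so far. The sketch below shows only that underlying idea: plain gradient boosting for squared-error regression with depth-1 trees (stumps). It is not XGBoost (no regularisation, second-order gradients, or deeper trees) and not the authors' configuration; the data, feature layout, and hyperparameters are hypothetical.

```python
from statistics import fmean

# Minimal gradient boosting with regression stumps (a simplification, NOT XGBoost).


def fit_stump(X, r):
    """Fit a depth-1 regression tree to residuals r by minimising squared error."""
    best = None  # (sse, feature, threshold, left_value, right_value)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for a, b in zip(values, values[1:]):
            thr = (a + b) / 2  # candidate split halfway between adjacent values
            left = [ri for row, ri in zip(X, r) if row[j] <= thr]
            right = [ri for row, ri in zip(X, r) if row[j] > thr]
            lv, rv = fmean(left), fmean(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lv, rv)
    if best is None:  # every feature is constant: fall back to a single leaf
        m = fmean(r)
        return (0, float("inf"), m, m)
    _, j, thr, lv, rv = best
    return (j, thr, lv, rv)


def predict_stump(stump, row):
    j, thr, lv, rv = stump
    return lv if row[j] <= thr else rv


def fit_gbm(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared error: each stump is fitted to current residuals."""
    base = fmean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss, the negative gradient is simply the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * predict_stump(stump, row) for pi, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict_gbm(model, row):
    base, lr, stumps = model
    return base + lr * sum(predict_stump(s, row) for s in stumps)


# Hypothetical toy data: the target drops once feature 0 exceeds 1.5.
X = [[0, 5], [1, 3], [2, 5], [3, 3]]
y = [60.0, 60.0, 40.0, 40.0]
model = fit_gbm(X, y)
print(predict_gbm(model, [0, 5]), predict_gbm(model, [3, 3]))
```

The small learning rate shrinks each stump's contribution so the ensemble approaches the targets gradually. In practice one would use a library such as xgboost (for example its `XGBRegressor` estimator) rather than this hand-rolled loop.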
Results: All models performed well in development and external validation. The 6-month XGBoost prediction model performed best in both internal validation (average mean absolute error [MAE] = 6.07; root mean squared error [RMSE] = 78.87) and external validation (average MAE = 6.45; RMSE = 18.94). The three most influential features pushing the predicted eGFR toward lower values were the interpolated eGFR and creatinine values and baseline eGFR.
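The Results report mean absolute error (MAE) and root mean squared error (RMSE), both expressed in the units of eGFR. Because RMSE squares each error before averaging, RMSE is always at least as large as MAE, and a wide gap between the two indicates that a small number of predictions carry large errors. The sketch below computes both metrics on hypothetical values, not data from the study; how the study averaged MAE (for example, across folds) is not stated in the abstract.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target (here mL/min/1.73 m²)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; squaring before averaging lets large misses dominate."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical values: both prediction sets have the same MAE,
# but concentrating the error in one patient doubles the RMSE.
observed = [50.0, 50.0, 50.0, 50.0]
spread_errors = [51.0, 49.0, 51.0, 49.0]
one_large_error = [50.0, 50.0, 50.0, 54.0]
print(mae(observed, spread_errors), rmse(observed, spread_errors))      # 1.0 1.0
print(mae(observed, one_large_error), rmse(observed, one_large_error))  # 1.0 2.0
```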
Conclusion: In this study, we developed and validated machine learning models that predict the next eGFR value at different time intervals. Furthermore, we sought to address the need for explainable predictions by making the models' predictions transparent.
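SHAP explains an individual prediction by assigning each feature its Shapley value: the feature's marginal contribution to the model output, averaged over all coalitions of the other features. A negative value pushes the prediction lower, which is the sense in which the Results describe features pushing predicted eGFR downward, and the values sum to the difference between the prediction and a reference output. The brute-force sketch below only illustrates that definition, assuming absent features are replaced by a single reference (baseline) input; the shap library uses faster model-specific algorithms (for example, for tree ensembles) and a different treatment of absent features. The toy model, feature names, and numbers are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at input x.

    A feature absent from a coalition is set to its baseline (reference) value.
    Enumerates every coalition, so cost grows as 2**n: toy sizes only.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model: predicted eGFR from [age_years, creatinine_mg_dl].
def toy_egfr_model(z):
    return 120 - 0.5 * z[0] - 30 * z[1]


patient = [60, 1.5]
reference = [50, 1.0]
print(shapley_values(toy_egfr_model, patient, reference))  # [-5.0, -15.0]
```

Here both features push the prediction below the reference output (65.0), and the two values sum to -20.0, exactly the gap between the patient's prediction (45.0) and the reference.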
About the journal:
Annals of Clinical Biochemistry is the fully peer reviewed international journal of the Association for Clinical Biochemistry and Laboratory Medicine.
Annals of Clinical Biochemistry accepts papers that contribute to knowledge in all fields of laboratory medicine, especially those pertaining to the understanding, diagnosis and treatment of human disease. It publishes papers on clinical biochemistry, clinical audit, metabolic medicine, immunology, genetics, biotechnology, haematology, microbiology, computing and management where they have both biochemical and clinical relevance. Papers describing evaluation or implementation of commercial reagent kits or the performance of new analysers require substantial original information. Unless of exceptional interest and novelty, studies dealing with the redox status in various diseases are not generally considered within the journal's scope. Studies documenting the association of single nucleotide polymorphisms (SNPs) with particular phenotypes will not normally be considered, given the greater strength of genome wide association studies (GWAS). Research undertaken in non-human animals will not be considered for publication in the Annals.
Annals of Clinical Biochemistry is also the official journal of NVKC (de Nederlandse Vereniging voor Klinische Chemie) and JSCC (Japan Society of Clinical Chemistry).