Predicting chronic kidney disease progression using small pathology datasets and explainable machine learning models

Sandeep Reddy, Supriya Roy, Kay Weng Choy, Sourav Sharma, Karen M Dwyer, Chaitanya Manapragada, Zane Miller, Joy Cheon, Bahareh Nakisa
Computer Methods and Programs in Biomedicine Update, Volume 6, Article 100160. Published 2024-01-01.
DOI: 10.1016/j.cmpbup.2024.100160
Full text: https://www.sciencedirect.com/science/article/pii/S2666990024000272

Abstract

Background

Chronic kidney disease (CKD) poses a major global public health burden, affecting over 700 million people. Early identification of those in whom the disease is likely to progress enables timely therapeutic interventions to delay advancement to kidney failure.

Methods

This study developed explainable machine learning models that use pathology data to predict CKD trajectory, aiming to improve prognostic capability even in early stages with limited datasets. Key variables include age, gender, most recent estimated glomerular filtration rate (eGFR), mean eGFR, and eGFR slope over time prior to the incidence of kidney failure. Supervised classification used decision tree and random forest algorithms, selected for interpretability. Internal validation was performed on an Australian tertiary centre cohort (n = 706; 353 with kidney failure and 353 without). To address the inherent class imbalance, centroid-cluster-based under-sampling was applied to the Australian dataset. For external validation, the model was applied to a dataset (n = 597 adults) sourced from a Japanese CKD registry. Transfer learning was subsequently employed by fine-tuning the machine learning models on 15% of the external dataset (n = 89) before evaluating them on the remaining 508 patients.
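The three eGFR-derived variables named above can be illustrated with a minimal sketch. It assumes the slope is an ordinary least-squares fit of eGFR against time, which the abstract does not specify; the function name `egfr_features` and the choice of years as the time unit are hypothetical.

```python
from statistics import mean


def egfr_features(times_years, egfr_values):
    """Return (most_recent, mean, slope) for one patient's eGFR series.

    Assumption: slope is the ordinary least-squares slope of eGFR
    against time, in eGFR units per year. The source does not state
    how the slope was actually estimated.
    """
    if len(times_years) != len(egfr_values) or len(times_years) < 2:
        raise ValueError("need at least two paired measurements")

    t_bar = mean(times_years)
    e_bar = mean(egfr_values)

    # Sum of squared deviations of time; zero means all samples share one time.
    sxx = sum((t - t_bar) ** 2 for t in times_years)
    if sxx == 0:
        raise ValueError("measurements must span more than one time point")

    # Cross-deviation of time and eGFR.
    sxy = sum((t - t_bar) * (e - e_bar)
              for t, e in zip(times_years, egfr_values))

    # Most recent value is the one at the latest time, regardless of input order.
    latest_index = max(range(len(times_years)), key=lambda i: times_years[i])

    return egfr_values[latest_index], e_bar, sxy / sxx
```

For an illustrative (not real) series measured at years 0, 1, 2 and 3 with eGFR values 60, 55, 50 and 45, `egfr_features([0, 1, 2, 3], [60, 55, 50, 45])` gives a most recent eGFR of 45, a mean of 52.5, and a slope of -5.0 per year.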

Results

Internal validation achieved exceptional predictive accuracy on the binary task of predicting kidney failure, with the area under the receiver operating characteristic curve (ROC-AUC) reaching 0.94 for the decision tree and 0.98 for the random forest. External validation also yielded strong results, with an ROC-AUC of 0.88 for the decision tree and 0.93 for the random forest model. Decision tree model analysis revealed the most recent eGFR and eGFR slope as the most informative variables for prediction in the Japanese cohort.
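For readers less familiar with the metric, ROC-AUC on a binary task equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition; the function name `roc_auc` is hypothetical, and this is not a claim about the tooling the authors used.

```python
def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 (1 = kidney failure in this illustration).
    scores: predicted scores, higher meaning more likely positive.
    Tied positive/negative pairs count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5

    return wins / (len(pos) * len(neg))
```

With made-up labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.5, 0.1]`, three of the four positive-negative pairs are ranked correctly, so `roc_auc` returns 0.75. The pairwise loop is quadratic in cohort size, which is acceptable for datasets of a few hundred patients such as those described here.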

Conclusion

This research highlights the utility of explainable machine learning techniques for forecasting CKD trajectory, even in the early stages, using limited real-world datasets.
