Development and validation of an interpretable machine learning model for predicting post-stroke epilepsy

Epilepsy Research, Volume 205, Article 107397. Published 2024-06-28. DOI: 10.1016/j.eplepsyres.2024.107397
Impact factor 2.0; JCR Q3 (Clinical Neurology); CAS Zone 4 (Medicine)

Yue Yu, Zhibin Chen, Yong Yang, Jiajun Zhang, Yan Wang



Abstract

Background

Epilepsy is a serious complication of ischemic stroke. Although two studies have developed prediction models for post-stroke epilepsy (PSE), their accuracy remains insufficient and their applicability to different populations is uncertain. With the rapid advancement of computing technology, machine learning (ML) offers new opportunities to build more accurate prediction models. However, the potential of ML for predicting PSE is still not well understood. The purpose of this study was to develop prediction models for PSE in patients with ischemic stroke.

Methods

Patients with ischemic stroke from two stroke centers were included in this retrospective cohort study. Thirty-three baseline variables were considered as candidate features. Models predicting PSE within 2 years were built in the derivation cohort using six ML algorithms. The predictive performance of these ML models was then evaluated and compared with that of a reference model based on conventional triage classification information. SHapley Additive exPlanations (SHAP), which rest on the game-theoretic principle of fairly allocating a payoff among participants according to their contributions, were used to interpret the predictions of the naive Bayes (NB) model.
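The SHAP idea described above rests on Shapley values: each feature's attribution is its marginal contribution to the prediction, averaged with the standard Shapley weights over every coalition of the other features. The sketch below computes exact Shapley values by enumerating coalitions, assuming a simple "replace absent features with a baseline value" payoff. The `toy_risk` model, its coefficients, and the patient values are hypothetical and are not the study's NB model or data; the `shap` Python library uses faster approximations and other value-function choices, and the abstract does not say which explainer the authors used.

```python
import math
from itertools import combinations
from typing import AbstractSet, Callable, List, Sequence


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values for a single prediction.

    Assumed value function: features outside a coalition are replaced by
    their baseline value. Runtime grows as 2^n, so keep n small.
    """
    n = len(x)

    def value(coalition: AbstractSet[int]) -> float:
        # Take coalition members from x and every other feature from the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1, drawn from the other features
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when it joins coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z: Sequence[float]) -> float:
    """Hypothetical logistic risk score; NOT the study's model, coefficients invented."""
    nihss, los_days, d_dimer, cortical = z
    logit = -4.0 + 0.08 * nihss + 0.05 * los_days + 0.3 * d_dimer + 1.2 * cortical
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical patient vs. a hypothetical reference patient:
# NIHSS, length of stay (days), D-dimer (arbitrary units), cortical involvement (0/1).
patient = [15, 20, 2.5, 1]
reference = [4, 8, 0.5, 0]
phi = shapley_values(toy_risk, patient, reference)
for name, contrib in zip(["NIHSS", "LOS", "D-dimer", "cortical"], phi):
    print(f"{name:>8}: {contrib:+.4f}")
print("sum:", sum(phi), "vs f(x) - f(ref):", toy_risk(patient) - toy_risk(reference))
```

Because the toy score rises with every feature and the patient exceeds the reference on all four, every attribution is positive, and the attributions sum to `toy_risk(patient) - toy_risk(reference)` (the efficiency property that makes the allocation "fair"). Exact enumeration is fine for four features but becomes impractical for all 33 candidates, which is why approximate explainers are used in practice.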

Results

A total of 1977 patients were included to build the PSE prediction models. The Boruta method identified NIHSS score, hospital length of stay, D-dimer level, and cortical involvement as the optimal features, with areas under the receiver operating characteristic curve (AUCs) ranging from 0.709 to 0.849. An additional 870 patients were used to validate the ML and reference models. The NB model performed best among the PSE prediction models, with an AUC of 0.757. At a 20% absolute risk threshold, the NB model also achieved a sensitivity of 0.739 and a specificity of 0.720. The reference model had a poor sensitivity of only 0.15 despite an acceptable AUC of 0.732. Furthermore, SHAP analysis showed that a higher NIHSS score, longer hospital length of stay, higher D-dimer level, and cortical involvement were positive predictors of epilepsy after ischemic stroke.
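The results above mix a threshold-free metric (AUC, which only measures how well a model ranks patients) with threshold-dependent ones (sensitivity and specificity at a 20% absolute risk cutoff). This is one way a model can reach an AUC of 0.732 yet a sensitivity of only 0.15: it may rank patients reasonably while rarely assigning risks above the cutoff. The following minimal sketch computes both kinds of metric from predicted risks; the helper names are hypothetical, and the labels and risks in the usage example are made-up toy values, not study data.

```python
from typing import Sequence, Tuple


def auc_by_ranks(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec_at(
    y_true: Sequence[int], risks: Sequence[float], threshold: float = 0.20
) -> Tuple[float, float]:
    """Sensitivity and specificity when risk >= threshold is flagged as high risk."""
    tp = fn = tn = fp = 0
    for y, r in zip(y_true, risks):
        flagged = r >= threshold
        if y == 1:
            if flagged:
                tp += 1
            else:
                fn += 1
        else:
            if flagged:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 1 = developed epilepsy, 0 = did not (hypothetical data).
y = [1, 1, 1, 0, 0, 0, 0, 0]
risk = [0.45, 0.25, 0.10, 0.30, 0.15, 0.12, 0.05, 0.02]
print("AUC:", auc_by_ranks(y, risk))
print("sensitivity, specificity at 20%:", sens_spec_at(y, risk, 0.20))
```

On this toy data the AUC is 11/15 (about 0.733), while the 20% cutoff gives a sensitivity of 2/3 and a specificity of 0.8. Moving the threshold changes the last two numbers but never the AUC, which is why the abstract reports both.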

Conclusions

Our study confirmed the feasibility of using ML with easily obtained variables to predict PSE accurately, offering improved strategies and more effective resource allocation for high-risk patients. In addition, SHAP can improve model transparency and make it easier for clinicians to judge the prediction model's reliability.


Source journal: Epilepsy Research (Medicine: Clinical Neurology)
CiteScore: 0.10
Self-citation rate: 4.50%
Articles published: 143
Review time: 62 days

Journal description: Epilepsy Research provides for publication of high quality articles in both basic and clinical epilepsy research, with a special emphasis on translational research that ultimately relates to epilepsy as a human condition. The journal is intended to provide a forum for reporting the best and most rigorous epilepsy research from all disciplines ranging from biophysics and molecular biology to epidemiological and psychosocial research. As such the journal will publish original papers relevant to epilepsy from any scientific discipline and also studies of a multidisciplinary nature. Clinical and experimental research papers adopting fresh conceptual approaches to the study of epilepsy and its treatment are encouraged. The overriding criteria for publication are novelty, significant clinical or experimental relevance, and interest to a multidisciplinary audience in the broad arena of epilepsy. Review articles focused on any topic of epilepsy research will also be considered, but only if they present an exceptionally clear synthesis of current knowledge and future directions of a research area, based on a critical assessment of the available data or on hypotheses that are likely to stimulate more critical thinking and further advances in an area of epilepsy research.