Jinxin Liu, Haoyue He, Yanglingxi Wang, Jun Du, Kaixin Liang, Jun Xue, Yidan Liang, Peng Chen, Shanshan Tian, Yongbing Deng
{"title":"Predictive models for secondary epilepsy in patients with acute ischemic stroke within one year.","authors":"Jinxin Liu, Haoyue He, Yanglingxi Wang, Jun Du, Kaixin Liang, Jun Xue, Yidan Liang, Peng Chen, Shanshan Tian, Yongbing Deng","doi":"10.7554/eLife.98759","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Post-stroke epilepsy (PSE) is a critical complication that worsens both prognosis and quality of life in patients with ischemic stroke. An interpretable machine learning model was developed to predict PSE using medical records from four hospitals in Chongqing.</p><p><strong>Methods: </strong>Medical records, imaging reports, and laboratory test results from 21,459 ischemic stroke patients were collected and analyzed. Univariable and multivariable statistical analyses identified key predictive factors. The dataset was split into a 70% training set and a 30% testing set. To address the class imbalance, the Synthetic Minority Oversampling Technique combined with Edited Nearest Neighbors was employed. Nine widely used machine learning algorithms were evaluated using relevant prediction metrics, with SHAP (SHapley Additive exPlanations) used to interpret the model and assess the contributions of different features.</p><p><strong>Results: </strong>Regression analyses revealed that complications such as hydrocephalus, cerebral hernia, and deep vein thrombosis, as well as specific brain regions (frontal, parietal, and temporal lobes), significantly contributed to PSE. Factors such as age, gender, NIH Stroke Scale (NIHSS) scores, and laboratory results like WBC count and D-dimer levels were associated with increased PSE risk. Tree-based methods like Random Forest, XGBoost, and LightGBM showed strong predictive performance, achieving an AUC of 0.99.</p><p><strong>Conclusions: </strong>The model accurately predicts PSE risk, with tree-based models demonstrating superior performance. NIHSS score, WBC count, and D-dimer were identified as the most crucial predictors.</p><p><strong>Funding: </strong>The research is funded by Central University basic research young teachers and students research ability promotion sub-projec t(2023CDJYGRH-ZD06), and by Emergency Medicine Chongqing Key Laboratory Talent Innovation and development joint fund project (2024RCCX10).</p>","PeriodicalId":11640,"journal":{"name":"eLife","volume":"13 ","pages":""},"PeriodicalIF":6.4000,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11563573/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"eLife","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.7554/eLife.98759","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Post-stroke epilepsy (PSE) is a critical complication that worsens both prognosis and quality of life in patients with ischemic stroke. An interpretable machine learning model was developed to predict PSE using medical records from four hospitals in Chongqing.
Methods: Medical records, imaging reports, and laboratory test results from 21,459 ischemic stroke patients were collected and analyzed. Univariable and multivariable statistical analyses identified key predictive factors. The dataset was split into a 70% training set and a 30% testing set. To address the class imbalance, the Synthetic Minority Oversampling Technique combined with Edited Nearest Neighbors was employed. Nine widely used machine learning algorithms were evaluated using relevant prediction metrics, with SHAP (SHapley Additive exPlanations) used to interpret the model and assess the contributions of different features.
Results: Regression analyses revealed that complications such as hydrocephalus, cerebral hernia, and deep vein thrombosis, as well as specific brain regions (frontal, parietal, and temporal lobes), significantly contributed to PSE. Factors such as age, gender, NIH Stroke Scale (NIHSS) scores, and laboratory results like WBC count and D-dimer levels were associated with increased PSE risk. Tree-based methods like Random Forest, XGBoost, and LightGBM showed strong predictive performance, achieving an AUC of 0.99.
Conclusions: The model accurately predicts PSE risk, with tree-based models demonstrating superior performance. NIHSS score, WBC count, and D-dimer were identified as the most crucial predictors.
Funding: The research is funded by Central University basic research young teachers and students research ability promotion sub-projec t(2023CDJYGRH-ZD06), and by Emergency Medicine Chongqing Key Laboratory Talent Innovation and development joint fund project (2024RCCX10).
期刊介绍:
eLife is a distinguished, not-for-profit, peer-reviewed open access scientific journal that specializes in the fields of biomedical and life sciences. eLife is known for its selective publication process, which includes a variety of article types such as:
Research Articles: Detailed reports of original research findings.
Short Reports: Concise presentations of significant findings that do not warrant a full-length research article.
Tools and Resources: Descriptions of new tools, technologies, or resources that facilitate scientific research.
Research Advances: Brief reports on significant scientific advancements that have immediate implications for the field.
Scientific Correspondence: Short communications that comment on or provide additional information related to published articles.
Review Articles: Comprehensive overviews of a specific topic or field within the life sciences.