Leveraging multivariate analysis and adjusted mutual information to improve stroke prediction and interpretability.
Moutasem S Aboonq, Saeed A Alqahtani
Neurosciences, 2024 Jul 1; 29(3): 190-196. Journal Article.
DOI: 10.17712/nsj.2024.3.20230100
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11305345/pdf/
Citations: 0
Abstract
Objectives: To develop a machine learning model that accurately predicts stroke risk from demographic and clinical data, to identify the most significant stroke risk factors, and to determine the optimal machine learning algorithm for stroke prediction.
Methods: This cross-sectional study analyzed data on 438,693 adults from the 2021 Behavioral Risk Factor Surveillance System (BRFSS). Features encompassed demographic and clinical factors. Descriptive analysis profiled the dataset, logistic regression quantified risk relationships, and adjusted mutual information (AMI) evaluated feature importance. Multiple machine learning models were built and evaluated on metrics including accuracy, area under the receiver operating characteristic curve (AUC ROC), and F1 score.
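The Methods name two analysis steps without detail. As a hedged illustration only (not the authors' code), the sketch below assumes scikit-learn and computes AMI between each binary feature and stroke status with `adjusted_mutual_info_score`, then fits a logistic regression whose exponentiated coefficients are odds ratios. The data are synthetic: the feature names, effect sizes, and sample size are hypothetical assumptions, and the printed values say nothing about the BRFSS results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import adjusted_mutual_info_score

# Synthetic stand-in for binary BRFSS-style risk factors (hypothetical names).
rng = np.random.default_rng(0)
n = 5000
features = {
    "hypertension": rng.integers(0, 2, n),
    "diabetes": rng.integers(0, 2, n),
    "high_cholesterol": rng.integers(0, 2, n),
    "age_65_plus": rng.integers(0, 2, n),
}
X = np.column_stack(list(features.values()))

# Assumed log-odds effects, used only to simulate a stroke outcome.
true_coef = np.array([1.2, 0.8, 0.5, 1.0])
p_stroke = 1 / (1 + np.exp(-(-3.0 + X @ true_coef)))
y = (rng.random(n) < p_stroke).astype(int)

# AMI between each discrete feature and stroke status (about 0 if independent).
ami = {name: adjusted_mutual_info_score(y, col) for name, col in features.items()}

# Large C makes the default L2 penalty negligible, approximating an unpenalized fit.
model = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
# Exponentiated logistic coefficients are adjusted odds ratios.
odds_ratios = dict(zip(features, np.exp(model.coef_[0])))

# Rank features by AMI and show each one's odds ratio alongside.
for name in sorted(features, key=ami.get, reverse=True):
    print(f"{name:17s} AMI={ami[name]:.4f}  OR={odds_ratios[name]:.2f}")
```

AMI corrects raw mutual information for agreement expected by chance, so a feature unrelated to the outcome scores near 0; the odds ratios, by contrast, express the direction and size of each association while holding the other features fixed.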
Results: Key factors significantly associated with higher stroke odds included older age, diabetes, hypertension, high cholesterol, and a history of myocardial infarction or angina. The random forest model achieved the best performance, with an accuracy of 72.46%, an AUC ROC of 0.72, and an F1 score of 0.74. Cross-validation confirmed its reliability. The top features were hypertension, myocardial infarction history, angina, age, diabetes status, and cholesterol.
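A minimal sketch of how accuracy, AUC ROC, and F1 could be estimated for a random forest with cross-validation, assuming scikit-learn. The data are synthetic, and the fold count, number of trees, seeds, and effect sizes are illustrative assumptions rather than the study's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in data (six hypothetical binary risk factors); the scores
# this produces are unrelated to the figures reported in the abstract.
rng = np.random.default_rng(42)
n = 4000
X = rng.integers(0, 2, size=(n, 6))
true_coef = np.array([1.2, 1.0, 0.8, 0.8, 0.6, 0.4])  # assumed effects
p_stroke = 1 / (1 + np.exp(-(-2.0 + X @ true_coef)))
y = (rng.random(n) < p_stroke).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
# Stratified folds keep the stroke / no-stroke ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(rf, X, y, cv=cv, scoring=["accuracy", "roc_auc", "f1"])

# cross_validate reports per-fold scores under "test_<metric>" keys.
for metric in ("accuracy", "roc_auc", "f1"):
    fold_vals = scores[f"test_{metric}"]
    print(f"{metric:8s} mean={fold_vals.mean():.3f}  sd={fold_vals.std():.3f}")
```

Reporting the spread across folds alongside the mean shows how stable each metric is. Stratification matters more when the positive class is rare, as stroke typically is in population survey data; the F1 score here treats label 1 (stroke) as the positive class.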
Conclusion: The random forest model robustly predicted stroke risk using demographic and clinical variables. Feature importance analysis highlighted hypertension and diabetes as priorities for clinical monitoring and intervention. These findings could support data-driven stroke prevention strategies.
About the journal:
Neurosciences is an open access, peer-reviewed, quarterly publication. Authors are invited to submit articles reporting original work related to the nervous system, e.g., neurology, neurophysiology, neuroradiology, neurosurgery, neurorehabilitation, neurooncology, neuropsychiatry, and neurogenetics. Basic research with clear clinical implications will also be considered. Review articles of current interest and high standard are welcome. Prospective work should not be backdated. There are also sections for Case Reports, Brief Communication, Correspondence, and medical news items. To promote continuing education, training, and learning, we include Clinical Images and MCQs. Highlights of international and regional meetings of interest, as well as specialized supplements, will also be considered. All submissions must conform to the Uniform Requirements.