Marta Fernandes, Kaileigh Gallagher, Niels Turley, Aditya Gupta, M Brandon Westover, Aneesh B Singhal, Sahar F Zafar
Automated extraction of post-stroke functional outcomes from unstructured electronic health records.
European Stroke Journal. Published 2025-01-22. DOI: https://doi.org/10.1177/23969873251314340
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11752148/pdf/
Abstract
Purpose: Population-level tracking of post-stroke functional outcomes is critical to guide interventions that reduce the burden of stroke-related disability. However, functional outcomes are often missing or documented only in unstructured notes. We developed a natural language processing (NLP) model that reads electronic health record (EHR) notes to automatically determine the modified Rankin Scale (mRS) score.
Method: We included consecutive patients (aged ≥18 years) with acute stroke admitted to our center (2015-2024). mRS scores were obtained from the Get With the Guidelines registry and from clinical notes (if documented), and were used as the gold standard against which NLP-generated scores were compared. We used text-based features from notes, along with age, sex, discharge status, and outpatient follow-up, to train a logistic regression model to predict good (0-2) versus poor (3-6) mRS, and a linear regression model to predict the full range of mRS scores. The models were trained to predict mRS at hospital discharge and post-discharge. The models were externally validated in a dataset of patients with brain injuries from a different healthcare center.
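The modelling setup described in the Method can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not list the text features, preprocessing, or fitting procedure, so the keyword vocabulary `VOCAB`, the helper names, the gradient-descent fit, and the toy records are all hypothetical and are not the authors' implementation or data. Only the dichotomization (mRS 0-2 good, 3-6 poor) comes from the abstract.

```python
import math
import re

# Hypothetical keyword vocabulary; the abstract does not list its text features.
VOCAB = ["independent", "ambulates", "wheelchair", "dependent", "hospice"]

def is_poor_outcome(mrs: int) -> int:
    """Dichotomize mRS as in the abstract: 0-2 is good (0), 3-6 is poor (1)."""
    if not 0 <= mrs <= 6:
        raise ValueError("mRS must be between 0 and 6")
    return 1 if mrs >= 3 else 0

def featurize(note: str, age: float, female: bool) -> list:
    """Keyword counts from the note, plus scaled age and a sex indicator."""
    tokens = re.findall(r"[a-z]+", note.lower())
    counts = [float(tokens.count(word)) for word in VOCAB]
    return counts + [age / 100.0, 1.0 if female else 0.0]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x) -> float:
    """Predicted probability of a poor outcome (mRS 3-6)."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by plain batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Made-up toy records (note, age, female, mRS); NOT data from the study.
records = [
    ("Patient is independent and ambulates without assistance.", 62, True, 1),
    ("Fully independent in all activities.", 55, False, 0),
    ("Ambulates with cane, independent at home.", 70, True, 2),
    ("Wheelchair bound, dependent for transfers.", 81, False, 4),
    ("Dependent for all care, discharged to hospice.", 88, True, 5),
    ("Requires wheelchair and is dependent for dressing.", 76, False, 4),
]
X = [featurize(note, age, female) for note, age, female, _ in records]
y = [is_poor_outcome(mrs) for *_, mrs in records]
w, b = train_logistic(X, y)

p_poor = predict_proba(w, b, featurize("Remains dependent, uses wheelchair.", 80, False))
p_good = predict_proba(w, b, featurize("Independent, ambulates well.", 60, True))
```

In this sketch, `p_poor` should land above 0.5 and `p_good` below it. A linear regression on the same feature vectors, with the raw 0-6 score as the target, would correspond to the full-range model the abstract mentions.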
Findings: We included 5307 patients, 5006 in the training and test sets and 301 in the external validation set; average age was 69 (SD 15) and 65 (SD 17) years, respectively; 47% female. The logistic regression achieved an area under the receiver operating characteristic curve (AUROC) of 0.94 [CI 0.93-0.95] (test) and 0.94 [0.91-0.96] (validation), and the linear model a root mean squared error (RMSE) of 0.91 [0.87-0.94] (test) and 1.17 [1.06-1.28] (validation).
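For readers less familiar with the two metrics reported in the Findings, the following is a minimal sketch of how AUROC and RMSE are defined. The function names and the four-patient example are illustrative assumptions, not the study's evaluation code or data, and the sketch does not reproduce the confidence intervals, whose method the abstract does not state.

```python
import math

def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rmse(true_values, predictions):
    """Root mean squared error between true and predicted scores."""
    squared = [(t - p) ** 2 for t, p in zip(true_values, predictions)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up example values; NOT results from the study.
true_mrs = [0, 2, 3, 5]
poor_labels = [1 if m >= 3 else 0 for m in true_mrs]   # 0-2 good, 3-6 poor
poor_scores = [0.1, 0.6, 0.4, 0.9]                     # classifier outputs
predicted_mrs = [1.0, 2.0, 2.0, 4.0]                   # regression outputs

example_auroc = auroc(poor_labels, poor_scores)   # 3 of 4 pairs ranked correctly: 0.75
example_rmse = rmse(true_mrs, predicted_mrs)      # sqrt(0.75), about 0.866
```

An AUROC of 0.94 therefore means that a randomly chosen patient with poor outcome is scored above a randomly chosen patient with good outcome about 94% of the time, while an RMSE of 0.91 means the predicted score is typically within about one mRS point of the reference score.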
Discussion and conclusion: The NLP-based model is suitable for use in large-scale phenotyping of stroke functional outcomes and population health research.
About the journal:
Launched in 2016, the European Stroke Journal (ESJ) is the official journal of the European Stroke Organisation (ESO), a professional non-profit organization with over 1,400 individual members and affiliations to numerous related national and international societies. ESJ covers clinical stroke research from all fields, including clinical trials, epidemiology, primary and secondary prevention, diagnosis, acute and post-acute management, guidelines, translation of experimental findings into clinical practice, rehabilitation, organisation of stroke care, and societal impact. It is open to authors from all relevant medical and health professions. Article types include review articles, original research, protocols, guidelines, editorials, and letters to the Editor. Through ESJ, authors and researchers have gained a new platform for the rapid and professional publication of peer-reviewed scientific material of the highest standards; publication in ESJ is highly competitive. The journal and its editorial team have developed excellent cooperation with sister organisations such as the World Stroke Organisation and the International Journal of Stroke, and the American Heart Association/American Stroke Association and the journal Stroke. ESJ is fully peer-reviewed and is a member of the Committee on Publication Ethics (COPE). Issues are published four times a year (March, June, September, and December), and articles are published OnlineFirst prior to issue publication.