Elliot A Martin, Adam G D'Souza, Vineet Saini, Karen Tang, Hude Quan, Cathy A Eastwood
Extracting social determinants of health from inpatient electronic medical records using natural language processing.
Journal of epidemiology and population health, 72(6):202791. Published 2024-12-01 (Epub 2024-11-14). Journal Article.
DOI: 10.1016/j.jeph.2024.202791 (https://doi.org/10.1016/j.jeph.2024.202791)
Abstract
Background: Social determinants of health (SDOH) have been shown to be important predictors of health outcomes. Here we developed methods to extract them from inpatient electronic medical record (EMR) data using techniques compatible with current EMR systems.
Methods: Four social determinants were targeted: patient language barriers, employment status, education, and whether the patient lives alone. Inpatients aged 18 and older with records in the Calgary-wide EMR system were studied. Algorithms were developed on the January 2019 hospital admissions (n=8,999) and validated on the January 2018 hospital admissions (n=8,839). SDOH documented as structured data were compared against those extracted from unstructured free-text notes.
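The abstract does not describe the extraction algorithms themselves. Purely as an illustration of how a rule-based check on free-text notes might be compared with a structured field, here is a minimal Python sketch. The patterns, the negation heuristic, the function names, and the marital-status categories are all hypothetical and are not taken from the study.

```python
import re

# Hypothetical pattern for a "lives alone" mention; not the study's actual rules.
LIVES_ALONE = re.compile(r"\bliv(?:es|e|ing)\s+alone\b", re.IGNORECASE)

# Crude, illustrative negation check: a negation cue ending the text that
# immediately precedes the mention, allowing up to two intervening words.
NEGATION = re.compile(
    r"\b(?:not|no longer|denies|doesn't|does not)\s+(?:\w+\s+){0,2}$",
    re.IGNORECASE,
)

# Hypothetical structured marital-status values that imply a partner.
PARTNERED = {"married", "common-law"}


def flags_living_alone(note: str) -> bool:
    """Return True if the note has at least one non-negated 'lives alone' mention."""
    for match in LIVES_ALONE.finditer(note):
        preceding = note[max(0, match.start() - 40):match.start()]
        if not NEGATION.search(preceding):
            return True
    return False


def is_discordant(note: str, marital_status: str) -> bool:
    """Return True if the note says 'lives alone' but the structured field implies a partner."""
    return flags_living_alone(note) and marital_status.strip().lower() in PARTNERED
```

A record such as `is_discordant("Lives alone.", "Married")` would be counted as a note/structured-data disagreement under these assumed rules. A real system would need far more careful handling of negation, family history, and templated text.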
Results: More than twice as many patients had a language barrier documented in EMR notes as in structured data; 12% of patients indicated by EMR notes to be living alone had a partner noted in their structured marital status. The positive predictive value (PPV) of the elements extracted from notes was high: 99% (95% CI 94.0%-100.0%) for language barriers, 98% (95% CI 92.6%-99.9%) for living alone, 96% (95% CI 89.8%-98.8%) for unemployment, and 88% (95% CI 80.0%-93.1%) for retirement.
Conclusions: All SDOH elements were extracted with high PPV. SDOH documentation was largely missing in structured data and sometimes misleading.
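The PPV figures in the Results are proportions with 95% confidence intervals. The abstract states neither the number of charts reviewed nor the interval method, so the following Python sketch makes two assumptions: it uses the Wilson score interval, which may differ from the method the authors used, and the counts in the usage note are invented for illustration.

```python
import math


def ppv_with_wilson_ci(true_pos: int, flagged: int, z: float = 1.959964):
    """Return (ppv, lower, upper) using the Wilson score interval.

    true_pos: flagged records confirmed correct on review (hypothetical input).
    flagged:  all records flagged by the extraction algorithm.
    z:        standard normal quantile; 1.959964 gives a 95% interval.
    """
    if flagged <= 0 or not 0 <= true_pos <= flagged:
        raise ValueError("need flagged > 0 and 0 <= true_pos <= flagged")
    p = true_pos / flagged
    denom = 1 + z * z / flagged
    centre = (p + z * z / (2 * flagged)) / denom
    half = z * math.sqrt(p * (1 - p) / flagged + z * z / (4 * flagged ** 2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)
```

For example, `ppv_with_wilson_ci(96, 100)` returns a PPV of 0.96 with an interval that stays inside [0, 1]. Because the sample size and interval method here are assumptions, the result is not expected to reproduce the intervals reported above.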