Application of natural language processing to identify social needs from patient medical notes: development and assessment of a scalable, performant, and rule-based model in an integrated healthcare delivery system.

JAMIA Open (Health Care Sciences & Services; IF 2.5, Q2). Pub Date: 2023-10-04. eCollection Date: 2023-12-01. DOI: 10.1093/jamiaopen/ooad085
Geoffrey M Gray, Ayah Zirikly, Luis M Ahumada, Masoud Rouhizadeh, Thomas Richards, Christopher Kitchen, Iman Foroughmand, Elham Hatef
JAMIA Open, volume 6, issue 4, article ooad085 (Journal Article). PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/2e/eb/ooad085.PMC10550267.pdf

Abstract

Objectives: To develop and test a scalable, performant, and rule-based model for identifying 3 major domains of social needs (residential instability, food insecurity, and transportation issues) from the unstructured data in electronic health records (EHRs).

Materials and methods: We included patients aged 18 years or older who received care at the Johns Hopkins Health System (JHHS) between July 2016 and June 2021 and had at least 1 unstructured (free-text) note in their EHR during the study period. We used a combination of manual lexicon curation and semiautomated lexicon creation for feature development. We developed an initial rule-based pipeline (Match Pipeline) using 2 keyword sets for each social needs domain. We performed rule-based keyword matching for distinct lexicons and tested the algorithm using an annotated dataset comprising 192 patients. Starting with a set of expert-identified keywords, we adjusted the lexicons and tested each adjustment by evaluating the false positives and false negatives identified in the labeled dataset. We assessed the performance of the algorithm using precision, recall, and F1 score.
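The keyword-matching step described above can be sketched roughly as follows. This is a minimal illustration, not the Match Pipeline itself: the phrases in `LEXICON` are hypothetical placeholders rather than the study's curated keywords, the function names are invented for this example, and because the abstract does not say how the 2 keyword sets per domain differ, the sketch simply assumes their matches are pooled.

```python
import re

# Hypothetical lexicon: two illustrative keyword sets per domain.
# These phrases are placeholders, NOT the keyword sets used in the study.
LEXICON = {
    "residential_instability": [
        ["homeless", "living in a shelter"],
        ["eviction", "unstable housing"],
    ],
    "food_insecurity": [
        ["food insecurity", "food pantry"],
        ["skipping meals"],
    ],
    "transportation_issues": [
        ["no transportation", "lack of transportation"],
        ["missed bus"],
    ],
}


def compile_domain_patterns(lexicon):
    """Build one case-insensitive, word-bounded regex per domain.

    Assumption: both keyword sets of a domain are pooled into one pattern.
    """
    patterns = {}
    for domain, keyword_sets in lexicon.items():
        phrases = [phrase for kw_set in keyword_sets for phrase in kw_set]
        # Try longer phrases first so they are preferred over shorter ones.
        phrases.sort(key=len, reverse=True)
        alternation = "|".join(re.escape(phrase) for phrase in phrases)
        patterns[domain] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return patterns


def match_note(note, patterns):
    """Return the sorted, lowercased keywords found in one note, per domain."""
    return {
        domain: sorted({m.group(0).lower() for m in pattern.finditer(note)})
        for domain, pattern in patterns.items()
    }


def flag_patient(notes, patterns):
    """Flag a domain for a patient if any of the patient's notes matches it."""
    flags = {domain: False for domain in patterns}
    for note in notes:
        for domain, hits in match_note(note, patterns).items():
            if hits:
                flags[domain] = True
    return flags


patterns = compile_domain_patterns(LEXICON)
print(match_note("Patient is currently homeless and uses a food pantry.", patterns))
print(flag_patient(["No concerns.", "Reports lack of transportation to clinic."], patterns))
```

Plain matching of this kind also fires on negated or historical mentions (for example, "denies being homeless"), which is one source of the false positives that an annotated dataset helps to expose.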

Results: The algorithm for identifying residential instability had the best overall performance, with weighted-average precision, recall, and F1 score of 0.92, 0.84, and 0.92 for identifying patients with homelessness and 0.84, 0.82, and 0.79 for identifying patients with housing insecurity. Metrics for the food insecurity algorithm were high, but the transportation issues algorithm had the lowest overall performance.
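For readers unfamiliar with weighted-average metrics, the sketch below shows one common way to compute them: per-class precision, recall, and F1 are averaged with weights proportional to each class's support in the labeled data. The abstract does not spell out its averaging scheme, so this support-weighted definition is an assumption, and the labels in the example are toy values rather than study data.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, treating `label` as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1


def weighted_average_metrics(y_true, y_pred):
    """Average per-class metrics, weighting each class by its share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    w_precision = w_recall = w_f1 = 0.0
    for label, count in support.items():
        precision, recall, f1 = per_class_metrics(y_true, y_pred, label)
        weight = count / total
        w_precision += weight * precision
        w_recall += weight * recall
        w_f1 += weight * f1
    return w_precision, w_recall, w_f1


# Toy annotations (1 = social need documented, 0 = not documented).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(weighted_average_metrics(y_true, y_pred))
```

Under this definition the weighted F1 is an average of per-class F1 scores, not the harmonic mean of the weighted precision and weighted recall, so it need not fall between those two values.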

Discussion: The NLP algorithm for identifying social needs at JHHS performed relatively well and offers an opportunity for implementation in a healthcare system.

Conclusion: The NLP approach developed in this project could be adapted and potentially operationalized in the routine data processes of a healthcare system.
