Extracting social support and social isolation information from clinical psychiatry notes: comparing a rule-based natural language processing system and a large language model.

Journal of the American Medical Informatics Association (IF 4.7; Zone 2, Medicine; JCR Q1, Computer Science, Information Systems). Pub Date: 2025-01-01. DOI: 10.1093/jamia/ocae260

Braja Gopal Patra, Lauren A Lepow, Praneet Kasi Reddy Jagadeesh Kumar, Veer Vekaria, Mohit Manoj Sharma, Prakash Adekkanattu, Brian Fennessy, Gavin Hynes, Isotta Landi, Jorge A Sanchez-Ruiz, Euijung Ryu, Joanna M Biernacka, Girish N Nadkarni, Ardesheer Talati, Myrna Weissman, Mark Olfson, J John Mann, Yiye Zhang, Alexander W Charney, Jyotishman Pathak

Publication type: Journal Article. Pages: 218-226. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648716/pdf/

Citations: 0

Abstract

Objectives: Social support (SS) and social isolation (SI) are social determinants of health (SDOH) associated with psychiatric outcomes. In electronic health records (EHRs), individual-level SS/SI is typically documented in narrative clinical notes rather than as structured coded data. Natural language processing (NLP) algorithms can automate the otherwise labor-intensive extraction of such information.

Materials and methods: Psychiatric encounter notes from Mount Sinai Health System (MSHS, n = 300) and Weill Cornell Medicine (WCM, n = 225) were annotated to create a gold-standard corpus. Two systems were developed to identify mentions of SS and SI and their subcategories (eg, social network, instrumental support, and loneliness): a rule-based system (RBS) built on lexicons and a large language model (LLM) approach using FLAN-T5-XL.
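As a rough illustration of what lexicon-driven rule matching can look like, the following minimal Python sketch tags subcategory mentions in a note. It is not the study's RBS: the lexicon terms, the label names, and the helper functions `compile_lexicon` and `find_mentions` are all hypothetical and chosen only for the example.

```python
import re

# Hypothetical lexicon: these terms are illustrative only and are NOT
# the lexicons used in the study.
LEXICON = {
    "social_network": ["lives with", "close friends", "supportive family"],
    "instrumental_support": ["helps with", "drives him to", "drives her to"],
    "loneliness": ["lonely", "feels alone", "loneliness"],
}

def compile_lexicon(lexicon):
    """Build one case-insensitive, word-bounded pattern per subcategory."""
    compiled = {}
    for label, terms in lexicon.items():
        # Try longer terms first so a long phrase wins over a shorter one.
        ordered = sorted(terms, key=len, reverse=True)
        alternatives = "|".join(re.escape(term) for term in ordered)
        compiled[label] = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return compiled

def find_mentions(note, compiled):
    """Return (label, matched_text, start, end) tuples sorted by position."""
    mentions = []
    for label, pattern in compiled.items():
        for match in pattern.finditer(note):
            mentions.append((label, match.group(0), match.start(), match.end()))
    return sorted(mentions, key=lambda mention: mention[2])

PATTERNS = compile_lexicon(LEXICON)

if __name__ == "__main__":
    example = "Patient lives with her sister and reports feeling lonely since the move."
    for mention in find_mentions(example, PATTERNS):
        print(mention)
```

A system of this kind matches only what its rules spell out, which is consistent with the abstract's observation that the RBS could be refined to follow the same rules as the gold-standard annotations.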

Results: For extracting SS/SI, the RBS obtained higher macroaveraged F1-scores than the LLM at both MSHS (0.89 versus 0.65) and WCM (0.85 versus 0.82). For extracting the subcategories, the RBS also outperformed the LLM at both MSHS (0.90 versus 0.62) and WCM (0.82 versus 0.81).
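For readers unfamiliar with the metric, a macro-averaged F1-score is the unweighted mean of the per-category F1-scores, so rare categories count as much as common ones. The sketch below computes it in plain Python on toy labels; the label names and data are hypothetical, and the study's exact evaluation unit (for example, mention level versus note level) is not stated in the abstract.

```python
def f1_for_label(gold, pred, label):
    """F1 for one label from paired gold and predicted label sequences."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero or undefined,
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(gold, pred, labels):
    """Unweighted mean of per-label F1 scores."""
    return sum(f1_for_label(gold, pred, label) for label in labels) / len(labels)

if __name__ == "__main__":
    # Toy data only; these are not results from the study.
    gold = ["SS", "SS", "SI", "SI", "none", "none"]
    pred = ["SS", "SI", "SI", "SI", "none", "SS"]
    print(round(macro_f1(gold, pred, ["SS", "SI", "none"]), 3))
```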

Discussion and conclusion: Unexpectedly, the RBS outperformed the LLM across all metrics. An intensive review demonstrates that this finding is due to the divergent approaches taken by the RBS and the LLM. The RBS was designed and refined to follow the same specific rules as the gold-standard annotations. Conversely, the LLM was more inclusive with categorization and conformed to common English-language understanding. Both approaches offer advantages, although additional replication studies are warranted.


Source journal

Journal of the American Medical Informatics Association (Medicine; Computer Science, Interdisciplinary Applications)

CiteScore: 14.50

Self-citation rate: 7.80%

Articles published: 230

Review time: 3-8 weeks

About the journal: JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives, and reviews also help readers stay connected with the most important informatics developments in implementation, policy, and education.