Utilizing large language models for detecting hospital-acquired conditions: an empirical study on pulmonary embolism.

IF 4.7 2区 医学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Journal of the American Medical Informatics Association Pub Date : 2025-03-19 DOI:10.1093/jamia/ocaf048
Cheligeer Cheligeer, Danielle A Southern, Jun Yan, Guosong Wu, Jie Pan, Seungwon Lee, Elliot A Martin, Hamed Jafarpour, Cathy A Eastwood, Yong Zeng, Hude Quan
{"title":"Utilizing large language models for detecting hospital-acquired conditions: an empirical study on pulmonary embolism.","authors":"Cheligeer Cheligeer, Danielle A Southern, Jun Yan, Guosong Wu, Jie Pan, Seungwon Lee, Elliot A Martin, Hamed Jafarpour, Cathy A Eastwood, Yong Zeng, Hude Quan","doi":"10.1093/jamia/ocaf048","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>Adverse event detection from Electronic Medical Records (EMRs) is challenging due to the low incidence of the event, variability in clinical documentation, and the complexity of data formats. Pulmonary embolism as an adverse event (PEAE) is particularly difficult to identify using existing approaches. This study aims to develop and evaluate a Large Language Model (LLM)-based framework for detecting PEAE from unstructured narrative data in EMRs.</p><p><strong>Materials and methods: </strong>We conducted a chart review of adult patients (aged 18-100) admitted to tertiary-care hospitals in Calgary, Alberta, Canada, between 2017-2022. We developed an LLM-based detection framework consisting of three modules: evidence extraction (implementing both keyword-based and semantic similarity-based filtering methods), discharge information extraction (focusing on six key clinical sections), and PEAE detection. Four open-source LLMs (Llama3, Mistral-7B, Gemma, and Phi-3) were evaluated using positive predictive value, sensitivity, specificity, and F1-score. Model performance for population-level surveillance was assessed at yearly, quarterly, and monthly granularities.</p><p><strong>Results: </strong>The chart review included 10 066 patients, with 40 cases of PEAE identified (0.4% prevalence). All four LLMs demonstrated high sensitivity (87.5-100%) and specificity (94.9-98.9%) across different experimental conditions. Gemma achieved the highest F1-score (28.11%) using keyword-based retrieval with discharge summary inclusion, along with 98.4% specificity, 87.5% sensitivity, and 99.95% negative predictive value. Keyword-based filtering reduced the median chunks per patient from 789 to 310, while semantic filtering further reduced this to 9 chunks. Including discharge summaries improved performance metrics across most models. For population-level surveillance, all models showed strong correlation with actual PEAE trends at yearly granularity (r=0.92-0.99), with Llama3 achieving the highest correlation (0.988).</p><p><strong>Discussion: </strong>The results of our method for PEAE detection using EMR notes demonstrate high sensitivity and specificity across all four tested LLMs, indicating strong performance in distinguishing PEAE from non-PEAE cases. However, the low incidence rate of PEAE contributed to a lower PPV. The keyword-based chunking approach consistently outperformed semantic similarity-based methods, achieving higher F1 scores and PPV, underscoring the importance of domain knowledge in text segmentation. Including discharge summaries further enhanced performance metrics. Our population-based analysis revealed better performance for yearly trends compared to monthly granularity, suggesting the framework's utility for long-term surveillance despite dataset imbalance. Error analysis identified contextual misinterpretation, terminology confusion, and preprocessing limitations as key challenges for future improvement.</p><p><strong>Conclusions: </strong>Our proposed method demonstrates that LLMs can effectively detect PEAE from narrative EMRs with high sensitivity and specificity. While these models serve as effective screening tools to exclude non-PEAE cases, their lower PPV indicates they cannot be relied upon solely for definitive PEAE identification. Further chart review remains necessary for confirmation. Future work should focus on improving contextual understanding, medical terminology interpretation, and exploring advanced prompting techniques to enhance precision in adverse event detection from EMRs.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7000,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the American Medical Informatics Association","FirstCategoryId":"91","ListUrlMain":"https://doi.org/10.1093/jamia/ocaf048","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Objectives: Adverse event detection from Electronic Medical Records (EMRs) is challenging due to the low incidence of the event, variability in clinical documentation, and the complexity of data formats. Pulmonary embolism as an adverse event (PEAE) is particularly difficult to identify using existing approaches. This study aims to develop and evaluate a Large Language Model (LLM)-based framework for detecting PEAE from unstructured narrative data in EMRs.

Materials and methods: We conducted a chart review of adult patients (aged 18-100) admitted to tertiary-care hospitals in Calgary, Alberta, Canada, between 2017-2022. We developed an LLM-based detection framework consisting of three modules: evidence extraction (implementing both keyword-based and semantic similarity-based filtering methods), discharge information extraction (focusing on six key clinical sections), and PEAE detection. Four open-source LLMs (Llama3, Mistral-7B, Gemma, and Phi-3) were evaluated using positive predictive value, sensitivity, specificity, and F1-score. Model performance for population-level surveillance was assessed at yearly, quarterly, and monthly granularities.

Results: The chart review included 10 066 patients, with 40 cases of PEAE identified (0.4% prevalence). All four LLMs demonstrated high sensitivity (87.5-100%) and specificity (94.9-98.9%) across different experimental conditions. Gemma achieved the highest F1-score (28.11%) using keyword-based retrieval with discharge summary inclusion, along with 98.4% specificity, 87.5% sensitivity, and 99.95% negative predictive value. Keyword-based filtering reduced the median chunks per patient from 789 to 310, while semantic filtering further reduced this to 9 chunks. Including discharge summaries improved performance metrics across most models. For population-level surveillance, all models showed strong correlation with actual PEAE trends at yearly granularity (r=0.92-0.99), with Llama3 achieving the highest correlation (0.988).

Discussion: The results of our method for PEAE detection using EMR notes demonstrate high sensitivity and specificity across all four tested LLMs, indicating strong performance in distinguishing PEAE from non-PEAE cases. However, the low incidence rate of PEAE contributed to a lower PPV. The keyword-based chunking approach consistently outperformed semantic similarity-based methods, achieving higher F1 scores and PPV, underscoring the importance of domain knowledge in text segmentation. Including discharge summaries further enhanced performance metrics. Our population-based analysis revealed better performance for yearly trends compared to monthly granularity, suggesting the framework's utility for long-term surveillance despite dataset imbalance. Error analysis identified contextual misinterpretation, terminology confusion, and preprocessing limitations as key challenges for future improvement.

Conclusions: Our proposed method demonstrates that LLMs can effectively detect PEAE from narrative EMRs with high sensitivity and specificity. While these models serve as effective screening tools to exclude non-PEAE cases, their lower PPV indicates they cannot be relied upon solely for definitive PEAE identification. Further chart review remains necessary for confirmation. Future work should focus on improving contextual understanding, medical terminology interpretation, and exploring advanced prompting techniques to enhance precision in adverse event detection from EMRs.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
求助全文
约1分钟内获得全文 去求助
来源期刊
Journal of the American Medical Informatics Association
Journal of the American Medical Informatics Association 医学-计算机:跨学科应用
CiteScore
14.50
自引率
7.80%
发文量
230
审稿时长
3-8 weeks
期刊介绍: JAMIA is AMIA''s premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA''s articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives and reviews also help readers stay connected with the most important informatics developments in implementation, policy and education.
期刊最新文献
Utilizing large language models for detecting hospital-acquired conditions: an empirical study on pulmonary embolism. Patient and clinician acceptability of automated extraction of social drivers of health from clinical notes in primary care. Unmet social needs and diverticulitis: a phenotyping algorithm and cross-sectional analysis. A call for the informatics community to define priority practice and research areas at the intersection of climate and health: report from 2023 mini-summit. Emerging algorithmic bias: fairness drift as the next dimension of model maintenance and sustainability.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1