Optimizing Large Language Models for Discharge Prediction: Best Practices in Leveraging Electronic Health Record Audit Logs

Xinmeng Zhang, Chao Yan, Yuyang Yang, Zhuohang Li, Yubo Feng, Bradley A. Malin, You Chen
{"title":"Optimizing Large Language Models for Discharge Prediction: Best Practices in Leveraging Electronic Health Record Audit Logs","authors":"Xinmeng Zhang, Chao Yan, Yuyang Yang, Zhuohang Li, Yubo Feng, Bradley A. Malin, You Chen","doi":"10.1101/2024.09.12.24313594","DOIUrl":null,"url":null,"abstract":"Electronic Health Record (EHR) audit log data are increasingly utilized for clinical tasks, from workflow modeling to predictive analyses of discharge events, adverse kidney outcomes, and hospital readmissions. These data encapsulate user-EHR interactions, reflecting both healthcare professionals' behavior and patients' health statuses. To harness this temporal information effectively, this study explores the application of Large Language Models (LLMs) in leveraging audit log data for clinical prediction tasks, specifically focusing on discharge predictions. Utilizing a year's worth of EHR data from Vanderbilt University Medical Center, we fine-tuned LLMs with randomly selected 10,000 training examples. Our findings reveal that LLaMA-2 70B, with an AUROC of 0.80 [0.77-0.82], outperforms both GPT-4 128K in a zero-shot, with an AUROC of 0.68 [0.65-0.71], and DeBERTa, with an AUROC of 0.78 [0.75-0.82]. Among various serialization methods, the first-occurrence approach — wherein only the initial appearance of each event in a sequence is retained — shows superior performance. Furthermore, for the fine-tuned LLaMA-2 70B, logit outputs yield a higher AUROC of 0.80 [0.77-0.82] compared to text outputs, with an AUROC of 0.69 [0.67-0.72]. This study underscores the potential of fine-tuned LLMs, particularly when combined with strategic sequence serialization, in advancing clinical prediction tasks.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"medRxiv - Health Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2024.09.12.24313594","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Electronic Health Record (EHR) audit log data are increasingly utilized for clinical tasks, from workflow modeling to predictive analyses of discharge events, adverse kidney outcomes, and hospital readmissions. These data encapsulate user-EHR interactions, reflecting both healthcare professionals' behavior and patients' health statuses. To harness this temporal information effectively, this study explores the application of Large Language Models (LLMs) in leveraging audit log data for clinical prediction tasks, specifically focusing on discharge prediction. Utilizing a year's worth of EHR data from Vanderbilt University Medical Center, we fine-tuned LLMs on 10,000 randomly selected training examples. Our findings reveal that LLaMA-2 70B, with an AUROC of 0.80 [0.77-0.82], outperforms both GPT-4 128K in a zero-shot setting, with an AUROC of 0.68 [0.65-0.71], and DeBERTa, with an AUROC of 0.78 [0.75-0.82]. Among various serialization methods, the first-occurrence approach, wherein only the initial appearance of each event in a sequence is retained, shows superior performance. Furthermore, for the fine-tuned LLaMA-2 70B, logit outputs yield an AUROC of 0.80 [0.77-0.82], higher than the 0.69 [0.67-0.72] achieved with text outputs. This study underscores the potential of fine-tuned LLMs, particularly when combined with strategic sequence serialization, in advancing clinical prediction tasks.
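To make the serialization strategy concrete, below is a minimal sketch of the first-occurrence approach, assuming audit log events arrive as a time-ordered list of event-type strings. The function name and event labels are illustrative, not taken from the paper.

def first_occurrence_serialize(events):
    """Keep only the first appearance of each event type, preserving temporal order."""
    seen = set()
    kept = []
    for event in events:
        if event not in seen:
            seen.add(event)
            kept.append(event)
    return " ".join(kept)

# Repeated events collapse to a single mention at their first position:
print(first_occurrence_serialize(
    ["ChartReview", "OrderEntry", "ChartReview", "NoteEntry", "OrderEntry"]
))  # -> "ChartReview OrderEntry NoteEntry"

The logit-versus-text finding can likewise be illustrated with a hedged sketch of logit-based scoring via the Hugging Face transformers API. The checkpoint, prompt format, and label tokens below are assumptions for illustration; the paper's exact setup may differ. The key idea is that reading the next-token logits for candidate answer tokens yields a continuous score suitable for AUROC, whereas parsing generated text collapses the prediction to a hard label.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-70b-hf"  # illustrative checkpoint, not necessarily the paper's
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# Hypothetical prompt built from a first-occurrence-serialized event sequence.
prompt = (
    "Audit log sequence: ChartReview OrderEntry NoteEntry\n"
    "Will the patient be discharged within 24 hours? Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]  # logits over the vocabulary

yes_id = tokenizer.encode(" Yes", add_special_tokens=False)[0]
no_id = tokenizer.encode(" No", add_special_tokens=False)[0]

# Softmax restricted to the two label tokens gives a continuous discharge score.
prob_yes = torch.softmax(next_token_logits[[yes_id, no_id]], dim=0)[0].item()
print(prob_yes)

Collected over a held-out cohort, scores like prob_yes can be passed directly to sklearn.metrics.roc_auc_score, which is what allows the logit route to produce a better-calibrated AUROC than thresholded text outputs.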