Leveraging large language models for generating responses to patient messages-a subjective analysis.

IF 4.6; Zone 2, Medicine; JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS; Journal of the American Medical Informatics Association; Pub Date: 2024-05-20; DOI: 10.1093/jamia/ocae052

Siru Liu, Allison B McCoy, Aileen P Wright, Babatunde Carew, Julian Z Genkins, Sean S Huang, Josh F Peterson, Bryan Steitz, Adam Wright

Journal of the American Medical Informatics Association, pages 1367-1379. Publication type: Journal Article. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11105129/pdf/

Citations: 0

Abstract

Objective: This study aimed to develop and assess the performance of fine-tuned large language models for generating responses to patient messages sent via an electronic health record patient portal.

Materials and methods: Using a dataset of messages and responses extracted from the patient portal at a large academic medical center, we developed a model (CLAIR-Short) based on a pre-trained large language model (LLaMA-65B). In addition, we used the OpenAI API to rewrite physician responses from an open-source dataset into informative paragraphs that offered patient education while emphasizing empathy and professionalism. By adding this dataset to the training data, we further fine-tuned our model (CLAIR-Long). To evaluate the fine-tuned models, we used 10 representative patient portal questions in primary care to generate responses. We asked primary care physicians to review the responses generated by our models and ChatGPT and to rate them for empathy, responsiveness, accuracy, and usefulness.

Results: The dataset consisted of 499 794 pairs of patient messages and corresponding responses from the patient portal, along with 5000 patient messages and ChatGPT-updated responses from an online platform. Four primary care physicians participated in the survey. CLAIR-Short was able to generate concise responses similar to providers' responses. CLAIR-Long responses provided more patient educational content than CLAIR-Short responses and were rated similarly to ChatGPT's responses, receiving positive evaluations for responsiveness, empathy, and accuracy and a neutral rating for usefulness.

Conclusion: This subjective analysis suggests that using large language models to generate responses to patient messages has significant potential to facilitate communication between patients and healthcare providers.

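Source journal

Journal of the American Medical Informatics Association (Medicine - Computer Science: Interdisciplinary Applications)

CiteScore: 14.50

Self-citation rate: 7.80%

Articles published: 230

Review time: 3-8 weeks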

Journal description: JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives, and reviews also help readers stay connected with the most important informatics developments in implementation, policy, and education.