Assessing accuracy of ChatGPT in response to questions from day to day pharmaceutical care in hospitals

Exploratory Research in Clinical and Social Pharmacy (IF 1.8, Q3 in Pharmacology & Pharmacy). Pub Date: 2024-06-13. DOI: 10.1016/j.rcsop.2024.100464
Merel van Nuland, Anne-Fleur H. Lobbezoo, Ewoudt M.W. van de Garde, Maikel Herbrink, Inger van Heijl, Tim Bognàr, Jeroen P.A. Houwen, Marloes Dekens, Demi Wannet, Toine Egberts, Paul D. van der Linden
{"title":"Assessing accuracy of ChatGPT in response to questions from day to day pharmaceutical care in hospitals","authors":"Merel van Nuland ,&nbsp;Anne-Fleur H. Lobbezoo ,&nbsp;Ewoudt M.W. van de Garde ,&nbsp;Maikel Herbrink ,&nbsp;Inger van Heijl ,&nbsp;Tim Bognàr ,&nbsp;Jeroen P.A. Houwen ,&nbsp;Marloes Dekens ,&nbsp;Demi Wannet ,&nbsp;Toine Egberts ,&nbsp;Paul D. van der Linden","doi":"10.1016/j.rcsop.2024.100464","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><p>The advent of Large Language Models (LLMs) such as ChatGPT introduces opportunities within the medical field. Nonetheless, use of LLM poses a risk when healthcare practitioners and patients present clinical questions to these programs without a comprehensive understanding of its suitability for clinical contexts.</p></div><div><h3>Objective</h3><p>The objective of this study was to assess ChatGPT's ability to generate appropriate responses to clinical questions that hospital pharmacists could encounter during routine patient care.</p></div><div><h3>Methods</h3><p>Thirty questions from 10 different domains within clinical pharmacy were collected during routine care. Questions were presented to ChatGPT in a standardized format, including patients' age, sex, drug name, dose, and indication. Subsequently, relevant information regarding specific cases were provided, and the prompt was concluded with the query “what would a hospital pharmacist do?”. The impact on accuracy was assessed for each domain by modifying personification to “what would you do?”, presenting the question in Dutch, and regenerating the primary question. All responses were independently evaluated by two senior hospital pharmacists, focusing on the availability of an advice, accuracy and concordance.</p></div><div><h3>Results</h3><p>In 77% of questions, ChatGPT provided an advice in response to the question. For these responses, accuracy and concordance were determined. Accuracy was correct and complete for 26% of responses, correct but incomplete for 22% of responses, partially correct and partially incorrect for 30% of responses and completely incorrect for 22% of responses. The reproducibility was poor, with merely 10% of responses remaining consistent upon regeneration of the primary question.</p></div><div><h3>Conclusions</h3><p>While concordance of responses was excellent, the accuracy and reproducibility were poor. With the described method, ChatGPT should not be used to address questions encountered by hospital pharmacists during their shifts. 
However, it is important to acknowledge the limitations of our methodology, including potential biases, which may have influenced the findings.</p></div>","PeriodicalId":73003,"journal":{"name":"Exploratory research in clinical and social pharmacy","volume":null,"pages":null},"PeriodicalIF":1.8000,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667276624000611/pdfft?md5=7dba765dfd1e9f2fac71ba4ccdc63981&pid=1-s2.0-S2667276624000611-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Exploratory research in clinical and social pharmacy","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2667276624000611","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"PHARMACOLOGY & PHARMACY","Score":null,"Total":0}
Citations: 0

Abstract

Background

The advent of Large Language Models (LLMs) such as ChatGPT introduces opportunities within the medical field. Nonetheless, the use of LLMs poses a risk when healthcare practitioners and patients present clinical questions to these programs without a comprehensive understanding of their suitability for clinical contexts.

Objective

The objective of this study was to assess ChatGPT's ability to generate appropriate responses to clinical questions that hospital pharmacists could encounter during routine patient care.

Methods

Thirty questions from 10 different domains within clinical pharmacy were collected during routine care. Questions were presented to ChatGPT in a standardized format, including the patient's age, sex, drug name, dose, and indication. Subsequently, relevant information regarding the specific case was provided, and the prompt was concluded with the query "what would a hospital pharmacist do?". The impact on accuracy was assessed for each domain by modifying the personification to "what would you do?", presenting the question in Dutch, and regenerating the primary question. All responses were independently evaluated by two senior hospital pharmacists, focusing on the availability of advice, accuracy, and concordance.
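The study queried the ChatGPT interface directly, so the sketch below is an illustration only: it shows how such a standardized prompt could be assembled and submitted programmatically through the OpenAI chat-completions API. The model name, prompt wording, and the example case are assumptions, not details taken from the paper.

```python
# Illustrative sketch only: the study used the ChatGPT interface, so the
# exact prompt wording and API parameters here are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def build_prompt(age: int, sex: str, drug: str, dose: str,
                 indication: str, case_details: str) -> str:
    """Assemble a question in the standardized format described above."""
    return (
        f"Patient: {age}-year-old {sex}. "
        f"Drug: {drug}, dose: {dose}, indication: {indication}. "
        f"{case_details} "
        "What would a hospital pharmacist do?"
    )


# Hypothetical example case, not taken from the study.
prompt = build_prompt(
    age=67, sex="male", drug="vancomycin", dose="1000 mg IV q12h",
    indication="MRSA bacteremia",
    case_details="Trough concentration is 25 mg/L after three doses.",
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumed model; the paper names only "ChatGPT"
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```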

Results

ChatGPT provided advice in response to 77% of the questions. For these responses, accuracy and concordance were determined. Responses were correct and complete for 26%, correct but incomplete for 22%, partially correct and partially incorrect for 30%, and completely incorrect for 22%. Reproducibility was poor, with only 10% of responses remaining consistent upon regeneration of the primary question.
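The abstract does not state which statistic the two independent raters' concordance was measured with; Cohen's kappa is one common choice for quantifying agreement between two evaluators, as in this minimal sketch. The category labels and ratings below are hypothetical.

```python
# Illustrative only: the abstract does not name the agreement statistic;
# Cohen's kappa is a common measure for two independent raters.
from sklearn.metrics import cohen_kappa_score

# Hypothetical accuracy categories assigned by the two pharmacists:
# 0 = correct and complete, 1 = correct but incomplete,
# 2 = partially correct and partially incorrect, 3 = completely incorrect.
rater_a = [0, 1, 2, 3, 0, 2, 1, 3, 2, 0]
rater_b = [0, 1, 2, 2, 0, 2, 1, 3, 2, 1]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance
```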

Conclusions

While concordance of responses was excellent, the accuracy and reproducibility were poor. With the described method, ChatGPT should not be used to address questions encountered by hospital pharmacists during their shifts. However, it is important to acknowledge the limitations of our methodology, including potential biases, which may have influenced the findings.
