Assessing the potential of ChatGPT-4 to accurately identify drug-drug interactions and provide clinical pharmacotherapy recommendations

Amoreena Most, Aaron Chase, Andrea Sikora
Journal: medRxiv - Pharmacology and Therapeutics
Published: 2024-06-30
DOI: 10.1101/2024.06.29.24309701

Abstract

Background: Large language models (LLMs) such as ChatGPT have emerged as promising artificial intelligence tools to support clinical decision making. The ability of ChatGPT to evaluate medication regimens, identify drug-drug interactions (DDIs), and provide clinical recommendations is unknown. The purpose of this study was to examine the performance of GPT-4 in identifying clinically relevant DDIs and to assess the accuracy of the recommendations provided. Methods: A total of 15 medication regimens were created containing commonly encountered DDIs considered either clinically significant or clinically unimportant. Two separate prompts were developed for medication regimen evaluation. The primary outcome was whether GPT-4 identified the most relevant DDI within the medication regimen. Secondary outcomes included rating GPT-4's interaction rationale, clinical relevance ranking, and overall clinical recommendations. Interrater reliability was determined using the kappa statistic. Results: GPT-4 identified the intended DDI in 90% of the medication regimens provided (27/30). GPT-4 categorized 86% of interactions as highly clinically relevant, compared with 53% categorized as highly clinically relevant by expert opinion. Inappropriate clinical recommendations with the potential to cause patient harm appeared in 14% of GPT-4's responses (2/14), and 63% of responses contained accurate information but incomplete recommendations (19/30). Conclusions: While GPT-4 demonstrated promise in its ability to identify clinically relevant DDIs, its application to clinical cases remains an area of investigation. Findings from this study may assist in the future development and refinement of LLMs for drug-drug interaction queries to support clinical decision-making.
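The Methods note that interrater reliability was assessed with the kappa statistic. For readers unfamiliar with it, the following is a minimal sketch of Cohen's kappa for two raters; the function name and the example ratings are hypothetical and are not the study's data.

```python
# Illustrative sketch of Cohen's kappa, the agreement measure named in the
# Methods. All ratings below are invented for demonstration only.

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e), where p_o is observed
    agreement and p_e is the agreement expected by chance."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency per category.
    p_e = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (p_o - p_e) / (1 - p_e)

# Hypothetical clinical-relevance ratings ("high"/"low") from two reviewers.
a = ["high", "high", "low", "high", "low", "high"]
b = ["high", "low", "low", "high", "low", "high"]
print(round(cohens_kappa(a, b), 3))  # → 0.667
```

In practice a library implementation such as `sklearn.metrics.cohen_kappa_score` would typically be used; the hand-rolled version above only shows what the statistic measures.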