GP or ChatGPT? Ability of large language models (LLMs) to support general practitioners when prescribing antibiotics.

Journal of Antimicrobial Chemotherapy (IF 3.6; Zone 2, Medicine; JCR Q1, Infectious Diseases) Pub Date: 2025-05-02 DOI: 10.1093/jac/dkaf077
Oanh Ngoc Nguyen, Doaa Amin, James Bennett, Øystein Hetlevik, Sara Malik, Andrew Tout, Heike Vornhagen, Akke Vellinga
Pages: 1324-1330
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12046391/pdf/

Abstract

Introduction: Large language models (LLMs) are becoming ubiquitous and widely implemented. LLMs could also be used for diagnosis and treatment. National antibiotic prescribing guidelines are customized and informed by local laboratory data on antimicrobial resistance.

Methods: Based on 24 vignettes with information on type of infection, gender, age group and comorbidities, GPs and LLMs were prompted to provide a treatment. Four countries (Ireland, UK, USA and Norway) were included and a GP from each country and six LLMs (ChatGPT, Gemini, Copilot, Mistral AI, Claude and Llama 3.1) were provided with the vignettes, including their location (country). Responses were compared with the country's national prescribing guidelines. In addition, limitations of LLMs such as hallucination, toxicity and data leakage were assessed.

Results: GPs' answers to the vignettes showed high accuracy in relation to diagnosis (96%-100%) and yes/no antibiotic prescribing (83%-92%). GPs referenced (100%) and prescribed (58%-92%) according to national guidelines, but dose/duration of treatment was less accurate (50%-75%). Overall, the GPs' accuracy had a mean of 74%. LLMs scored high in relation to diagnosis (92%-100%), antibiotic prescribing (88%-100%) and the choice of antibiotic (59%-100%) but correct referencing often failed (38%-96%), in particular for the Norwegian guidelines (0%-13%). Data leakage was shown to be an issue as personal information was repeated in the models' responses to the vignettes.

Conclusions: LLMs may be safe to guide antibiotic prescribing in general practice. However, to interpret vignettes, apply national guidelines and prescribe the right dose and duration, GPs remain best placed.
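
The comparison described in the Methods, and the percentage ranges in the Results, can be pictured as a per-criterion tally of guideline-concordant answers. The following is only a minimal sketch under stated assumptions: the criterion names, the `ScoredResponse` structure and the toy marks are all hypothetical illustrations, and none of it is the authors' scoring procedure or data.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical criterion names, loosely following the aspects the abstract reports.
CRITERIA = (
    "diagnosis",
    "prescribe_yes_no",
    "antibiotic_choice",
    "dose_duration",
    "referencing",
)


@dataclass(frozen=True)
class ScoredResponse:
    """One responder's answer to one vignette, marked against the national guideline."""

    vignette_id: int
    marks: Dict[str, bool]  # criterion -> True if judged guideline-concordant


def accuracy_by_criterion(responses: List[ScoredResponse]) -> Dict[str, float]:
    """Return the percentage of responses marked concordant for each criterion."""
    if not responses:
        raise ValueError("no responses to score")
    n = len(responses)
    return {
        c: 100.0 * sum(r.marks.get(c, False) for r in responses) / n
        for c in CRITERIA
    }


# Invented marks for 24 vignettes; these are NOT the study's data.
toy_responses = [
    ScoredResponse(
        vignette_id=i + 1,
        marks={
            "diagnosis": i != 0,
            "prescribe_yes_no": i % 6 != 0,
            "antibiotic_choice": i % 4 != 0,
            "dose_duration": i % 2 == 0,
            "referencing": True,
        },
    )
    for i in range(24)
]

if __name__ == "__main__":
    for criterion, pct in accuracy_by_criterion(toy_responses).items():
        print(f"{criterion}: {pct:.0f}%")
```

With 24 vignettes per responder, a single vignette is worth roughly 4.2 percentage points, so a score such as 96% would correspond to 23 of 24 if every vignette was scored. The abstract does not state the denominators, so this reading is an assumption.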

Source journal
CiteScore: 9.20
Self-citation rate: 5.80%
Articles published: 423
Review time: 2-4 weeks
Journal description: The Journal publishes articles that further knowledge and advance the science and application of antimicrobial chemotherapy with antibiotics and antifungal, antiviral and antiprotozoal agents. The Journal publishes primarily in human medicine, and articles in veterinary medicine likely to have an impact on global health.