Large Language Model (LLM)-Powered Chatbots Fail to Generate Guideline-Consistent Content on Resuscitation and May Provide Potentially Harmful Advice.

IF 2.1 · CAS Zone 4 (Medicine) · JCR Q2 (Emergency Medicine) · Prehospital and Disaster Medicine · Pub Date: 2023-12-01 · Epub Date: 2023-11-06 · DOI: 10.1017/S1049023X23006568
Alexei A Birkun, Adhish Gautam
{"title":"Large Language Model (LLM)-Powered Chatbots Fail to Generate Guideline-Consistent Content on Resuscitation and May Provide Potentially Harmful Advice.","authors":"Alexei A Birkun, Adhish Gautam","doi":"10.1017/S1049023X23006568","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>Innovative large language model (LLM)-powered chatbots, which are extremely popular nowadays, represent potential sources of information on resuscitation for the general public. For instance, the chatbot-generated advice could be used for purposes of community resuscitation education or for just-in-time informational support of untrained lay rescuers in a real-life emergency.</p><p><strong>Study objective: </strong>This study focused on assessing performance of two prominent LLM-based chatbots, particularly in terms of quality of the chatbot-generated advice on how to give help to a non-breathing victim.</p><p><strong>Methods: </strong>In May 2023, the new Bing (Microsoft Corporation, USA) and Bard (Google LLC, USA) chatbots were inquired (<i>n</i> = 20 each): \"What to do if someone is not breathing?\" Content of the chatbots' responses was evaluated for compliance with the 2021 Resuscitation Council United Kingdom guidelines using a pre-developed checklist.</p><p><strong>Results: </strong>Both chatbots provided context-dependent textual responses to the query. However, coverage of the guideline-consistent instructions on help to a non-breathing victim within the responses was poor: mean percentage of the responses completely satisfying the checklist criteria was 9.5% for Bing and 11.4% for Bard (<i>P</i> >.05). Essential elements of the bystander action, including early start and uninterrupted performance of chest compressions with adequate depth, rate, and chest recoil, as well as request for and use of an automated external defibrillator (AED), were missing as a rule. Moreover, 55.0% of Bard's responses contained plausible sounding, but nonsensical guidance, called artificial hallucinations, that create risk for inadequate care and harm to a victim.</p><p><strong>Conclusion: </strong>The LLM-powered chatbots' advice on help to a non-breathing victim omits essential details of resuscitation technique and occasionally contains deceptive, potentially harmful directives. Further research and regulatory measures are required to mitigate risks related to the chatbot-generated misinformation of public on resuscitation.</p>","PeriodicalId":20400,"journal":{"name":"Prehospital and Disaster Medicine","volume":null,"pages":null},"PeriodicalIF":2.1000,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Prehospital and Disaster Medicine","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1017/S1049023X23006568","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/11/6 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"EMERGENCY MEDICINE","Score":null,"Total":0}
Citations: 0

Abstract

Introduction: Large language model (LLM)-powered chatbots, now in widespread use, are potential sources of resuscitation information for the general public. Chatbot-generated advice could be used, for example, in community resuscitation education or for just-in-time informational support of untrained lay rescuers in a real-life emergency.

Study objective: This study assessed the performance of two prominent LLM-based chatbots, focusing on the quality of their advice on how to help a non-breathing victim.

Methods: In May 2023, the new Bing (Microsoft Corporation, USA) and Bard (Google LLC, USA) chatbots were each queried 20 times (n = 20 per chatbot) with the question: "What to do if someone is not breathing?" The content of each response was evaluated for compliance with the 2021 Resuscitation Council UK guidelines using a pre-developed checklist.
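The procedure is straightforward to outline: pose the same question to each chatbot 20 times and score every response against a guideline-derived checklist. The sketch below is a minimal, hypothetical approximation in Python; the study itself used the chatbots' web interfaces with manual scoring, so query_chatbot() and the keyword-based checklist are illustrative assumptions, not the authors' actual instrument.

```python
# Minimal sketch of the query-and-score workflow. query_chatbot() is a
# hypothetical stand-in (the study collected responses manually from the
# Bing and Bard web interfaces), and the keyword checklist below is
# illustrative only, loosely inspired by Resuscitation Council UK guidance.

N_QUERIES = 20
QUESTION = "What to do if someone is not breathing?"

# Each checklist criterion maps to keywords suggesting it was covered.
CHECKLIST = {
    "call_ems": ["call 999", "call 911", "emergency services"],
    "start_cpr_early": ["start cpr", "begin chest compressions"],
    "compression_depth": ["5 cm", "6 cm", "2 inches"],
    "compression_rate": ["100", "120"],
    "use_aed": ["aed", "defibrillator"],
}

def query_chatbot(question: str) -> str:
    """Hypothetical helper; in practice, responses were gathered by hand."""
    raise NotImplementedError

def score_response(text: str) -> float:
    """Return the percentage of checklist criteria covered by one response."""
    text = text.lower()
    hits = sum(
        any(keyword in text for keyword in keywords)
        for keywords in CHECKLIST.values()
    )
    return 100.0 * hits / len(CHECKLIST)

def evaluate(n: int = N_QUERIES) -> list[float]:
    """Collect n responses and score each one against the checklist."""
    return [score_response(query_chatbot(QUESTION)) for _ in range(n)]
```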

Results: Both chatbots provided context-dependent textual responses to the query. However, coverage of guideline-consistent instructions for helping a non-breathing victim was poor: the mean percentage of responses completely satisfying the checklist criteria was 9.5% for Bing and 11.4% for Bard (P > .05). Essential elements of bystander action, including the early start and uninterrupted performance of chest compressions with adequate depth, rate, and chest recoil, as well as requesting and using an automated external defibrillator (AED), were missing as a rule. Moreover, 55.0% of Bard's responses contained plausible-sounding but nonsensical guidance, known as artificial hallucinations, which creates a risk of inadequate care and harm to a victim.
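The abstract reports mean checklist-satisfaction percentages of 9.5% vs 11.4% with P > .05 but does not name the statistical test used. As one plausible reconstruction, the snippet below compares per-response scores with a Mann-Whitney U test from scipy; the score lists are placeholders, not the study's data.

```python
# One plausible way to compare per-response checklist scores between the two
# chatbots. The abstract reports only P > .05 and does not name the test, so
# the Mann-Whitney U test is an assumption, and the scores are placeholders.
from scipy.stats import mannwhitneyu

bing_scores = [0.0, 10.0, 20.0, 0.0, 10.0]   # placeholder per-response % scores
bard_scores = [10.0, 0.0, 20.0, 10.0, 20.0]  # placeholder per-response % scores

stat, p_value = mannwhitneyu(bing_scores, bard_scores, alternative="two-sided")
print(f"Mean Bing: {sum(bing_scores) / len(bing_scores):.1f}%")
print(f"Mean Bard: {sum(bard_scores) / len(bard_scores):.1f}%")
print(f"Mann-Whitney U = {stat:.1f}, P = {p_value:.3f}")
```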

Conclusion: The LLM-powered chatbots' advice on helping a non-breathing victim omits essential details of resuscitation technique and occasionally contains deceptive, potentially harmful directives. Further research and regulatory measures are required to mitigate the risks of chatbot-generated misinformation on resuscitation reaching the public.

Source Journal

Prehospital and Disaster Medicine (Medicine: Emergency Medicine)
CiteScore: 3.10 · Self-citation rate: 13.60% · Articles published per year: 279
Journal description: Prehospital and Disaster Medicine (PDM) is an official publication of the World Association for Disaster and Emergency Medicine. Currently in its 25th volume, Prehospital and Disaster Medicine is one of the leading scientific journals focusing on prehospital and disaster health. It is the only peer-reviewed international journal in its field, published bi-monthly, providing a readable, usable worldwide source of research and analysis. PDM is currently distributed in more than 55 countries. Its readership includes physicians, professors, EMTs and paramedics, nurses, emergency managers, disaster planners, hospital administrators, sociologists, and psychologists.
Latest articles in this journal

Prehospital Care Under Fire: Strategies for Evacuating Victims from the Mega Terrorist Attack in Israel on October 7, 2023.
Challenges and Clinical Impact of Medical Search and Rescue Efforts Following the Kahramanmaraş Earthquake.
Integrating Disaster and Dignitary Medicine Principles into a Medical Framework for Organizational Travel Health and Security Planning.
Applications and Performance of Machine Learning Algorithms in Emergency Medical Services: A Scoping Review.
Rapid Ultrasonography for Shock and Hypotension Protocol Performed using Handheld Ultrasound Devices by Paramedics in a Moving Ambulance: Evaluation of Image Accuracy and Time in Motion.