[ChatGPT for use in technology-enhanced learning in anesthesiology and emergency medicine and potential clinical application of AI language models: Between hype and reality around artificial intelligence in medical use].
Philipp Humbsch, Evelyn Horn, Konrad Bohm, Robert Gintrowicz
{"title":"[ChatGPT for use in technology-enhanced learning in anesthesiology and emergency medicine and potential clinical application of AI language models : Between hype and reality around artificial intelligence in medical use].","authors":"Philipp Humbsch, Evelyn Horn, Konrad Bohm, Robert Gintrowicz","doi":"10.1007/s00101-024-01403-7","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>The utilization of AI language models in education and academia is currently a subject of research, and applications in clinical settings are also being tested. Studies conducted by various research groups have demonstrated that language models can answer questions related to medical board examinations, and there are potential applications of these models in medical education as well.</p><p><strong>Research question: </strong>This study aims to investigate the extent to which current version language models prove effective for addressing medical inquiries, their potential utility in medical education, and the challenges that still exist in the functioning of AI language models.</p><p><strong>Method: </strong>The program ChatGPT, based on GPT 3.5, had to answer 1025 questions from the second part (M2) of the medical board examination. The study examined whether any errors and what types of errors occurred. Additionally, the language model was asked to generate essays on the learning objectives outlined in the standard curriculum for specialist training in anesthesiology and the supplementary qualification in emergency medicine. These essays were analyzed afterwards and checked for errors and anomalies.</p><p><strong>Results: </strong>The findings indicated that ChatGPT was able to correctly answer the questions with an accuracy rate exceeding 69%, even when the questions included references to visual aids. This represented an improvement in the accuracy of answering board examination questions compared to a study conducted in March; however, when it came to generating essays a high error rate was observed.</p><p><strong>Discussion: </strong>Considering the current pace of ongoing improvements in AI language models, widespread clinical implementation, especially in emergency departments as well as emergency and intensive care medicine with the assistance of medical trainees, is a plausible scenario. These models can provide insights to support medical professionals in their work, without relying solely on the language model. Although the use of these models in education holds promise, it currently requires a significant amount of supervision. Due to hallucinations caused by inadequate training environments for the language model, the generated texts might deviate from the current state of scientific knowledge. 
Direct deployment in patient care settings without permanent physician supervision does not yet appear to be achievable at present.</p>","PeriodicalId":72805,"journal":{"name":"Die Anaesthesiologie","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11076380/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Die Anaesthesiologie","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00101-024-01403-7","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Background: The use of AI language models in education and academia is an active subject of research, and applications in clinical settings are also being tested. Studies by various research groups have demonstrated that language models can answer medical board examination questions, and these models also have potential applications in medical education.
Research question: This study investigates the extent to which current language models are effective at answering medical questions, their potential utility in medical education, and the challenges that remain in how AI language models function.
Method: The program ChatGPT, based on GPT-3.5, was tasked with answering 1025 questions from the second part (M2) of the medical board examination. The study examined whether errors occurred and, if so, of what types. Additionally, the language model was asked to generate essays on the learning objectives outlined in the standard curriculum for specialist training in anesthesiology and for the supplementary qualification in emergency medicine. These essays were then analyzed and checked for errors and anomalies.
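For readers who want to reproduce this kind of benchmark, the following Python sketch shows one way such an evaluation loop could look. It is not the authors' pipeline: the study used the ChatGPT interface rather than the API, and the questions.json file, its field names, and the prompt wording here are illustrative assumptions. The sketch queries a GPT-3.5 model via the OpenAI chat completions API and scores the letter answers against a key.

```python
# Minimal sketch of an automated MCQ evaluation loop, assuming the exam
# questions are available in a hypothetical questions.json with fields
# "stem", "options" (letter -> text), and "correct" (answer letter).
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(stem: str, options: dict[str, str]) -> str:
    """Pose one multiple-choice question and return the model's letter choice."""
    choices = "\n".join(f"{letter}) {text}" for letter, text in options.items())
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # GPT-3.5 family, as referenced in the study
        messages=[
            {"role": "system", "content": "Answer with a single letter (A-E) only."},
            {"role": "user", "content": f"{stem}\n{choices}"},
        ],
        temperature=0,  # deterministic output for a reproducible scoring run
    )
    return resp.choices[0].message.content.strip()[:1].upper()

with open("questions.json", encoding="utf-8") as f:  # hypothetical input file
    questions = json.load(f)

correct = sum(ask(q["stem"], q["options"]) == q["correct"] for q in questions)
print(f"Accuracy: {correct / len(questions):.1%}")  # the study reports >69%
```

In a real replication one would also log each raw response, since free-text answers that do not reduce cleanly to a single letter need manual review, as does the error typing the study describes.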
Results: ChatGPT answered the questions correctly with an accuracy exceeding 69%, even when the questions referred to visual aids. This represented an improvement in board examination accuracy over a study conducted in March; however, essay generation showed a high error rate.
Discussion: Given the current pace of improvement in AI language models, widespread clinical implementation, especially in emergency departments and in emergency and intensive care medicine with the assistance of medical trainees, is a plausible scenario. These models can provide insights that support medical professionals in their work, provided clinicians do not rely on the language model alone. Although the use of these models in education holds promise, it currently requires substantial supervision. Owing to hallucinations caused by inadequate training environments, the generated texts may deviate from the current state of scientific knowledge. Direct deployment in patient care without permanent physician supervision does not yet appear achievable.