[ChatGPT for use in technology-enhanced learning in anesthesiology and emergency medicine and potential clinical application of AI language models : Between hype and reality around artificial intelligence in medical use].

Philipp Humbsch, Evelyn Horn, Konrad Bohm, Robert Gintrowicz
DOI: 10.1007/s00101-024-01403-7
Journal: Die Anaesthesiologie
Published: 2024-05-01 (Journal Article)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11076380/pdf/

Abstract

Background: The utilization of AI language models in education and academia is currently a subject of research, and applications in clinical settings are also being tested. Studies conducted by various research groups have demonstrated that language models can answer questions related to medical board examinations, and there are potential applications of these models in medical education as well.

Research question: This study investigates the extent to which current language models prove effective for answering medical questions, their potential utility in medical education, and the challenges that remain in the functioning of AI language models.

Method: The program ChatGPT, based on GPT-3.5, was asked to answer 1025 questions from the second part (M2) of the medical board examination. The study examined whether errors occurred and, if so, what types. Additionally, the language model was asked to generate essays on the learning objectives outlined in the standard curriculum for specialist training in anesthesiology and the supplementary qualification in emergency medicine. These essays were subsequently analyzed and checked for errors and anomalies.

Results: ChatGPT correctly answered the questions with an accuracy rate exceeding 69%, even when the questions included references to visual aids. This represented an improvement in accuracy on board examination questions compared to a study conducted in March; however, when it came to generating essays, a high error rate was observed.

Discussion: Given the current pace of improvement in AI language models, widespread clinical implementation, especially in emergency departments as well as emergency and intensive care medicine with the assistance of medical trainees, is a plausible scenario. These models can provide insights that support medical professionals in their work, so long as the professionals do not rely solely on the language model. Although the use of these models in education holds promise, it currently requires a significant amount of supervision. Because of hallucinations caused by inadequate training environments for the language model, the generated texts may deviate from the current state of scientific knowledge. Direct deployment in patient care settings without continuous physician supervision does not yet appear achievable.
