[ChatGPT for use in technology-enhanced learning in anesthesiology and emergency medicine and potential clinical application of AI language models: Between hype and reality around artificial intelligence in medical use].
Philipp Humbsch, Evelyn Horn, Konrad Bohm, Robert Gintrowicz
{"title":"[ChatGPT for use in technology-enhanced learning in anesthesiology and emergency medicine and potential clinical application of AI language models : Between hype and reality around artificial intelligence in medical use].","authors":"Philipp Humbsch, Evelyn Horn, Konrad Bohm, Robert Gintrowicz","doi":"10.1007/s00101-024-01403-7","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>The utilization of AI language models in education and academia is currently a subject of research, and applications in clinical settings are also being tested. Studies conducted by various research groups have demonstrated that language models can answer questions related to medical board examinations, and there are potential applications of these models in medical education as well.</p><p><strong>Research question: </strong>This study aims to investigate the extent to which current version language models prove effective for addressing medical inquiries, their potential utility in medical education, and the challenges that still exist in the functioning of AI language models.</p><p><strong>Method: </strong>The program ChatGPT, based on GPT 3.5, had to answer 1025 questions from the second part (M2) of the medical board examination. The study examined whether any errors and what types of errors occurred. Additionally, the language model was asked to generate essays on the learning objectives outlined in the standard curriculum for specialist training in anesthesiology and the supplementary qualification in emergency medicine. These essays were analyzed afterwards and checked for errors and anomalies.</p><p><strong>Results: </strong>The findings indicated that ChatGPT was able to correctly answer the questions with an accuracy rate exceeding 69%, even when the questions included references to visual aids. This represented an improvement in the accuracy of answering board examination questions compared to a study conducted in March; however, when it came to generating essays a high error rate was observed.</p><p><strong>Discussion: </strong>Considering the current pace of ongoing improvements in AI language models, widespread clinical implementation, especially in emergency departments as well as emergency and intensive care medicine with the assistance of medical trainees, is a plausible scenario. These models can provide insights to support medical professionals in their work, without relying solely on the language model. Although the use of these models in education holds promise, it currently requires a significant amount of supervision. Due to hallucinations caused by inadequate training environments for the language model, the generated texts might deviate from the current state of scientific knowledge. 
Direct deployment in patient care settings without permanent physician supervision does not yet appear to be achievable at present.</p>","PeriodicalId":72805,"journal":{"name":"Die Anaesthesiologie","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11076380/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Die Anaesthesiologie","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00101-024-01403-7","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Background: The use of AI language models in education and academia is an active subject of research, and applications in clinical settings are also being tested. Studies by various research groups have demonstrated that language models can answer medical board examination questions, and these models also have potential applications in medical education.
Research question: This study investigates the extent to which current language models are effective at answering medical questions, their potential utility in medical education, and the challenges that remain in how AI language models function.
Method: The program ChatGPT, based on GPT-3.5, was tasked with answering 1025 questions from the second part (M2) of the medical board examination. The study examined whether errors occurred and, if so, of what types. Additionally, the language model was asked to generate essays on the learning objectives outlined in the standard curriculum for specialist training in anesthesiology and for the supplementary qualification in emergency medicine. These essays were then analyzed and checked for errors and anomalies.
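For readers who want to reproduce this kind of benchmark, the following Python sketch shows one way such an evaluation loop could look. It is not the authors' pipeline: the study used the ChatGPT interface rather than the API, and the questions.json file, its field names, and the prompt wording here are illustrative assumptions. The sketch queries a GPT-3.5 model via the OpenAI chat completions API and scores the letter answers against a key.

```python
# Minimal sketch of an automated MCQ evaluation loop, assuming the exam
# questions are available in a hypothetical questions.json with fields
# "stem", "options" (letter -> text), and "correct" (answer letter).
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(stem: str, options: dict[str, str]) -> str:
    """Pose one multiple-choice question and return the model's letter choice."""
    choices = "\n".join(f"{letter}) {text}" for letter, text in options.items())
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # GPT-3.5 family, as referenced in the study
        messages=[
            {"role": "system", "content": "Answer with a single letter (A-E) only."},
            {"role": "user", "content": f"{stem}\n{choices}"},
        ],
        temperature=0,  # deterministic output for a reproducible scoring run
    )
    return resp.choices[0].message.content.strip()[:1].upper()

with open("questions.json", encoding="utf-8") as f:  # hypothetical input file
    questions = json.load(f)

correct = sum(ask(q["stem"], q["options"]) == q["correct"] for q in questions)
print(f"Accuracy: {correct / len(questions):.1%}")  # the study reports >69%
```

In a real replication one would also log each raw response, since free-text answers that do not reduce cleanly to a single letter need manual review, as does the error typing the study describes.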
Results: ChatGPT answered the questions correctly with an accuracy exceeding 69%, even when the questions referred to visual aids. This represented an improvement in board examination accuracy over a study conducted in March; however, essay generation showed a high error rate.
Discussion: Given the current pace of improvement in AI language models, widespread clinical implementation, especially in emergency departments and in emergency and intensive care medicine with the assistance of medical trainees, is a plausible scenario. These models can provide insights that support medical professionals in their work, provided clinicians do not rely on the language model alone. Although the use of these models in education holds promise, it currently requires substantial supervision. Owing to hallucinations caused by inadequate training environments, the generated texts may deviate from the current state of scientific knowledge. Direct deployment in patient care without permanent physician supervision does not yet appear achievable.