Large language models (LLMs) in radiology exams for medical students: Performance and consequences.

IF 1.3 · Zone 4 (Medicine) · Q3, RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING · Rofo-fortschritte Auf Dem Gebiet Der Rontgenstrahlen Und Der Bildgebenden Verfahren · Pub Date: 2024-11-04 · DOI: 10.1055/a-2437-2067
Jennifer Gotta, Quang Anh Le Hong, Vitali Koch, Leon D Gruenewald, Tobias Geyer, Simon S Martin, Jan-Erik Scholtz, Christian Booz, Daniel Pinto Dos Santos, Scherwin Mahmoudi, Katrin Eichler, Tatjana Gruber-Rouh, Renate Hammerstingl, Teodora Biciusca, Lisa Joy Juergens, Elena Höhne, Christoph Mader, Thomas J Vogl, Philipp Reschke

Abstract

The evolving field of medical education is being shaped by technological advancements, including the integration of Large Language Models (LLMs) like ChatGPT. These models could be invaluable resources for medical students, simplifying complex concepts and enhancing interactive learning by providing personalized support. LLMs have shown impressive performance in professional examinations, even without specific domain training, making them particularly relevant in the medical field. This study aims to assess the performance of LLMs in radiology examinations for medical students, thereby shedding light on their current capabilities and implications.

This study was conducted using 151 multiple-choice questions, which were used for radiology exams for medical students. The questions were categorized by type and topic and were then processed using OpenAI's GPT-3.5 and GPT-4 via their API, or manually put into Perplexity AI with GPT-3.5 and Bing. LLM performance was evaluated overall, by question type and by topic.

GPT-3.5 achieved a 67.6% overall accuracy on all 151 questions, while GPT-4 outperformed it significantly with an 88.1% overall accuracy (p < 0.001). GPT-4 demonstrated superior performance in both lower-order and higher-order questions compared to GPT-3.5, Perplexity AI, and medical students, with GPT-4 particularly excelling in higher-order questions. All GPT models would have successfully passed the radiology exam for medical students at our university.

In conclusion, our study highlights the potential of LLMs as accessible knowledge resources for medical students. GPT-4 performed well on lower-order as well as higher-order questions, making ChatGPT-4 a potentially very useful tool for reviewing radiology exam questions. Radiologists should be aware of ChatGPT's limitations, including its tendency to confidently provide incorrect responses.
· ChatGPT demonstrated remarkable performance, achieving a passing grade on a radiology examination for medical students that did not include image questions.
· GPT-4 exhibits significantly improved performance compared to its predecessors GPT-3.5 and Perplexity AI with 88% of questions answered correctly.
· Radiologists as well as medical students should be aware of ChatGPT's limitations, including its tendency to confidently provide incorrect responses.
· Gotta J, Le Hong QA, Koch V et al. Large language models (LLMs) in radiology exams for medical students: Performance and consequences. Fortschr Röntgenstr 2024; DOI 10.1055/a-2437-2067.
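
As a rough plausibility check on the reported comparison (67.6% vs. 88.1% accuracy on 151 questions, p < 0.001), the sketch below runs an unpaired two-proportion z-test on the published percentages. The abstract does not name the test the authors used, so the choice of test, the function name `two_proportion_z_test`, and its parameters are assumptions made for illustration only. Because both models answered the same questions, a paired test such as McNemar's would arguably fit better, but it needs per-question results that the abstract does not provide.

```python
import math


def two_proportion_z_test(p_a, p_b, n_a, n_b):
    """Unpaired two-sided z-test for the difference of two proportions.

    Illustrative helper (hypothetical name), not the authors' method.
    Returns the z statistic and the two-sided p-value.
    """
    # Pooled proportion under the null hypothesis of equal accuracy.
    pooled = (p_a * n_a + p_b * n_b) / (n_a + n_b)
    # Standard error of the difference under the null hypothesis.
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Figures taken from the abstract: overall accuracy on 151 questions.
N_QUESTIONS = 151
ACC_GPT35 = 0.676
ACC_GPT4 = 0.881

z_stat, p_val = two_proportion_z_test(ACC_GPT35, ACC_GPT4, N_QUESTIONS, N_QUESTIONS)
print(f"z = {z_stat:.2f}, two-sided p = {p_val:.2g}")
```

Under these assumptions the p-value falls below 0.001, which is consistent with the significance level stated in the abstract, although it does not reproduce the authors' own analysis.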

Source journal: CiteScore 1.20; self-citation rate 5.60%; articles published 340
About the journal: RöFo publishes original articles, review articles, and case reports in radiology and the other imaging modalities in medicine. Only manuscripts that have not yet been published and have not been offered to another journal at the same time may be submitted. All submissions undergo careful peer review. Founded in 1896, barely one year after C.W. Röntgen discovered X-rays, RöFo looks back on more than 100 years of experience as the most important publication medium in German-speaking radiology. It is thus the oldest radiology journal and successfully combines long continuity with the aspiration to scientific publishing at an international level. Through its central place in the publisher's program, RöFo formed the basis for today's comprehensive and successful range of radiology media at Georg Thieme Verlag. RöFo is especially closely tied to the history of the radiological societies in Germany and Austria. It is the official organ of the DRG and ÖRG, and members of these societies receive the journal as part of their membership. With its scientific core section and the societies' own announcements section, RöFo provides, month after month, a forum for exchanging content and messages within the radiological community in the German-speaking region.