Comparing the Diagnostic Performance of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and Radiologists in Challenging Neuroradiology Cases.

Clinical Neuroradiology, pages 779-787. Pub Date: 2024-12-01 (Epub Date: 2024-05-28). DOI: 10.1007/s00062-024-01426-y
Impact factor 2.4; JCR Q2 (Clinical Neurology); CAS Tier 3 (Medicine)
Daisuke Horiuchi, Hiroyuki Tatekawa, Tatsushi Oura, Satoshi Oue, Shannon L Walston, Hirotaka Takita, Shu Matsushita, Yasuhito Mitsuyama, Taro Shimono, Yukio Miki, Daiju Ueda

Abstract

Purpose: To compare the diagnostic performance of Generative Pre-trained Transformer (GPT)-4-based ChatGPT, GPT-4 with vision (GPT-4V)-based ChatGPT, and radiologists in challenging neuroradiology cases.

Methods: We collected 32 consecutive "Freiburg Neuropathology Case Conference" cases from the journal Clinical Neuroradiology published between March 2016 and December 2023. We entered the medical history and imaging findings into GPT-4-based ChatGPT, and the medical history and images into GPT-4V-based ChatGPT; each model then generated a diagnosis for every case. Six radiologists (three radiology residents and three board-certified radiologists) independently reviewed all cases and provided diagnoses. The diagnostic accuracy of ChatGPT and of the radiologists was evaluated against the published ground truth. Chi-square tests were used to compare the diagnostic accuracy of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and the radiologists.
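The abstract does not state which chi-square variant was used. As a minimal sketch, assuming a Pearson chi-square test on a 2×2 table (two readers × correct/incorrect) with Yates' continuity correction, one pairwise comparison could be computed as follows. The function name `chi2_yates_2x2` is hypothetical, the example counts are taken from the Results, and the sketch treats the two readers as independent groups even though both read the same 32 cases.

```python
import math


def chi2_yates_2x2(correct_a: int, correct_b: int, n: int = 32) -> tuple[float, float]:
    """Compare two readers' accuracy on n cases with a Yates-corrected chi-square test.

    Builds the 2x2 table [[correct_a, n - correct_a], [correct_b, n - correct_b]]
    (rows = readers, columns = correct / incorrect) and returns
    (chi-square statistic, two-sided p-value) for 1 degree of freedom.
    """
    a, b = correct_a, n - correct_a
    c, d = correct_b, n - correct_b
    total = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero margin (e.g. both readers got every case right) gives no evidence of a difference.
        return 0.0, 1.0
    # Yates' continuity correction: shrink |ad - bc| by total / 2, never below zero.
    diff = max(abs(a * d - b * c) - total / 2, 0.0)
    chi2 = total * diff ** 2 / denom
    # Survival function of the chi-square distribution with 1 df: erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Example: GPT-4V-based ChatGPT (5/32 correct) vs. the board-certified radiologist with 15/32.
chi2, p = chi2_yates_2x2(5, 15)
print(f"chi-square = {chi2:.3f}, p = {p:.3f}")
```

Under this assumption the statistic is about 5.89 with p ≈ 0.015, close to the p = 0.02 reported for one of the two significant comparisons; that agreement alone does not confirm which test variant the authors used.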

Results: GPT-4-based and GPT-4V-based ChatGPT achieved accuracy rates of 22% (7/32) and 16% (5/32), respectively. The radiologists achieved the following accuracy rates: the three radiology residents 28% (9/32), 31% (10/32), and 28% (9/32); the three board-certified radiologists 38% (12/32), 47% (15/32), and 44% (14/32). The diagnostic accuracy of GPT-4-based ChatGPT was lower than that of each radiologist, although not significantly (all p > 0.07). The diagnostic accuracy of GPT-4V-based ChatGPT was also lower than that of each radiologist, and significantly lower than that of two board-certified radiologists (p = 0.02 and 0.03); the differences were not significant for the radiology residents or the remaining board-certified radiologist (all p > 0.09).
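Each reported percentage is the number of correct diagnoses divided by the 32 cases, rounded to a whole percent. A minimal sketch that recomputes them from the counts above; the reader labels (e.g. "Resident 1") are hypothetical placeholders because the abstract does not name the readers, and half-up rounding is an assumption (it only matters for 12/32 = 37.5%).

```python
import math

N_CASES = 32  # number of Freiburg Neuropathology Case Conference cases

# Correct diagnoses per reader, as reported in the abstract.
# Reader labels are hypothetical placeholders.
CORRECT = {
    "GPT-4-based ChatGPT": 7,
    "GPT-4V-based ChatGPT": 5,
    "Resident 1": 9,
    "Resident 2": 10,
    "Resident 3": 9,
    "Board-certified 1": 12,
    "Board-certified 2": 15,
    "Board-certified 3": 14,
}


def accuracy_percent(correct: int, total: int = N_CASES) -> int:
    """Accuracy as a whole percentage, rounding halves up (37.5 -> 38)."""
    return math.floor(100 * correct / total + 0.5)


for reader, k in CORRECT.items():
    print(f"{reader:<22} {k:>2}/{N_CASES} = {accuracy_percent(k)}%")
```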

Conclusion: Although GPT-4-based ChatGPT achieved higher diagnostic accuracy than GPT-4V-based ChatGPT, neither GPT-4-based nor GPT-4V-based ChatGPT reached the diagnostic performance of the radiology residents or the board-certified radiologists in challenging neuroradiology cases.
