ChatGPT in radiology: A systematic review of performance, pitfalls, and future perspectives

Impact Factor: 4.9 | Zone 2 (Medicine) | JCR Q1: Radiology, Nuclear Medicine & Medical Imaging | Journal: Diagnostic and Interventional Imaging | Pub Date: 2024-04-27 | DOI: 10.1016/j.diii.2024.04.003
Pedram Keshavarz, Sara Bagherieh, Seyed Ali Nabipoorashrafi, Hamid Chalian, Amir Ali Rahsepar, Grace Hyun J. Kim, Cameron Hassani, Steven S. Raman, Arash Bedayat
Citations: 0

Abstract

Purpose

The purpose of this study was to systematically review the reported performances of ChatGPT, identify potential limitations, and explore future directions for its integration, optimization, and ethical considerations in radiology applications.

Materials and methods

A comprehensive search of the PubMed, Web of Science, Embase, and Google Scholar databases identified a cohort of studies published up to January 1, 2024, that used ChatGPT for clinical radiology applications.

Results

Of the 861 studies retrieved, 44 evaluated the performance of ChatGPT. Of these, 37 (37/44; 84.1%) demonstrated high performance, and seven (7/44; 15.9%) indicated lower performance, either in providing information on diagnosis and clinical decision support (6/44; 13.6%) or in patient communication and educational content (1/44; 2.3%). Twenty-four (24/44; 54.5%) studies reported ChatGPT's performance as a proportion. Among these, 19 (19/24; 79.2%) studies recorded a median accuracy of 70.5%, and five (5/24; 20.8%) studies recorded a median agreement of 83.6% between ChatGPT outcomes and reference standards (radiologists' decisions or guidelines), generally confirming ChatGPT's high accuracy in these studies. Eleven studies compared two recent ChatGPT versions, and in ten (10/11; 90.9%), ChatGPTv4 outperformed v3.5, showing notable improvements in addressing higher-order thinking questions, comprehending radiology terms, and describing images accurately. Risks and concerns about using ChatGPT included biased responses, limited originality, the potential for inaccurate information leading to misinformation, hallucinations, improper citations and fake references, cybersecurity vulnerabilities, and patient privacy risks.
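
The proportions quoted in the Results can be rechecked with a few lines of arithmetic. The following is a minimal sketch, not part of the original study: the dictionary labels and the helper name `pct` are illustrative assumptions, and the counts are copied directly from the paragraph above.

```python
# Counts and stated percentages copied from the Results paragraph.
# Labels and the helper name `pct` are illustrative, not from the paper.
reported = {
    "high performance": (37, 44, 84.1),
    "lower performance": (7, 44, 15.9),
    "lower: diagnosis / decision support": (6, 44, 13.6),
    "lower: patient communication": (1, 44, 2.3),
    "reported a proportion": (24, 44, 54.5),
    "reported accuracy": (19, 24, 79.2),
    "reported agreement": (5, 24, 20.8),
    "v4 outperformed v3.5": (10, 11, 90.9),
}


def pct(numerator: int, denominator: int) -> float:
    """Return the percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


for label, (n, d, stated) in reported.items():
    status = "matches" if pct(n, d) == stated else "differs from"
    print(f"{label}: {n}/{d} = {pct(n, d)}% ({status} stated {stated}%)")
```

Each recomputed value agrees with the stated figure to one decimal place, and the subgroup counts sum to their parent groups (37 + 7 = 44, 6 + 1 = 7, 19 + 5 = 24).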

Conclusion

Although ChatGPT's effectiveness has been shown in 84.1% of radiology studies, there are still multiple pitfalls and limitations to address. It is too soon to confirm its complete proficiency and accuracy, and more extensive multicenter studies utilizing diverse datasets and pre-training techniques are required to verify ChatGPT's role in radiology.

Source journal
Diagnostic and Interventional Imaging (Medicine: Radiology, Nuclear Medicine and Imaging)
CiteScore: 8.50
Self-citation rate: 29.10%
Articles published: 126
Review time: 11 days
Journal description: Diagnostic and Interventional Imaging accepts submissions from any part of the world based solely on their scientific merit. The Journal focuses on richly illustrated articles with strong iconography and aims to help sharpen clinical decision-making skills while covering leading research topics. All articles are published in English. Diagnostic and Interventional Imaging publishes editorials, technical notes, letters, original and review articles on abdominal, breast, cancer, cardiac, emergency, forensic medicine, head and neck, musculoskeletal, gastrointestinal, genitourinary, interventional, obstetric, pediatric, thoracic and vascular imaging, neuroradiology, nuclear medicine, as well as contrast material, computer developments, health policies and practice, and medical physics relevant to imaging.
Latest articles from this journal
Artificial intelligence in interventional radiology: Current concepts and future trends.
Spontaneous necrosis and regression of focal nodular hyperplasia.
Comparison between contrast-enhanced fat-suppressed 3D FLAIR brain MR images and T2-weighted orbital MR images at 3 Tesla for the diagnosis of acute optic neuritis.
The effect of radiology on climate change: Can AI help us move toward a green future?
Diagnostic performance and relationships of structural parameters and strain components for the diagnosis of cardiac amyloidosis with MRI.