ChatGPT in radiology: A systematic review of performance, pitfalls, and future perspectives

Impact factor: 4.9 | Zone 2, Medicine | JCR Q1: Radiology, Nuclear Medicine & Medical Imaging | Diagnostic and Interventional Imaging | Pub Date: 2024-04-27 | DOI: 10.1016/j.diii.2024.04.003

Pedram Keshavarz, Sara Bagherieh, Seyed Ali Nabipoorashrafi, Hamid Chalian, Amir Ali Rahsepar, Grace Hyun J. Kim, Cameron Hassani, Steven S. Raman, Arash Bedayat

Diagnostic and Interventional Imaging, volume 105, issue 7, pages 251-265. Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S2211568424001050

Citations: 0

Abstract

Purpose

The purpose of this study was to systematically review the reported performances of ChatGPT, identify potential limitations, and explore future directions for its integration, optimization, and ethical considerations in radiology applications.

Materials and methods

A comprehensive search of the PubMed, Web of Science, Embase, and Google Scholar databases identified studies published up to January 1, 2024, that used ChatGPT for clinical radiology applications.

Results

Of 861 studies retrieved, 44 evaluated the performance of ChatGPT. Of these, 37 (37/44; 84.1%) demonstrated high performance, and seven (7/44; 15.9%) reported lower performance, in providing information on diagnosis and clinical decision support (6/44; 13.6%) and in patient communication and educational content (1/44; 2.3%). Twenty-four studies (24/44; 54.5%) reported ChatGPT's performance as a proportion. Of these, 19 (19/24; 79.2%) recorded a median accuracy of 70.5%, and five (5/24; 20.8%) recorded a median agreement of 83.6% between ChatGPT outcomes and reference standards (radiologists' decisions or guidelines), generally confirming ChatGPT's high accuracy in these studies. Eleven studies compared two recent ChatGPT versions; in ten (10/11; 90.9%), ChatGPTv4 outperformed v3.5, with notable improvements in addressing higher-order thinking questions, comprehension of radiology terms, and accuracy in describing images. Risks and concerns about using ChatGPT included biased responses, limited originality, inaccurate information leading to misinformation, hallucinations, improper citations and fake references, cybersecurity vulnerabilities, and patient privacy risks.
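The proportions quoted in the Results can be recomputed from the study counts. The following is a minimal Python sketch of that arithmetic, not part of the original article; the row labels and the helper names `percent` and `check_reported` are illustrative assumptions, and the counts and percentages are copied from the paragraph above.

```python
# Counts and percentages as quoted in the Results paragraph:
# (label, numerator, denominator, reported percentage).
# The labels are illustrative; only the numbers come from the abstract.
REPORTED = [
    ("high performance", 37, 44, 84.1),
    ("lower performance", 7, 44, 15.9),
    ("lower: diagnosis and decision support", 6, 44, 13.6),
    ("lower: patient communication and education", 1, 44, 2.3),
    ("reported performance as a proportion", 24, 44, 54.5),
    ("reported median accuracy", 19, 24, 79.2),
    ("reported median agreement", 5, 24, 20.8),
    ("v4 outperformed v3.5", 10, 11, 90.9),
]


def percent(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


def check_reported(rows):
    """Return rows whose recomputed percentage differs from the reported one."""
    return [
        (label, percent(n, d), reported)
        for label, n, d, reported in rows
        if percent(n, d) != reported
    ]


mismatches = check_reported(REPORTED)
for label, n, d, reported in REPORTED:
    print(f"{label}: {n}/{d} = {percent(n, d)}% (reported {reported}%)")
print("mismatches:", mismatches)
```

Under these assumptions every recomputed value matches the quoted figure to one decimal place, and the subgroup counts add up to their totals (37 + 7 = 44, 6 + 1 = 7, 19 + 5 = 24).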

Conclusion

Although ChatGPT's effectiveness has been shown in 84.1% of radiology studies, there are still multiple pitfalls and limitations to address. It is too soon to confirm its complete proficiency and accuracy, and more extensive multicenter studies utilizing diverse datasets and pre-training techniques are required to verify ChatGPT's role in radiology.

Source journal

Diagnostic and Interventional Imaging (Medicine: Radiology, Nuclear Medicine and Imaging)

CiteScore: 8.50

Self-citation rate: 29.10%

Articles published: 126

Review time: 11 days

Journal description: Diagnostic and Interventional Imaging accepts publications originating from any part of the world based only on their scientific merit. The journal focuses on richly illustrated articles and aims to help sharpen clinical decision-making skills while following leading research topics. All articles are published in English. Diagnostic and Interventional Imaging publishes editorials, technical notes, letters, and original and review articles on abdominal, breast, cancer, cardiac, emergency, forensic medicine, head and neck, musculoskeletal, gastrointestinal, genitourinary, interventional, obstetric, pediatric, thoracic, and vascular imaging, neuroradiology, and nuclear medicine, as well as contrast material, computer developments, health policies and practice, and medical physics relevant to imaging.