Vera Sorin, Eyal Klang, Tamer Sobeh, Eli Konen, Shai Shrot, Adva Livne, Yulian Weissbuch, Chen Hoffmann, Yiftach Barash
Generative pre-trained transformer (GPT)-4 support for differential diagnosis in neuroradiology.
Quantitative Imaging in Medicine and Surgery, vol. 14, no. 10, pp. 7551-7560. Published 2024-10-01 (Epub 2024-09-23). DOI: 10.21037/qims-24-200 (https://doi.org/10.21037/qims-24-200). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11485343/pdf/
Abstract
Background: Differential diagnosis in radiology relies on the accurate identification of imaging patterns. Large language models (LLMs) hold promise in radiology, with many potential applications that may make radiologists' workflow more efficient. This study aimed to evaluate the efficacy of generative pre-trained transformer (GPT)-4, an LLM, in providing differential diagnoses in neuroradiology, comparing its performance with that of board-certified neuroradiologists.
Methods: Sixty neuroradiology reports with varied diagnoses were entered into GPT-4, which was tasked with generating a top-3 differential diagnosis for each case. The results were compared with the true diagnoses and with the differential diagnoses provided by three blinded neuroradiologists. Diagnostic accuracy and agreement between readers were assessed.
Results: Of the 60 patients (mean age 47.8 years, 65% female), GPT-4 correctly included the true diagnosis in its differential in 61.7% (37/60) of cases, while the neuroradiologists' accuracy ranged from 63.3% (38/60) to 73.3% (44/60). Agreement between GPT-4 and the neuroradiologists, and among the neuroradiologists, was fair to moderate [Cohen's kappa (kw) 0.34-0.44 and kw 0.39-0.54, respectively].
Conclusions: GPT-4 shows potential as a support tool for differential diagnosis in neuroradiology, though it was outperformed by human experts. Radiologists should remain mindful of the limitations of LLMs while harnessing their potential to enhance educational and clinical work.
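
The two quantities reported in the Results, top-3 accuracy and Cohen's kappa, can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The function names (`top_k_hit`, `accuracy_pct`, `cohen_kappa`) are hypothetical, the exact string matching of diagnoses is an assumption (the abstract does not say how a differential was judged to contain the true diagnosis), and the kappa shown is the unweighted form, whereas the abstract's "kw" notation suggests a weighted variant whose weighting scheme is not given here. Only the counts 37/60, 38/60, and 44/60 come from the abstract; the rater labels in the kappa example are toy data.

```python
from collections import Counter


def top_k_hit(true_dx, differential):
    """Return True if the true diagnosis appears in the differential list.

    Assumption: diagnoses match by case-insensitive exact string comparison.
    """
    def norm(s):
        return s.strip().lower()

    return norm(true_dx) in {norm(d) for d in differential}


def accuracy_pct(hits, total):
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100.0 * hits / total, 1)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters agree.
    p_o = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Counts reported in the abstract.
print(accuracy_pct(37, 60))  # GPT-4: 61.7
print(accuracy_pct(38, 60))  # lowest neuroradiologist: 63.3
print(accuracy_pct(44, 60))  # highest neuroradiologist: 73.3

# Toy labels (1 = correct, 0 = incorrect); hypothetical, not study data.
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```

In the toy example the raters agree on 3 of 4 cases (observed agreement 0.75) while chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5. Kappa discounts the agreement expected by chance, which is why the reported values of 0.34 to 0.54 indicate only fair to moderate agreement even though the readers' accuracies were similar.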