Purpose
This study aimed to evaluate the diagnostic performance of prompt-adapted ChatGPT-4 Vision (ChatGPT-4V) in interpreting contrast-enhanced ultrasound (CEUS) images for the assessment of cystic renal masses (CRMs) using the Bosniak classification. It also tested whether the best-performing prompt could assist radiologists with different levels of experience.
Materials and methods
This retrospective study included 103 CRMs in patients who underwent CEUS and CT. ChatGPT-4V and six radiologists (three senior and three junior) independently assigned a Bosniak category (BC) to each CRM based solely on the CEUS images. The radiologists then re-assessed the images while reviewing the ChatGPT-4V-generated BCs and decided whether to modify their initial assessments. The diagnostic performance of the radiologists and the prompts was assessed using the area under the receiver operating characteristic curve (AUC).
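The abstract does not state how Bosniak categories were converted into an AUC or what reference standard was used. A minimal sketch, assuming a hypothetical ordinal mapping (I=1, II=2, IIF=3, III=4, IV=5) and a binary reference label per CRM (1 = positive, for example malignant; also an assumption), computes the empirical AUC as the probability that a positive mass receives a higher category than a negative one, with ties counted as one half. This equals the area under the empirical ROC curve traced by thresholding the ordinal scale.

```python
# Hypothetical mapping of Bosniak categories to ordinal scores;
# the abstract does not specify the scoring used by the authors.
BOSNIAK_SCORE = {"I": 1, "II": 2, "IIF": 3, "III": 4, "IV": 5}


def empirical_auc(ratings, labels):
    """Mann-Whitney AUC of ordinal Bosniak ratings.

    ratings: Bosniak categories as strings, e.g. "IIF".
    labels:  1 = positive (assumed reference standard), 0 = negative.
    Ties between a positive and a negative case count as 1/2.
    """
    scores = [BOSNIAK_SCORE[r] for r in ratings]
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos) * len(neg))


# Invented toy data for illustration only.
ratings = ["II", "IIF", "III", "IV", "III", "I"]
labels = [0, 0, 1, 1, 0, 0]
print(empirical_auc(ratings, labels))  # 0.9375
```

The single tie (a positive and a negative mass both rated III) contributes 0.5 instead of 1, which is why the toy AUC is 7.5/8 rather than 1.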
Results
The AUCs of the prompts ranged from 0.507 to 0.688, whereas those of the radiologists ranged from 0.685 to 0.831. Among all prompts, the Reflection of Thoughts (ROT) prompt achieved the highest AUC, with performance comparable to that of the junior radiologists (0.688 vs. 0.715, P = 0.727). Although its AUC was lower than that of the senior radiologists (0.688 vs. 0.832, P = 0.019), ROT assistance improved the AUCs of the junior radiologists: from 0.714 to 0.834 for junior 1, from 0.685 to 0.782 for junior 2, and from 0.704 to 0.783 for junior 3. After assistance, all three juniors performed comparably to the seniors.
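The abstract reports P values for AUC comparisons without naming the test. One common choice when two readers rate the same cases is DeLong's nonparametric test for correlated AUCs; the sketch below implements its paired form with the standard library only, assuming integer ordinal scores and binary labels (1 = positive). It is an illustration of that technique under these assumptions, not the authors' analysis, and the helper names are hypothetical.

```python
import math


def _placements(scores, labels):
    """DeLong structural components (placement values) for one reader."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two positive and two negative cases")

    def psi(x, y):
        # Kernel: 1 if the positive outranks the negative, 1/2 for a tie, else 0.
        return 1.0 if x > y else (0.5 if x == y else 0.0)

    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_paired(scores_a, scores_b, labels):
    """Two-sided DeLong test for two readers scoring the same cases.

    Returns (auc_a, auc_b, z, p).
    """
    a10, a01 = _placements(scores_a, labels)
    b10, b01 = _placements(scores_b, labels)
    auc_a = sum(a10) / len(a10)
    auc_b = sum(b10) / len(b10)
    m, n = len(a10), len(a01)  # numbers of positive and negative cases
    var = ((_cov(a10, a10) + _cov(b10, b10) - 2 * _cov(a10, b10)) / m
           + (_cov(a01, a01) + _cov(b01, b01) - 2 * _cov(a01, b01)) / n)
    if var <= 0:
        raise ValueError("zero variance of the AUC difference (identical readers?)")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p


# Invented toy data: 1 = positive case, scores on an ordinal 1-5 scale.
labels = [1, 1, 1, 0, 0, 0]
reader_a = [5, 4, 3, 2, 3, 1]
reader_b = [4, 2, 3, 3, 1, 2]
print(delong_paired(reader_a, reader_b, labels))
```

Because both readers rate the same cases, the covariance terms between their placement values reduce the variance of the difference; treating the two AUCs as independent would ignore this pairing.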
Conclusion
Prompt-adapted ChatGPT-4V showed variable performance in interpreting CEUS images. ROT, the best-performing prompt, achieved diagnostic performance comparable to that of junior radiologists and could help them achieve AUCs comparable to those of senior radiologists.
