Background: Previous studies have evaluated the performance of large language models (LLMs) in medical disciplines; however, few have focused on image analysis, and none specifically on cardiovascular imaging or nuclear cardiology. This study assesses four LLMs: GPT-4, GPT-4 Turbo, and GPT-4omni (GPT-4o) from OpenAI, and Gemini from Google Inc. The models answered questions from the 2023 American Society of Nuclear Cardiology Board Preparation Exam, which reflects the scope of the Certification Board of Nuclear Cardiology (CBNC) examination.
Methods: We used 168 questions: 141 text-only and 27 image-based, categorized into four sections mirroring the CBNC exam. Each LLM received the same standardized prompt and answered each section 30 times to account for the stochasticity of model outputs. Performance over a six-week period was also assessed for all models except GPT-4o. McNemar's test was used to compare the proportions of correct responses between models.
Results: GPT-4, Gemini, GPT-4 Turbo, and GPT-4o correctly answered median percentages of 56.8% (95% confidence interval 55.4%-58.0%), 40.5% (39.9%-42.9%), 60.7% (59.5%-61.3%), and 63.1% (62.5%-64.3%) of questions, respectively. GPT-4o significantly outperformed the other models (P = .007 vs GPT-4 Turbo, P < .001 vs GPT-4 and Gemini). On text-only questions, GPT-4o outperformed GPT-4, Gemini, and GPT-4 Turbo (P < .001, P < .001, and P = .001, respectively), while Gemini performed worse than the other models on image-based questions (P < .001 for all comparisons).
Conclusion: GPT-4o demonstrated superior performance among the four LLMs, achieving scores likely within or just outside the range required to pass a test akin to the CBNC examination. Although improvements in medical image interpretation are needed, GPT-4o shows potential to support physicians in answering text-based clinical questions.
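The Methods name McNemar's test as the comparison of correct response proportions between models. As an illustration of how such a paired comparison works, the following is a minimal sketch of the exact (binomial) form of the test. It rests on several assumptions: the abstract does not say whether the exact or the chi-square variant was used, nor how the 30 repeated runs were reduced to one paired outcome per question; the function name `mcnemar_exact` and the toy data are hypothetical and are not the study's data.

```python
from math import comb


def mcnemar_exact(model_a_correct, model_b_correct):
    """Two-sided exact McNemar test for paired correct/incorrect outcomes.

    Each argument is a sequence of booleans, one entry per question, in the
    same question order. Only discordant pairs (one model right, the other
    wrong) carry information about a difference between the models.
    Returns (b, c, p_value), where b counts questions only model A answered
    correctly and c counts questions only model B answered correctly.
    """
    if len(model_a_correct) != len(model_b_correct):
        raise ValueError("both models must be scored on the same questions")

    pairs = list(zip(model_a_correct, model_b_correct))
    b = sum(1 for a, m in pairs if a and not m)  # A right, B wrong
    c = sum(1 for a, m in pairs if m and not a)  # B right, A wrong
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs: no evidence of a difference

    # Under the null hypothesis, b follows Binomial(n, 0.5).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy data (hypothetical): 12 questions scored for two models.
toy_a = [True] * 8 + [True] * 2 + [False] * 2
toy_b = [False] * 8 + [True] * 2 + [True] * 2
b, c, p_value = mcnemar_exact(toy_a, toy_b)
print(f"discordant pairs: b={b}, c={c}; exact two-sided P = {p_value:.4f}")
```

Because the test conditions on discordant pairs, questions that both models answer correctly (or both incorrectly) do not affect the P value, which suits a design where every model sees the same question set.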
Lower extremity peripheral artery disease (PAD) is characterized by impaired blood flow due to arterial stenosis, frequently with coexisting microvascular disease, and is associated with high rates of morbidity and mortality. Current diagnostic modalities focus on the detection of obstructive atherosclerotic disease and have limited accuracy in early diagnosis, risk stratification, preprocedural assessment, and evaluation of therapy. Early diagnosis and assessment of both large vessels and the microcirculation may improve risk stratification and guide therapeutic interventions. Single-photon emission computed tomography and positron emission tomography imaging have been shown to accurately detect changes in perfusion in preclinical models and clinical disease, and have the potential to overcome limitations of existing diagnostic modalities while offering novel information about perfusion, metabolic, and molecular processes. This review provides a comprehensive reassessment of radiotracer-based imaging of PAD in preclinical and clinical studies, emphasizing the challenges that arise from the complex physiology of the peripheral vasculature. We also highlight the latest advancements, including emerging applications of artificial intelligence and big data analysis, as well as clinically relevant areas where the field could advance in the next decade.