To explore the integration of generative AI, specifically large language models (LLMs), in ophthalmology education and practice, addressing their applications, benefits, challenges, and future directions.
A literature review and analysis of current AI applications and educational programs in ophthalmology.
Analysis of published studies, reviews, articles, websites, and institutional reports on AI use in ophthalmology. Examination of educational programs incorporating AI, including curriculum frameworks, training methodologies, and evaluations of AI performance on medical examinations and clinical case studies.
Generative AI, particularly LLMs, shows potential to improve diagnostic accuracy and patient care in ophthalmology. Applications include supporting the education of patients, physicians, and medical students. However, challenges such as AI hallucinations, biases, lack of interpretability, and outdated training data limit clinical deployment. Studies reported varying LLM accuracy on ophthalmology board exam questions, underscoring the need for more reliable AI integration. Several educational programs nationwide provide AI and data science training relevant to clinical medicine and ophthalmology.
Generative AI and LLMs offer promising advancements in ophthalmology education and practice. Addressing challenges through comprehensive curricula that include fundamental AI principles, ethical guidelines, and updated, unbiased training data is crucial. Future directions include developing clinically relevant evaluation metrics, implementing hybrid models with human oversight, leveraging image-rich data, and benchmarking AI performance against ophthalmologists. Robust policies on data privacy, security, and transparency are essential for fostering a safe and ethical environment for AI applications in ophthalmology.
Saliency maps (SMs) allow clinicians to better understand the opaque decision-making process in artificial intelligence (AI) models by visualising the important features responsible for predictions. This ultimately improves interpretability and confidence. In this work, we review the use case for SMs, exploring their impact on clinicians’ understanding of and trust in AI models. We use the following ophthalmic conditions as examples: (1) glaucoma, (2) myopia, (3) age-related macular degeneration (AMD), and (4) diabetic retinopathy (DR).
A multi-field search on MEDLINE, Embase, and Web of Science was conducted using specific keywords. Only studies on the use of SMs in glaucoma, myopia, AMD, or DR were considered for inclusion.
Findings reveal that SMs are often used to validate AI models and advocate for their adoption, potentially leading to biased claims. Studies frequently overlooked the technical limitations of SMs and assessed their quality and relevance only superficially. Uncertainties persist regarding the role of SMs in building trust in AI. It is crucial to enhance understanding of SMs' technical constraints and improve evaluation of their quality, impact, and suitability for specific tasks. Establishing a standardised framework for selecting and assessing SMs, as well as exploring their relationship with other reliability sources (e.g. safety and generalisability), is essential for enhancing clinicians' trust in AI.
We conclude that SMs are not beneficial for interpretability and trust-building purposes in their current form. Instead, SMs may benefit model debugging, model performance enhancement, and hypothesis testing (e.g. novel biomarkers).
Natural Language Processing (NLP) is a subfield of artificial intelligence that focuses on the interaction between computers and human language, enabling computers to understand, generate, and derive meaning from human language. NLP's potential applications in the medical field are extensive, ranging from extracting data from electronic health records (one of its best-known and most frequently exploited uses) to investigating relationships among genetics, biomarkers, drugs, and diseases in order to propose new medications. NLP can also be useful for clinical decision support, patient monitoring, and medical image analysis. Despite this potential, real-world application of NLP remains limited by various challenges and constraints, so its development continues predominantly within the research domain. However, with the increasingly widespread use of NLP, particularly through the availability of large language models such as ChatGPT, it is crucial for medical professionals to be aware of the status, uses, and limitations of these technologies.