To evaluate the capabilities of Chat Generative Pre-Trained Transformer (ChatGPT), a large language model (LLM), for diagnosing glaucoma using the Ocular Hypertension Treatment Study (OHTS) dataset, and to compare the diagnostic capability of ChatGPT 3.5 and ChatGPT 4.0.
Prospective data collection study.
A total of 3170 eyes of 1585 subjects from the OHTS were included in this study.
We selected demographic, clinical, ocular, visual field, optic nerve head photo, and history of disease parameters for each participant and developed case reports by converting the tabular data into textual format, using information from both eyes of every subject. We then developed a procedure that used the application programming interface of ChatGPT, an LLM-based chatbot, to automatically input prompts into a chat box. We queried 2 different generations of ChatGPT (versions 3.5 and 4.0) regarding the underlying diagnosis of each subject and evaluated the output responses using several objective metrics.
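The tabular-to-text conversion described above can be sketched as follows. This is only an illustration: the field names (`age`, `sex`, `iop_mmhg`, `vf_md_db`, `vcdr`), the sentence templates, and the closing question are hypothetical, since the actual OHTS variables and prompt wording are not given here. The call to the chatbot itself is deliberately left out.

```python
from typing import Mapping

# Hypothetical field names and wording; the real OHTS variables and the
# prompt text used in the study are not specified in the abstract.
def eye_summary(label: str, eye: Mapping[str, float]) -> str:
    """Turn one eye's tabular values into a single sentence."""
    return (
        f"{label} eye: intraocular pressure {eye['iop_mmhg']} mmHg, "
        f"visual field mean deviation {eye['vf_md_db']} dB, "
        f"vertical cup-to-disc ratio {eye['vcdr']}."
    )

def build_case_report(subject: Mapping) -> str:
    """Combine demographics and both eyes into one textual case report."""
    lines = [
        f"Patient is a {subject['age']}-year-old {subject['sex']}.",
        eye_summary("Right", subject["right_eye"]),
        eye_summary("Left", subject["left_eye"]),
        "Question: Based on this information, does the patient have "
        "glaucoma? Answer yes or no.",
    ]
    return "\n".join(lines)

# Made-up example record, for illustration only.
example_subject = {
    "age": 62,
    "sex": "female",
    "right_eye": {"iop_mmhg": 26.0, "vf_md_db": -1.2, "vcdr": 0.4},
    "left_eye": {"iop_mmhg": 25.0, "vf_md_db": -0.8, "vcdr": 0.5},
}
report = build_case_report(example_subject)
```

Putting both eyes in a single report mirrors the per-subject querying described in the text; the resulting string would then be sent as the prompt for each model version.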
Area under the receiver operating characteristic curve (AUC), accuracy, specificity, sensitivity, and F1 score.
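As a reminder of how these metrics relate to one another, the sketch below computes them from the four cells of a binary confusion matrix. The counts are made up. The single-point AUC helper assumes the model returns only a yes/no label with no confidence score, in which case the trapezoidal ROC area reduces to the mean of sensitivity and specificity; whether the study computed AUC this way is not stated.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics from confusion-matrix counts.

    Assumes every denominator is nonzero (at least one positive case,
    one negative case, and one positive prediction).
    """
    sensitivity = tp / (tp + fn)           # true-positive rate (recall)
    specificity = tn / (tn + fp)           # true-negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)             # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }

def single_point_auc(sensitivity: float, specificity: float) -> float:
    """AUC of an ROC curve with one operating point (hard labels only)."""
    return (sensitivity + specificity) / 2

# Illustrative counts, not data from the study.
metrics = binary_metrics(tp=8, fp=2, tn=18, fn=2)
auc = single_point_auc(metrics["sensitivity"], metrics["specificity"])
```

Because F1 depends on precision, it is sensitive to the ratio of positive to negative cases in a way that sensitivity and specificity are not.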
Chat Generative Pre-Trained Transformer 3.5 achieved AUC of 0.74, accuracy of 66%, specificity of 64%, sensitivity of 85%, and F1 score of 0.72. Chat Generative Pre-Trained Transformer 4.0 obtained AUC of 0.76, accuracy of 87%, specificity of 90%, sensitivity of 61%, and F1 score of 0.92.
The accuracy of ChatGPT 4.0 in diagnosing glaucoma based on input data from OHTS was promising. The overall accuracy of ChatGPT 4.0 was higher than that of ChatGPT 3.5. However, ChatGPT 3.5 was found to be more sensitive than ChatGPT 4.0. In its current forms, ChatGPT may serve as a useful tool in exploring disease status of ocular hypertensive eyes when specific data are available for analysis. In the future, leveraging LLMs with multimodal capabilities, allowing for integration of imaging and diagnostic testing as part of the analyses, could further enhance diagnostic capabilities and accuracy.
Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
The neighborhood and built environment social determinant of health domain has several social risk factors (SRFs) that are modifiable through policy efforts. We investigated the impact of neighborhood-level SRFs on presenting glaucoma severity at a tertiary eye care center.
A cross-sectional study using University of Michigan electronic health record (EHR) data from August 2012 to May 2022.
Patients with a diagnosis of any open-angle glaucoma with ≥1 eye care visit at the University of Michigan Kellogg Eye Center and ≥1 reliable visual field (VF).
Participants who met inclusion criteria were identified by International Classification of Diseases ninth and tenth revision codes (365.x/H40.x). Data extracted from the EHR included patient demographics, address, presenting mean deviation (MD), and VF reliability. Addresses were mapped to SRF measures at the census tract, block group, and county levels. Multilevel linear regression models were used to estimate the fixed effect of each SRF on MD, adjusting for patient-level demographic factors and including a random effect for neighborhood. Interactions of each SRF measure with patient-level race and Medicaid status were tested for an additive effect on MD.
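One way to write a model of this kind is as a random-intercept linear model. The notation below is an assumption made for illustration; the exact specification, covariate set, and neighborhood unit used in the study are not given here. Patient $i$ lives in neighborhood $j$, $\mathrm{SRF}_j$ is the neighborhood-level measure, $\mathbf{x}_{ij}$ collects the patient-level demographic covariates, and $u_j$ is the neighborhood random effect.

```latex
\begin{aligned}
\mathrm{MD}_{ij} &= \beta_0 + \beta_1\,\mathrm{SRF}_{j}
  + \boldsymbol{\gamma}^{\top}\mathbf{x}_{ij} + u_j + \varepsilon_{ij},\\
u_j &\sim \mathcal{N}(0,\tau^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0,\sigma^2)
\end{aligned}
```

Under this sketch, $\beta_1$ is the fixed effect of the SRF on MD. An interaction test would add a term $\beta_2\,(\mathrm{SRF}_j \times z_{ij})$, where $z_{ij}$ is the patient's race or Medicaid indicator (already included in $\mathbf{x}_{ij}$), and examine whether $\beta_2$ differs from zero.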
The main outcome measure was the effect of SRF on presenting MD.
In total, 4428 patients were included in the analysis. On average, patients were 70.3 years old (standard deviation = 11.9); 52.6% self-identified as female, 75.8% self-identified as White race, and 8.9% had Medicaid. The median presenting MD was −4.94 decibels (dB) (interquartile range = −11.45 to −2.07 dB). Neighborhood differences accounted for 4.4% of the variability in presenting MD. Neighborhood-level measures, including worse area deprivation (estimate, β = −0.31 per 1-unit increase; P < 0.001), increased segregation (β = −0.92 per 0.1-unit increase in Theil’s H index; P < 0.001), and increased neighborhood Medicaid (β = −0.68; P < 0.001), were associated with worse presenting MD. Significant interaction effects with race and Medicaid status were found for several neighborhood-level SRF measures.
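A share of variability attributed to neighborhood is commonly computed in a random-intercept model as the intraclass correlation: the between-neighborhood variance divided by the total variance. The sketch below assumes that this is the quantity behind the reported percentage, which is not confirmed here. The variance components in the example are made up and are not the study's estimates.

```python
def neighborhood_variance_share(between_var: float, within_var: float) -> float:
    """Intraclass correlation: fraction of total variance between groups.

    between_var: variance of the neighborhood random intercepts
    within_var:  residual (patient-level) variance
    """
    total = between_var + within_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_var / total

# Illustrative variance components only (in dB squared).
share = neighborhood_variance_share(between_var=1.0, within_var=9.0)
```

A small share does not rule out meaningful fixed effects: the regression coefficients describe how MD shifts with each neighborhood measure, whereas the share describes how much of the overall spread lies between neighborhoods.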
Although patients’ neighborhood SRF measures accounted for a minority of the variability in presenting MD, most neighborhood-level SRFs are modifiable and were associated with clinically meaningful differences in presenting MD. Policies that aim to reduce neighborhood inequities by addressing allocation of resources could have lasting impacts on vision outcomes.
Spectral-domain OCT angiography (SD-OCTA) scans were tested in an algorithm developed for use with swept-source OCT angiography (SS-OCTA) scans to determine if SD-OCTA scans yielded similar results for the detection and measurement of persistent choroidal hypertransmission defects (hyperTDs).
Retrospective study.
Forty pairs of scans from 32 patients with late-stage nonexudative age-related macular degeneration (AMD).
Patients underwent both SD-OCTA and SS-OCTA imaging at the same visit using the 6 × 6 mm OCTA scan patterns. Using a semiautomatic algorithm that helped with outlining the hyperTDs, 2 graders independently validated persistent hyperTDs, which are defined as having a greatest linear dimension ≥250 μm on the en face images generated using a slab extending from 64 to 400 μm beneath Bruch’s membrane. The number of lesions and square root (sqrt) total area of the hyperTDs were obtained from the algorithm using each imaging method.
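The two summary quantities can be illustrated with a short sketch. It assumes each outlined lesion has already been reduced to a greatest linear dimension and an en face area (the `Lesion` record is hypothetical). It also assumes "sqrt total area" means the square root of the summed area of persistent lesions, not a sum of per-lesion square roots. The segmentation algorithm itself is not reproduced.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple

MIN_GLD_UM = 250.0  # persistence threshold stated in the text

@dataclass
class Lesion:
    gld_um: float    # greatest linear dimension, micrometers
    area_mm2: float  # en face area, square millimeters

def summarize_persistent(lesions: Iterable[Lesion]) -> Tuple[int, float]:
    """Return (number of persistent hyperTDs, sqrt of their total area in mm)."""
    persistent = [les for les in lesions if les.gld_um >= MIN_GLD_UM]
    total_area_mm2 = sum(les.area_mm2 for les in persistent)
    return len(persistent), math.sqrt(total_area_mm2)

# Made-up lesions for illustration; the 200 um lesion is excluded.
count, sqrt_area_mm = summarize_persistent(
    [Lesion(300.0, 0.5), Lesion(200.0, 0.1), Lesion(250.0, 0.5)]
)
```

Taking the square root converts an area in mm² into a length in mm, which is why the area results are reported in millimeters.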
The mean sqrt area measurements and the number of hyperTDs were compared.
The number of lesions and sqrt total area of the hyperTDs were highly concordant between the 2 instruments (rc = 0.969 and rc = 0.999, respectively). The mean number of hyperTDs was 4.3 ± 3.1 for SD-OCTA scans and 4.5 ± 3.3 for SS-OCTA scans (P = 0.06). The mean sqrt total area measurements were 1.16 ± 0.64 mm for the SD-OCTA scans and 1.17 ± 0.65 mm for the SS-OCTA scans (P < 0.001). Because of the small standard error of the differences, the mean difference between the scans was statistically significant but not clinically significant.
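Assuming rc denotes Lin's concordance correlation coefficient (the usual meaning of that symbol, though it is not defined here), agreement between paired measurements from two instruments can be computed as sketched below. The input values are illustrative only.

```python
from typing import Sequence

def concordance_ccc(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient for paired measurements.

    Uses population (divide-by-n) moments:
        ccc = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)

# Perfectly correlated but shifted by a constant offset: Pearson r would be 1,
# while the concordance coefficient is penalized for the systematic bias.
shifted = concordance_ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
identical = concordance_ccc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

Unlike a plain correlation, this coefficient drops when one instrument reads systematically higher than the other, so a value near 1 indicates that a bias such as the small mean difference in area is negligible relative to the spread of the measurements.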
Spectral-domain OCTA scans provide similar results to SS-OCTA scans when used to obtain the number and area measurements of persistent hyperTDs through a semiautomated algorithm previously developed for SS-OCTA. This facilitates the detection of atrophy with a more widely available scan pattern and the longitudinal study of early to late-stage AMD.