Accuracy of a New Foundation Model in Glaucoma Detection using Ocular Coherence Tomography Images
Benton Chuter, Justin Huynh, Evan Walker, Shahin Hallaj, Jalil Jalili, Jeffrey Liebmann, Massimo A Fazio, Christopher A Girkin, Robert N Weinreb, Mark Christopher, Linda M Zangwill
medRxiv - Ophthalmology, published 2024-08-05. DOI: 10.1101/2024.08.04.24311475 (https://doi.org/10.1101/2024.08.04.24311475)
Abstract
Purpose: To fine-tune the retinal foundation model (RETFound) and evaluate its performance in detecting glaucoma from optical coherence tomography (OCT) retinal nerve fiber layer (RNFL) scans in a diverse longitudinal clinical research dataset. Model performance was also evaluated across subgroups, dataset sample sizes, and numbers of training cycles (epochs).
Design: Evaluation of a diagnostic technology.
Subjects, Participants, and Controls: 15,216 Spectralis OCT RNFL circle scans of 747 individuals of diverse race (56.9% White, 37.8% Black / African American, and 5.3% Other / Not reported), glaucoma severity (30.8% mild, 18.4% moderate-to-severe, and 50.9% no glaucoma), and age (44.8% <60 years, 55.2% >60 years) from the Diagnostic Innovations in Glaucoma Study (DIGS) and the African Descent and Glaucoma Evaluation Study (ADAGES). All OCT B-scans were labeled as "Non-glaucomatous" or "Glaucomatous."
Methods: RETFound was employed to perform binary glaucoma classification. Its diagnostic accuracy was tested iteratively across combinations of dataset sample size (50 to 2,000 OCT RNFL circle scans) and number of epochs (5 to 50), and in study subpopulations stratified by glaucoma severity, age, and race.
Main Outcome Measures: Area under the receiver operating characteristic curve (AUC) for classifying RNFL scans as "Non-glaucomatous" or "Glaucomatous."
Results: Performance improved with larger training datasets and more training cycles, rising from an AUC of 0.61 (50 training images and 5 epochs) to an AUC of 0.91 (2,000 training images and 50 epochs). Gains in performance were marginal as training size increased beyond 500 scans. Performance was similar across race for all combinations of training size and number of epochs: African American (AUC=0.90) vs other (AUC=0.93). RNFL scans from older patients (>60 years) led to worse performance (AUC=0.85) than scans from younger patients (<60 years, AUC=0.95). Performance was significantly higher for RNFL scans from patients with moderate-to-severe glaucoma than for scans from patients with mild glaucoma (AUC=0.99 vs 0.88, respectively).
Conclusions: Good RETFound performance was observed with a relatively small number of images used for fine-tuning and across differences in race and age. The ability of RETFound to adapt across a range of OCT training conditions and populations suggests it is a promising tool for automating glaucoma detection in a variety of use cases.
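The main outcome measure, AUC stratified by subgroup, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The function names (`auc`, `auc_by_subgroup`), the record fields (`label`, `score`, `age_group`), and the eight toy records are all hypothetical, chosen only to show the computation. The AUC is computed as the fraction of (glaucomatous, non-glaucomatous) scan pairs in which the glaucomatous scan receives the higher model score, with ties counted as one half.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels: 1 = "Glaucomatous", 0 = "Non-glaucomatous"; scores: model outputs.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative scan")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_by_subgroup(records, key):
    """Group records by records[i][key] and compute one AUC per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    return {
        name: auc([r["label"] for r in rows], [r["score"] for r in rows])
        for name, rows in groups.items()
    }


# Hypothetical toy data, NOT taken from the study.
records = [
    {"label": 1, "score": 0.9, "age_group": "<60"},
    {"label": 0, "score": 0.2, "age_group": "<60"},
    {"label": 1, "score": 0.7, "age_group": "<60"},
    {"label": 0, "score": 0.4, "age_group": "<60"},
    {"label": 1, "score": 0.6, "age_group": ">60"},
    {"label": 0, "score": 0.7, "age_group": ">60"},
    {"label": 1, "score": 0.8, "age_group": ">60"},
    {"label": 0, "score": 0.3, "age_group": ">60"},
]

overall_auc = auc([r["label"] for r in records], [r["score"] for r in records])
subgroup_auc = auc_by_subgroup(records, "age_group")
print(overall_auc, subgroup_auc)
```

The pairwise form is quadratic in the number of scans, which is acceptable for a sketch. For a dataset of the size described here, a rank-based implementation from an established statistics library would be the more practical choice.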