Pub Date: 2026-02-01; Epub Date: 2025-12-06; DOI: 10.1016/j.xops.2025.101034
Fares Antaki MDCM; David Mikhail MSc; Daniel Milad MD; Danny A. Mammo MD; Sumit Sharma MD; Sunil K. Srivastava MD; Bing Yu Chen MDCM; Samir Touma MDCM; Mertcan Sevgi MD; Jonathan El-Khoury MD; Pearse A. Keane MD; Qingyu Chen PhD; Yih Chung Tham PhD; Renaud Duval MD
Purpose
Novel large language models (LLMs) such as Generative Pretrained Transformer-5 (GPT-5) integrate advanced reasoning capabilities that may enhance performance on complex medical question-answering tasks. For this latest generation of reasoning models, the configurations that maximize both accuracy and cost-efficiency have yet to be established. Our objective was to evaluate the performance and cost-accuracy trade-offs of OpenAI’s GPT-5 compared with previous generation LLMs on ophthalmic question answering.
Design
Evaluation of diagnostic test or technology.
Participants
Generative Pretrained Transformer-5 is a publicly available LLM.
Methods
In August 2025, 12 configurations of OpenAI’s GPT-5 series (3 model tiers across 4 reasoning effort settings) were evaluated alongside o1-high, o3-high, and GPT-4o, using 260 closed-access multiple-choice questions from the American Academy of Ophthalmology Basic Clinical Science Course data set. The study did not include human participants.
Main Outcome Measures
The primary outcome was accuracy on the 260-item ophthalmology multiple-choice question set for each model configuration. The secondary outcomes included head-to-head ranking of configurations using a Bradley–Terry model applied to paired win/loss comparisons of answer accuracy, and evaluation of generated natural language rationales using a reference-anchored, pairwise LLM-as-a-judge framework. Additional analyses assessed the accuracy-cost trade-off by calculating mean per-question cost from token usage and identifying Pareto-efficient configurations.
Results
The configuration GPT-5-high achieved the highest accuracy (0.965; 95% confidence interval [CI], 0.942–0.985), significantly outperforming all GPT-5-nano variants (P < 0.001), o1-high (P = 0.04), and GPT-4o (P < 0.001), but not o3-high (0.958; 95% CI, 0.931–0.981). The configuration GPT-5-high ranked first in accuracy (1.66x stronger than o3-high) and rationale quality (1.11x stronger than o3-high), as judged by a reference-anchored LLM-as-a-judge autograder. Cost-accuracy analysis identified multiple GPT-5 configurations on the Pareto frontier, with GPT-5-mini-low providing the optimal low-cost, high-performance configuration.
Conclusions
This study benchmarks the GPT-5 series on a high-quality ophthalmology question-answering data set, demonstrating that GPT-5 with high reasoning effort achieved near-perfect accuracy and outperformed prior reasoning LLMs. This study also introduces an autograder framework for scalable, automated evaluation of LLM-generated answers against reference standards in ophthalmology.
Financial Disclosure(s)
Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of t
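The two ranking ideas named in this abstract, Pareto efficiency and Bradley–Terry strength, can be illustrated with a minimal Python sketch. It is not the authors' code. The configuration names and the (cost, accuracy) points are hypothetical placeholders, and reading "1.66x stronger" as a ratio of Bradley–Terry strength parameters is an assumption; only the value 1.66 comes from the abstract.

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str
    cost: float      # mean cost per question (hypothetical units)
    accuracy: float  # fraction of questions answered correctly


def pareto_frontier(configs: list[Config]) -> list[Config]:
    """Return the configurations that no other configuration dominates.

    A configuration is dominated when another one is at least as cheap and
    at least as accurate, and strictly better on at least one of the two.
    """
    frontier = []
    for c in configs:
        dominated = any(
            o.cost <= c.cost
            and o.accuracy >= c.accuracy
            and (o.cost < c.cost or o.accuracy > c.accuracy)
            for o in configs
        )
        if not dominated:
            frontier.append(c)
    return sorted(frontier, key=lambda c: c.cost)


def bt_win_probability(strength_ratio: float) -> float:
    """P(A beats B) under Bradley-Terry, given strength(A) / strength(B)."""
    return strength_ratio / (1.0 + strength_ratio)


# Hypothetical (cost, accuracy) points, NOT the study's data.
demo = [
    Config("A", cost=0.001, accuracy=0.80),
    Config("B", cost=0.004, accuracy=0.92),
    Config("C", cost=0.006, accuracy=0.90),  # dominated by B
    Config("D", cost=0.030, accuracy=0.96),
]
frontier_names = [c.name for c in pareto_frontier(demo)]

# Strength ratio of GPT-5-high vs. o3-high as reported in the abstract.
p_high_vs_o3 = bt_win_probability(1.66)
```

Under that reading, a strength ratio of 1.66 corresponds to GPT-5-high winning roughly 62% of the decisive head-to-head comparisons against o3-high, which fits the abstract's finding that the accuracy difference between the two was not statistically significant.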
Title: Performance of GPT-5 Frontier Models in Ophthalmology Question Answering
Journal: Ophthalmology Science, volume 6, issue 2, Article 101034
Pub Date: 2026-02-01; Epub Date: 2025-11-26; DOI: 10.1016/j.xops.2025.101021
Jeroen A.A.H. Pas MD; Patty P.A. Dhooge MD, PhD; Catherina H.Z. Li MD; Rob W.J. Collin PhD; Carel B. Hoyng MD, PhD; Joanna IntHout PhD
Objective
Designing a clinical trial for rare diseases such as Stargardt disease type 1 is challenging due to the limited patient population. In traditional clinical trial designs for inherited retinal diseases, often only 1 eye of each patient is used as the treated eye or the sham, disregarding half of the available eyes.
This study explores a trial design in which both eyes are included, with the fellow eye serving as the control, maximizing the use of available data and enhancing statistical power.
Design
Retrospective analysis of natural history data to conduct sample size calculations.
Participants
Patients with genetically solved Stargardt disease type 1 who had at least 2 fundus autofluorescence measurements obtained within 5 years of each other. Retrospective data of 164 patients were included for analysis.
Methods
The required sample sizes for 1-eye and paired-eye study designs were calculated using retrospective natural history data on the progression of definitely decreased autofluorescence quantified from fundus autofluorescence imaging.
Main Outcome Measures
Required sample size for a clinical trial.
Results
Sample size calculations showed that 170 patients are needed for a 2-year clinical trial with a 1-eye design, decreasing to 99 patients for a 5-year trial. When using a paired-eye design, 64 patients are needed in a 2-year trial, decreasing to 28 patients in a 5-year trial. When using a paired-eye design and requiring definitely decreased autofluorescence atrophy in both eyes at inclusion, 37 patients were needed in a 2-year trial, decreasing to 16 patients in a 5-year trial.
Conclusions
Using a paired-eye design for a clinical trial in Stargardt disease type 1, with definitely decreased autofluorescence atrophy growth rate as the primary end point, is more efficient than a 1-eye design. Implementing additional inclusion criteria, such as requiring definitely decreased autofluorescence atrophy in both eyes at baseline, further reduces the number of patients needed to achieve sufficient statistical power. This approach enhances the feasibility for trials in Stargardt disease type 1 where patient availability is limited.
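The efficiency gain of a paired-eye design can be illustrated with the textbook normal-approximation sample size formulas. The sketch below is not the authors' calculation. The standard deviation `sd`, the detectable difference `delta`, and the between-eye correlation `rho` are hypothetical inputs chosen only to show the mechanism.

```python
import math
from statistics import NormalDist


def _z_sum_squared(alpha: float, power: float) -> float:
    """(z_{1-alpha/2} + z_{power})^2 for a two-sided test."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2


def n_one_eye_per_arm(sd: float, delta: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm when each patient contributes one eye (two-sample)."""
    return math.ceil(2 * _z_sum_squared(alpha, power) * sd**2 / delta**2)


def n_paired_eye(sd: float, delta: float, rho: float,
                 alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients needed when one eye is treated and the fellow eye is control.

    The variance of the within-patient difference is 2 * sd^2 * (1 - rho),
    so a higher correlation between eyes means fewer patients.
    """
    var_diff = 2 * sd**2 * (1 - rho)
    return math.ceil(_z_sum_squared(alpha, power) * var_diff / delta**2)


# Hypothetical inputs, NOT estimates from the Stargardt natural history data.
per_arm = n_one_eye_per_arm(sd=1.0, delta=0.5)
total_one_eye = 2 * per_arm
paired = n_paired_eye(sd=1.0, delta=0.5, rho=0.7)
```

Under these assumptions, and before rounding, the paired design needs a fraction (1 − rho)/2 of the total one-eye sample, which is why strongly correlated progression in the two eyes of a patient translates into a much smaller trial.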
Financial Disclosure(s)
Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
Title: A Comparison of Randomizing Either One Eye or Both Eyes in Clinical Trials for Stargardt Disease Type 1
Journal: Ophthalmology Science, volume 6, issue 2, Article 101021
Pub Date: 2026-02-01; Epub Date: 2025-12-08; DOI: 10.1016/j.xops.2025.101031
Chang Liu MD; Hong Chang Tan MBBS, PhD; Mingyi Yu MD; Isabelle Xin Yu Lee MSc; Ching-Yu Cheng MD, PhD; Yu-Chi Liu MD, PhD
Purpose
To investigate the association among corneal nerves, ocular surface, and renal function in diabetes, and to compare these variables in patients with and without chronic diabetic kidney disease (DKD).
Design
Cross-sectional study.
Participants
This study included 538 patients with type 2 diabetes.
Methods
All subjects received renal function tests, in vivo confocal microscopy examinations for corneal nerves, epithelial and immune cells, as well as ocular surface subjective and objective assessments. Univariable and multivariable regression analyses were used to determine the relationship between corneal nerve variables and renal function parameters. Multivariable logistic regression was performed to examine factors that were associated with DKD.
Main Outcome Measures
The association between corneal nerve metrics and renal function parameters.
Results
After adjusting for potential confounders, lower corneal nerve fiber density (CNFD) was significantly associated with higher urine albumin (P = 0.019), and higher corneal nerve fiber width was significantly associated with higher urine albumin and higher urine albumin-creatinine ratio (P < 0.001 and P = 0.001, respectively). Corneal nerve fiber length and width were significantly associated with DKD (P = 0.028 and P = 0.025, respectively). Compared with the non-DKD group, patients with DKD had significantly lower CNFD, length, area, and fractal dimension, as well as increased width, decreased epithelial cell density and count, and larger epithelial cell size (all P < 0.05). Patients with DKD presented with significantly lower Schirmer value and tear break-up time, and increased corneal staining and Ocular Surface Disease Index score than non-DKD patients (all P < 0.05).
Conclusions
In diabetes, the impairment of corneal nerves is associated with the deterioration of renal function. Patients who have poor corneal nerve status are at risk of DKD, and patients who have DKD should be examined for corneal neuropathy.
Financial Disclosure(s)
The authors have no proprietary or commercial interest in any materials discussed in this article.
Title: The Association among Corneal Nerve Metrics, Ocular Surface Integrity, and Renal Function in Type 2 Diabetes
Journal: Ophthalmology Science, volume 6, issue 2, Article 101031
Pub Date: 2026-02-01; Epub Date: 2025-10-27; DOI: 10.1016/j.xops.2025.100985
Albert K. Dadzie OD; Sabrina P. Iddir MD; Mansour Abtahi PhD; Behrouz Ebrahimi MSc; Mojtaba Rahimi MSc; Sanjay Ganesh BS; Taeyoon Son PhD; Michael J. Heiferman MD; Xincheng Yao PhD
Purpose
To develop and evaluate a deep learning model that integrates ultra-widefield fundus photography and B-scan ultrasonography for automated classification of uveal melanoma (UM) and choroidal nevi.
Design
A retrospective cross-sectional study.
Subjects
This study included 174 patients (93 with UM and 81 with choroidal nevi) diagnosed at a tertiary eye center. For each patient, ultra-widefield fundus photographs and B-scan ultrasound images in both transverse and longitudinal orientations were acquired.
Methods
A deep learning model was trained using ultra-widefield fundus photography, ultrasound images, and combinations of both. Fivefold cross-validation was used to evaluate model performance.
Main Outcome Measures
The deep learning models were evaluated using accuracy, F1 score, and area under the receiver operating characteristic curve (AUC).
Results
Uveal melanomas had a mean thickness of 6.0 mm and a basal diameter of 12.6 mm, whereas nevi measured 1.8 mm and 6.5 mm, respectively. Among single-modality models, the model trained on transverse ultrasound images achieved the highest performance (accuracy: 92%; F1 score: 0.9227; AUC: 0.9538). Averaging predictions from the single-modality models provided only modest gains because their outputs sometimes conflicted. In contrast, the model that combined fundus photographs and ultrasound images using an attention mechanism achieved the highest overall performance (accuracy: 94%; F1 score: 0.9445; AUC: 0.9606), outperforming all other configurations by effectively integrating complementary information from both modalities.
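The contrast between averaging modality outputs and attention-based fusion can be sketched generically. The abstract does not describe the model's architecture, so the code below is a minimal illustration of softmax attention over two modality embeddings, not the authors' network. The embeddings and the `query` vector (standing in for learned parameters) are hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def average_fuse(embeddings: list[list[float]]) -> list[float]:
    """Baseline: every modality gets the same weight."""
    n = len(embeddings)
    return [sum(e[i] for e in embeddings) / n for i in range(len(embeddings[0]))]


def attention_fuse(embeddings: list[list[float]], query: list[float]):
    """Weight each modality by softmax(query . embedding / sqrt(d))."""
    d = len(query)
    weights = softmax([dot(query, e) / math.sqrt(d) for e in embeddings])
    fused = [sum(w * e[i] for w, e in zip(weights, embeddings)) for i in range(d)]
    return fused, weights


# Hypothetical 4-dimensional embeddings for one patient, NOT study data.
fundus = [0.2, 0.1, 0.0, 0.4]
ultrasound = [0.9, 0.8, 0.1, 0.3]
query = [1.0, 1.0, 0.0, 0.0]  # stands in for a learned attention vector

fused, weights = attention_fuse([fundus, ultrasound], query)
baseline = average_fuse([fundus, ultrasound])
```

With averaging, a confident modality and an uncertain one contribute equally. With attention, the weights depend on the inputs, so the fused representation can lean on whichever modality is more informative for a given case.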
Conclusions
Multimodal deep learning that combines fundus photography and ultrasound imaging improves the classification of UM and choroidal nevi. This approach demonstrates feasibility for leveraging the strengths of each modality for automated classification of UM and choroidal nevi.
Financial Disclosures
Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
Title: Attention-Based Multimodal Deep Learning for Uveal Melanoma Classification Using Ultra-Widefield Fundus Images and Ocular Ultrasound
Journal: Ophthalmology Science, volume 6, issue 2, Article 100985
Pub Date: 2026-02-01; Epub Date: 2025-11-12; DOI: 10.1016/j.xops.2025.101008
Kyle Bolo MD; Tran Huy Nguyen MS; Sreenidhi Iyengar MS; Zhiwei Li MS; Van Nguyen MD; Brandon J. Wong MD; Jiun L. Do MD, PhD; Jose-Luis Ambite PhD; Carl Kesselman PhD; Lauren P. Daskivich MD; Benjamin Y. Xu MD, PhD
Purpose
To compare the performance of a vision transformer-based foundation model (RETFound) and a supervised convolutional neural network (VGG-19) for detecting referable glaucoma from fundus photographs.
Design
An evaluation of diagnostic technology.
Participants
Six thousand one hundred sixteen participants from the Los Angeles County Department of Health Services Teleretinal Screening Program.
Methods
Fundus photographs were labeled for referable glaucoma (cup-to-disc ratio ≥0.6) by certified optometrists. Four deep learning models were trained on cropped and uncropped images (training N = 8996; validation N = 3002) using 2 architectures: RETFound, a vision transformer with self-supervised pretraining on fundus photographs, and VGG-19. Models were evaluated on a held-out test set (N = 1000) labeled by glaucoma specialists and an external test set (N = 300) from University of Southern California clinics. Performance was assessed while varying training set size and stratifying by demographic factors. xRAI was used for saliency mapping.
Main Outcome Measures
Area under the receiver operating characteristic curve (AUC–ROC) and threshold-specific metrics.
Results
The cropped image VGG-19 model achieved the highest AUC–ROC (0.924 [0.907–0.940]), which was comparable (P = 0.07) to that of the cropped image RETFound model (0.911 [0.892–0.930]). The cropped image RETFound model achieved the highest Youden-optimal performance (sensitivity 82.6% and specificity 88.2%) and F1 score (0.801). Cropped image models outperformed their uncropped counterparts (RETFound 0.889 [0.868–0.909], VGG-19 0.898 [0.879–0.917]) within each architecture (P < 0.001 for AUC–ROC comparisons). The uncropped image RETFound model performed best on external data (0.886 [0.849–0.924] vs. the next-highest 0.797 [0.746–0.848], P < 0.001 for AUC–ROC comparisons). RETFound models had a performance advantage when trained on smaller datasets (N < 2000 images), and the cropped image RETFound model performed consistently across ethnic groups (P = 0.20), whereas the others did not (P < 0.04). Performance did not vary by age or gender. Saliency maps for both architectures consistently included the optic nerve.
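"Youden-optimal" refers to the operating threshold that maximizes Youden's J statistic, J = sensitivity + specificity − 1. The following Python sketch shows the standard definition, not the authors' pipeline. The scores and labels are hypothetical; the sensitivity and specificity passed to `youden_j` at the end are the values reported in the abstract.

```python
def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic."""
    return sensitivity + specificity - 1.0


def youden_optimal_threshold(scores: list[float], labels: list[int]):
    """Return (threshold, sensitivity, specificity) that maximizes J.

    A case is predicted positive when its score is >= threshold.
    Ties in J are resolved in favor of the lowest threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        sens, spec = tp / positives, tn / negatives
        j = youden_j(sens, spec)
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical model scores and reference labels, NOT study data.
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 0, 1, 1, 1]
threshold, sens, spec = youden_optimal_threshold(scores, labels)

# J at the operating point reported for the cropped image RETFound model.
j_reported = youden_j(0.826, 0.882)
```

For the reported operating point, J is approximately 0.708. Because J weights sensitivity and specificity equally, a screening program that penalizes missed cases more heavily than false referrals might prefer a different threshold.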
Conclusions
Although both RETFound and VGG-19 models performed well for classification of referable glaucoma, foundation models may be preferable when training data are limited and when domain shift is expected. Training models using images cropped to the region of the optic nerve improves performance regardless of architecture but may reduce model generalizability.
Financial Disclosure(s)
Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
Title: Comparison of RETFound and a Supervised Convolutional Neural Network for Detection of Referable Glaucoma from Fundus Photographs
Journal: Ophthalmology Science, volume 6, issue 2, Article 101008
Pub Date: 2026-02-01; Epub Date: 2025-11-04; DOI: 10.1016/j.xops.2025.100994
Samantha Rees MPH; Jing Nie PhD; Yihua Yue PhD, MPH; Jean Wactawski-Wende PhD; Sangita Patel MD, PhD; Chris A. Andrews PhD; Robert B. Wallace MD; Emily W. Gower PhD; Amy E. Millen PhD
Objective
We prospectively examined the association between tobacco exposure (personal/secondhand smoking [SHS]) and Fuchs endothelial corneal dystrophy (FECD) in older women, the group most impacted by FECD.
Design
We conducted a secondary data analysis utilizing the Women’s Health Initiative’s (WHI’s) Observational Study and Clinical Trials data.
Participants
Postmenopausal women aged >65 who participated in WHI, had available Medicare claims data, and did not have FECD within 1 year after WHI enrollment were included (N = 37 824).
Methods
Smoking status, pack-years, and average cigarettes per day were assessed at baseline (1993-1998). Secondhand smoking was assessed by location (childhood or adulthood at home and work). Participant characteristics were compared by personal smoking history and SHS status. Crude and adjusted Cox proportional hazards models were used to estimate hazard ratios (HRs) and 95% confidence intervals (CIs) for the risk of FECD by personal smoking exposure and by SHS status.
Main Outcome Measures
Incident FECD cases were identified objectively via Medicare claims data through 2019.
Results
Compared with never smokers, current smokers were more likely to be younger and to have lower body mass indices, and were less likely to be White, married, or users of hormone replacement therapy. Current smokers had a nonsignificantly increased risk of FECD compared with never smokers (HR = 1.12; 95% CI: 0.90–1.38), and former smokers had a slightly decreased risk (HR = 0.92; 95% CI: 0.84–1.01). Current smokers who smoked ≥15 cigarettes/day had a 26.0% greater risk of developing FECD than never smokers (HR = 1.26; 95% CI: 0.94–1.68). However, former smokers who smoked ≥15 cigarettes/day had a 14.0% reduced risk of developing FECD compared with never smokers (HR = 0.86; 95% CI: 0.76–0.97). Most women (93.6%) were exposed to SHS at some time in their lives. Never smokers exposed to SHS at home during childhood or adulthood had an approximately 22%–25% nonsignificantly increased risk of developing FECD compared with never smokers without SHS exposure.
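The percentages quoted alongside the hazard ratios follow from simple arithmetic, sketched below in Python. The function names are hypothetical and chosen for illustration; the HRs and interval bounds are the ones reported above. An interval that contains 1.0 corresponds to a result that is not statistically significant at the matching level.

```python
def percent_change_from_hr(hr: float) -> float:
    """Percent change in hazard relative to the reference group (HR = 1)."""
    return (hr - 1.0) * 100.0


def ci_excludes_null(lower: float, upper: float) -> bool:
    """True when the confidence interval does not contain HR = 1."""
    return lower > 1.0 or upper < 1.0


# (HR, lower CI bound, upper CI bound) as reported in the Results.
estimates = {
    "current smokers, >=15 cigarettes/day": (1.26, 0.94, 1.68),
    "former smokers, >=15 cigarettes/day": (0.86, 0.76, 0.97),
}

for label, (hr, lower, upper) in estimates.items():
    change = percent_change_from_hr(hr)
    verdict = "excludes 1" if ci_excludes_null(lower, upper) else "includes 1"
    print(f"{label}: {change:+.1f}% vs. never smokers (CI {verdict})")
```

Under this reading, only the former-smoker estimate has an interval that excludes 1.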
Conclusions
In this sample of postmenopausal women, personal smoking and SHS were not significantly associated with increased risk of FECD, but suggestions of an increased risk were observed in current smokers. Our findings may have been affected by a lack of variation in exposures and by survival and sick-quitter biases.
Financial Disclosure(s)
Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
{"title":"Tobacco Exposure and Risk of Developing Fuchs Endothelial Corneal Dystrophy in the Women’s Health Initiative Studies","authors":"Samantha Rees MPH , Jing Nie PhD , Yihua Yue PhD, MPH , Jean Wactawski-Wende PhD , Sangita Patel MD, PhD , Chris A. Andrews PhD , Robert B. Wallace MD , Emily W. Gower PhD , Amy E. Millen PhD","doi":"10.1016/j.xops.2025.100994","DOIUrl":"10.1016/j.xops.2025.100994","url":null,"abstract":"<div><h3>Objective</h3><div>We prospectively examined the association between tobacco exposure (personal/secondhand smoking [SHS]) and Fuchs endothelial corneal dystrophy (FECD) in older women, the group most impacted by FECD.</div></div><div><h3>Design</h3><div>We conducted a secondary data analysis utilizing the Women’s Health Initiative’s (WHI’s) Observational Study and Clinical Trials data.</div></div><div><h3>Participants</h3><div>Postmenopausal women aged >65 who participated in WHI, had available Medicare claims data, and did not have FECD within 1 year after WHI enrollment were included (N = 37 824).</div></div><div><h3>Methods</h3><div>Smoking status, pack-years, and average cigarettes per day were assessed at baseline (1993-1998). Secondhand smoking was assessed by location (childhood or adulthood at home and work). Participant characteristics were compared by personal smoking history and SHS status. Crude and adjusted Cox proportional hazards models were used to estimate hazard ratios (HRs) and 95% confidence intervals (CIs) for the risk of FECD by personal smoking exposure and by SHS status.</div></div><div><h3>Main Outcome Measures</h3><div>Incident FECD cases were identified objectively via Medicare claims data through 2019.</div></div><div><h3>Results</h3><div>Current smokers compared with never smokers were more likely to be younger, have lower body mass indices, and were less likely to be White, married, and users of hormone replacement therapy. 
Current smokers had an increased risk of FECD compared with never smokers (HR = 1.12, CIs: 0.90–1.38) and former smokers had a slight decreased risk of FECD compared with never smokers (HR = 0.92, CIs: 0.84–1.01). Current smokers who smoked ≥15 cigarettes/day had a 26.0% (HR = 1.26, CIs: 0.94–1.68) greater risk of developing FECD compared with never smokers. However, former smokers who smoked ≥15 cigarettes/day had a 14.0% (HR = 0.86, CIs: 0.76–0.97) reduced risk of developing FECD compared with never smokers. Most women (93.6%) were exposed to SHS sometime in their life. Never smokers exposed to SHS at home during childhood or adulthood had approximately a 22%-25% nonsignificant increased risk of developing FECD compared with never smokers without SHS exposure.</div></div><div><h3>Conclusions</h3><div>In this sample of postmenopausal women, personal smoking and SHS were not significantly associated with increased risk of FECD, but suggestions of an increased risk were observed in current smokers. Our findings may have been impacted by lack of variation in exposures, survival, and sick-quitter biases.</div></div><div><h3>Financial Disclosure(s)</h3><div>Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.</div></div>","PeriodicalId":74363,"journal":{"name":"Ophthalmology science","volume":"6 2","pages":"Article 100994"},"PeriodicalIF":4.6,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145749030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01; Epub Date: 2025-11-21; DOI: 10.1016/j.xops.2025.101019
{"title":"Corrigendum to “Randomized Study of Intravitreal Autologous CD34+ Stem Cells in Central Retinal Vein Occlusion (Treatment of Retinal vein occlusion Using STem cells [TRUST] Report 1): Safety and Feasibility. Ophthalmol Sci. 2026;6:100905”","authors":"","doi":"10.1016/j.xops.2025.101019","DOIUrl":"10.1016/j.xops.2025.101019","url":null,"abstract":"","PeriodicalId":74363,"journal":{"name":"Ophthalmology science","volume":"6 2","pages":"Article 101019"},"PeriodicalIF":4.6,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01; Epub Date: 2025-12-03; DOI: 10.1016/j.xops.2025.101027
Usha Chakravarthy MD, PhD , Lajos Csincsik , Kelvin Y.C. Teo MD, PhD , Marion R. Munk MD , Dilraj S. Grewal MD , Robyn H. Guymer MD , Glenn J. Jaffe MD , Tunde Peto MD , SriniVas R. Sadda MD , Giovanni Staurenghi MD , Chui M.G. Cheung MD
Purpose
To evaluate conventional imaging modalities for detecting fibrosis in neovascular age-related macular degeneration (nAMD) and to develop a standardized diagnostic workflow.
Design
Systematic discussion and grading exercise assessing multiple imaging modalities.
Participants
Retina specialists from the International Fibrosis Consensus workgroup and members of the International Retinal Imaging Society.
Methods
An international panel assessed the advantages and limitations of 5 imaging modalities—color fundus photography (CFP), fluorescein angiography (FA), spectral domain OCT (SD-OCT), near-infrared reflectance, and fundus autofluorescence—for detecting fibrosis in nAMD. A structured debate was followed by 2 online, masked image grading surveys. Sensitivity, specificity, and predictive accuracy of each modality, alone and in combination, were determined. Intergrader agreement was calculated. Imaging features were also correlated with histology in a nonhuman primate laser model. Based on consensus discussions at 2 in-person meetings and survey results, a 2-step diagnostic approach using SD-OCT as the primary modality was proposed.
Main Outcome Measures
Recommendation for a standardized approach for diagnosing fibrosis in eyes with nAMD.
Results
Among the 5 modalities, SD-OCT was considered essential by all workgroup members. Hyperreflective material on OCT was unanimously identified as a key indicator of fibrosis. However, its limited specificity was acknowledged. In 2 masked grading exercises, SD-OCT showed the highest sensitivity (0.88 and 0.84) but only moderate specificity (0.56 and 0.57). The area under the curve (AUC) for SD-OCT was 0.72 and 0.70. A 2-step strategy combining SD-OCT with CFP or FA improved diagnostic accuracy. Hyperreflective material was defined as material with reflectivity equal to or greater than normal retinal pigment epithelium (RPE), well-defined margins, RPE disruption, and a laminated appearance. Corresponding CFP findings included well-defined yellow/white/gray subretinal lesions, and FA findings included early blocked fluorescence and late staining. This 2-step approach increased AUC to 0.85, with sensitivity of 0.83 and specificity of 0.87.
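The relationship among the reported sensitivity, specificity, and AUC values can be illustrated with a short Python sketch. It rests on an assumption: the abstract does not state how AUC was computed. When a test yields a single binary call, its ROC curve has one operating point, and the trapezoidal area under it equals (sensitivity + specificity) / 2. The function name is hypothetical.

```python
def single_threshold_auc(sensitivity: float, specificity: float) -> float:
    """Area under a ROC curve with one operating point.

    The curve runs (0, 0) -> (1 - specificity, sensitivity) -> (1, 1), so the
    trapezoidal area reduces to the mean of sensitivity and specificity.
    """
    return (sensitivity + specificity) / 2.0


# (sensitivity, specificity) pairs as reported in the Results.
reported = {
    "SD-OCT, grading exercise 1": (0.88, 0.56),
    "SD-OCT, grading exercise 2": (0.84, 0.57),
    "2-step approach": (0.83, 0.87),
}

for label, (sens, spec) in reported.items():
    print(f"{label}: AUC = {single_threshold_auc(sens, spec):.3f}")
```

The values obtained this way are consistent with the reported AUCs of 0.72, 0.70, and 0.85, although that agreement does not confirm the authors' method.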
Conclusions
The study establishes a 2-step approach using OCT as the primary modality in clinical studies for the detection of fibrosis.
Financial Disclosure(s)
Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
{"title":"Standardization of Imaging Criteria for Detecting Macular Fibrosis in Neovascular Age-Related Macular Degeneration","authors":"Usha Chakravarthy MD, PhD , Lajos Csincsik , Kelvin Y.C. Teo MD, PhD , Marion R. Munk MD , Dilraj S. Grewal MD , Robyn H. Guymer MD , Glenn J. Jaffe MD , Tunde Peto MD , SriniVas R. Sadda MD , Giovanni Staurenghi MD , Chui M.G. Cheung MD","doi":"10.1016/j.xops.2025.101027","DOIUrl":"10.1016/j.xops.2025.101027","url":null,"abstract":"<div><h3>Purpose</h3><div>To evaluate conventional imaging modalities for detecting fibrosis in neovascular age-related macular degeneration (nAMD) and to develop a standardized diagnostic workflow.</div></div><div><h3>Design</h3><div>Systematic discussion and grading exercise assessing multiple imaging modalities.</div></div><div><h3>Participants</h3><div>Retina specialists from the International Fibrosis Consensus workgroup and members of the International Retinal Imaging Society.</div></div><div><h3>Methods</h3><div>An international panel assessed the advantages and limitations of 5 imaging modalities—color fundus photography (CFP), fluorescein angiography (FA), spectral domain OCT (SD-OCT), near-infrared reflectance, and fundus autofluorescence—for detecting fibrosis in nAMD. A structured debate was followed by 2 online, masked image grading surveys. Sensitivity, specificity, and predictive accuracy of each modality, alone and in combination, were determined. Intergrader agreement was calculated. Imaging features were also correlated with histology in a nonhuman primate laser model. Based on consensus discussions at 2 in-person meetings and survey results, a 2-step diagnostic approach using SD-OCT as the primary modality was proposed.</div></div><div><h3>Main Outcome Measures</h3><div>Recommendation for a standardized approach for diagnosing fibrosis in eyes with nAMD.</div></div><div><h3>Results</h3><div>Among the 5 modalities, SD-OCT was considered essential by all workgroup members. 
Hyperreflective material on OCT was unanimously identified as a key indicator of fibrosis. However, its limited specificity was acknowledged. In 2 masked grading exercises, SD-OCT showed the highest sensitivity (0.88 and 0.84) but only moderate specificity (0.56 and 0.57). The area under the curve (AUC) for SD-OCT was 0.72 and 0.70. A 2-step strategy combining SD-OCT with CFP or FA improved diagnostic accuracy. Hyperreflective material was defined as material with reflectivity equal to or greater than normal retinal pigment epithelium (RPE), well-defined margins, RPE disruption, and a laminated appearance. Corresponding CFP findings included well-defined yellow/white/gray subretinal lesions, and FA findings included early blocked fluorescence and late staining. This 2-step approach increased AUC to 0.85, with sensitivity of 0.83 and specificity of 0.87.</div></div><div><h3>Conclusions</h3><div>The study establishes a 2-step approach using OCT as the primary modality in clinical studies for the detection of fibrosis.</div></div><div><h3>Financial Disclosure(s)</h3><div>Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.</div></div>","PeriodicalId":74363,"journal":{"name":"Ophthalmology science","volume":"6 2","pages":"Article 101027"},"PeriodicalIF":4.6,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01; Epub Date: 2025-11-05; DOI: 10.1016/j.xops.2025.100995
Alessio Antropoli MD , Francesco Vacirca MD , Ugo Introini MD , Francesco Bandello MD , Maurizio Battaglia Parodi MD , Maria Vittoria Cicinelli MD
Purpose
To investigate cross-sectional characteristics and longitudinal changes in perilesional fundus autofluorescence (FAF) patterns in geographic atrophy (GA).
Study Design
Retrospective cohort study.
Participants
One hundred forty-three eyes from 99 patients (70 females) with foveal-sparing GA at baseline, of which 106 eyes from 76 patients were eligible for longitudinal analyses.
Methods
Best-corrected visual acuity, FAF, and OCT findings were collected at all visits. Baseline FAF patterns were determined using a 5-item classification, with tracking of longitudinal changes. Changes in GA growth rate following pattern transitions were investigated through linear mixed models.
Main Outcome Measures
Frequency and timing of perilesional FAF pattern transitions, and their association with GA growth rate.
Results
Of the 106 eyes with follow-up, 23 (22%) showed a change in perilesional FAF pattern after a median of 3 years (interquartile range: 1.74–4.10). The square root GA growth rate was 0.40 mm/year (95% confidence interval [CI]: 0.34–0.46; P < 0.001), with a modestly faster rate in “diffuse nontrickling” compared with “none” eyes (+0.06 mm/year; 95% CI: 0.004–0.12; P = 0.036) and a slower rate in eyes showing FAF pattern transitions (–0.12 mm/year; 95% CI: –0.19 to –0.05; P < 0.001). Baseline lesion size and other FAF patterns were not significantly associated with progression (P > 0.05).
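For readers unfamiliar with the square root transformation, the Python sketch below shows how a growth rate in mm/year is obtained from lesion areas in mm². It is a two-point illustration only: the study estimated rates with linear mixed models, and the areas and interval used here are hypothetical.

```python
import math


def sqrt_growth_rate(area_start_mm2: float, area_end_mm2: float, years: float) -> float:
    """Change in the square root of lesion area per year, in mm/year."""
    if years <= 0:
        raise ValueError("follow-up time must be positive")
    return (math.sqrt(area_end_mm2) - math.sqrt(area_start_mm2)) / years


# Hypothetical eye: 4.0 mm^2 at baseline, 9.0 mm^2 after 2.5 years.
rate = sqrt_growth_rate(4.0, 9.0, 2.5)
print(f"{rate:.2f} mm/year")
```

Taking the square root converts an area into a length, which is why the rate is expressed in mm/year rather than mm²/year.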
Conclusions
Perilesional FAF pattern transitions occur in a subset of GA eyes and are marked by slower progression, underscoring their potential relevance for disease monitoring and clinical trial design.
Financial Disclosure(s)
Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
{"title":"Perilesional Fundus Autofluorescence Patterns Are Not Static: Longitudinal Transitions in Geographic Atrophy and Association with Disease Progression","authors":"Alessio Antropoli MD , Francesco Vacirca MD , Ugo Introini MD , Francesco Bandello MD , Maurizio Battaglia Parodi MD , Maria Vittoria Cicinelli MD","doi":"10.1016/j.xops.2025.100995","DOIUrl":"10.1016/j.xops.2025.100995","url":null,"abstract":"<div><h3>Purpose</h3><div>To investigate cross-sectional characteristics and longitudinal changes in perilesional fundus autofluorescence (FAF) patterns in geographic atrophy (GA).</div></div><div><h3>Study Design</h3><div>Retrospective cohort study.</div></div><div><h3>Participants</h3><div>One hundred forty-three eyes from 99 patients (70 females) with foveal-sparing GA at baseline, of which 106 eyes from 76 patients were eligible for longitudinal analyses.</div></div><div><h3>Methods</h3><div>Best-corrected visual acuity, FAF, and OCT findings were collected at all visits. Baseline FAF patterns were determined using a 5-item classification, with tracking of longitudinal changes. Changes in GA growth rate following pattern transitions were investigated through linear mixed models.</div></div><div><h3>Main Outcome Measures</h3><div>Frequency and timing of perilesional FAF pattern transitions, and their association with GA growth rate.</div></div><div><h3>Results</h3><div>Of the 106 eyes with follow-up, 23 (22%) showed a change in perilesional FAF pattern after a median of 3 years (interquartile range: 1.74–4.10). Square root GA growth rate was 0.40 mm/year (95% confidence interval [CI]: 0.34–0.46; <em>P</em> < 0.001), with modestly faster rate in “diffuse nontrickling” compared with “none” eyes (+0.06 mm/year; 95% CI: 0.004–0.12; <em>P</em> = 0.036) and slower rate in eyes showing FAF pattern transitions (–0.12 mm/year; 95% CI: –0.19 to –0.05; <em>P</em> < 0.001). 
Baseline lesion size and other FAF patterns were not significantly associated with progression (<em>P</em> > 0.05).</div></div><div><h3>Conclusions</h3><div>Perilesional FAF pattern transitions occur in a subset of GA eyes and are marked by slower progression, underscoring their potential relevance for disease monitoring and clinical trial design.</div></div><div><h3>Financial Disclosure(s)</h3><div>Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.</div></div>","PeriodicalId":74363,"journal":{"name":"Ophthalmology science","volume":"6 2","pages":"Article 100995"},"PeriodicalIF":4.6,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145645993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01; Epub Date: 2025-11-26; DOI: 10.1016/j.xops.2025.101014
David Mikhail MD(C), MSc , Shuting Xie MSc , Michael Balas MD , Jason M. Kwok MD , Ana Miguel MD, PhD , Amrit Rai MD , Amandeep Rai MD , Peter J. Kertes MD , Iqbal Ike K. Ahmed MD , Matthew B. Schlenker MD, MSc
Purpose
To objectively quantify the motion paths of surgical instruments during cataract surgery across a resident’s training, identifying patterns of skill acquisition and proficiency development.
Design
An n = 1 panel study.
Subjects
One ophthalmology resident performing cataract surgery.
Methods
Videos of 100 cataract surgeries performed by a single resident, spanning their sixth to 760th case, were collected. Motion tracking software (Computer Vision Annotation Tool) was used to annotate and track the trajectories of 11 surgical instruments on a frame-by-frame basis. Monotonic trends were assessed using the Mann–Kendall test and Theil–Sen slope estimation, with Spearman correlation measuring the association between case number and performance metric values. Pettitt change-point analysis identified significant transitions in the resident’s skill progression.
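The trend and change-point statistics named above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' analysis code: it computes the Mann–Kendall S statistic, the Theil–Sen slope, and the Pettitt statistic with its change-point location, and omits p-values. The function names and the example series are hypothetical.

```python
from itertools import combinations
from statistics import median


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def mann_kendall_s(values):
    """Mann-Kendall S: later-minus-earlier signs summed over all pairs."""
    return sum(sign(later - earlier) for earlier, later in combinations(values, 2))


def theil_sen_slope(x, y):
    """Median of the pairwise slopes between all points with distinct x."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    return median(slopes)


def pettitt_change_point(values):
    """Return (t, K): the first segment ends at index t; K = max |U_t|."""
    best_t, best_k, u = 0, 0, 0
    for t in range(len(values) - 1):
        # Recurrence: U_t = U_{t-1} + sum_j sign(x_t - x_j)
        u += sum(sign(values[t] - v) for v in values)
        if abs(u) > best_k:
            best_t, best_k = t, abs(u)
    return best_t, best_k


# Hypothetical path lengths over 10 consecutive sampled cases.
path_lengths = [10, 11, 10, 12, 11, 6, 5, 6, 5, 6]
cases = list(range(len(path_lengths)))

print("Mann-Kendall S:", mann_kendall_s(path_lengths))
print("Theil-Sen slope:", theil_sen_slope(cases, path_lengths))
print("Pettitt (index, K):", pettitt_change_point(path_lengths))
```

In this toy series, a negative S and a negative slope indicate a downward trend, and the Pettitt index marks the last case before the shift.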
Main Outcome Measures
Six key motion parameters, including total path length, average velocity, average acceleration, root mean square jerk, average angular change, and workspace coverage, were extracted for each instrument in each video.
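As a rough illustration of how such parameters can be derived from tracked coordinates, the Python sketch below computes all six from a list of 2D positions sampled at a fixed interval. The definitions are assumptions, since the abstract does not give formulas: acceleration and jerk are taken from scalar speed, angular change is the absolute turn between successive steps, and workspace coverage is the bounding-box area. The function name is hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values) if values else 0.0


def motion_metrics(points, dt):
    """points: list of (x, y) instrument positions sampled every dt seconds."""
    steps = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]
    distances = [math.hypot(dx, dy) for dx, dy in steps]
    speeds = [d / dt for d in distances]
    accels = [(b - a) / dt for a, b in zip(speeds, speeds[1:])]
    jerks = [(b - a) / dt for a, b in zip(accels, accels[1:])]

    # Heading of each non-zero step; turns are wrapped to [-180, 180] degrees.
    headings = [math.atan2(dy, dx) for dx, dy in steps if (dx, dy) != (0, 0)]
    turns = [
        abs(math.degrees(math.atan2(math.sin(b - a), math.cos(b - a))))
        for a, b in zip(headings, headings[1:])
    ]

    xs, ys = zip(*points)
    return {
        "path_length": sum(distances),
        "average_velocity": _mean(speeds),
        "average_acceleration": _mean([abs(a) for a in accels]),
        "rms_jerk": math.sqrt(_mean([j * j for j in jerks])),
        "average_angular_change_deg": _mean(turns),
        "workspace_coverage": (max(xs) - min(xs)) * (max(ys) - min(ys)),
    }


# Hypothetical three-frame track, one frame per second.
metrics = motion_metrics([(0, 0), (3, 4), (3, 8)], dt=1.0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Units follow the input: pixel coordinates give path length in pixels and velocity in pixels per second unless the coordinates are calibrated first.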
Results
All 11 instruments demonstrated statistically significant reductions in ≥1 motion parameter. Path length consistently decreased across training, with the largest reductions seen in the cannula (–11.8%; 95% confidence interval [CI], –17.4% to –6.8%; P < 0.001), phacoemulsification handpiece (–11.5%; 95% CI, –14.1% to –8.7%; P < 0.001), and cystotome (–8.9%; 95% CI, –11.8% to –5.9%; P < 0.001). The intraocular lens inserter showed the greatest reduction in average angular change (–3.0% [–1.70°]; 95% CI, –3.9% to –2.0%; P < 0.001). Pettitt analysis demonstrated significant shifts in surgical efficiency around case 300 for most instruments, although improvements in certain advanced tasks (e.g., lens implantation) emerged later.
Conclusions
This large-scale, frame-by-frame motion tracking study revealed distinct instrument- and task-specific learning curves in cataract surgery, highlighting progressive changes in motion metrics over time. A significant shift at approximately case 300 marked a milestone in the resident’s instrument use patterns. These findings underscore the potential of objective, video-based motion tracking analytics to provide data-driven resident feedback, guiding targeted instruction and standardizing cataract surgery training.
Financial Disclosure(s)
Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
{"title":"Quantitative Analysis of Instrument Motion Paths in Cataract Surgery across a Resident’s Training","authors":"David Mikhail MD(C), MSc , Shuting Xie MSc , Michael Balas MD , Jason M. Kwok MD , Ana Miguel MD, PhD , Amrit Rai MD , Amandeep Rai MD , Peter J. Kertes MD , Iqbal Ike K. Ahmed MD , Matthew B. Schlenker MD, MSc","doi":"10.1016/j.xops.2025.101014","DOIUrl":"10.1016/j.xops.2025.101014","url":null,"abstract":"<div><h3>Purpose</h3><div>To objectively quantify the motion paths of surgical instruments during cataract surgery across a resident’s training, identifying patterns of skill acquisition and proficiency development.</div></div><div><h3>Design</h3><div>An <em>n</em> = 1 panel study.</div></div><div><h3>Subjects</h3><div>One ophthalmology resident performing cataract surgery.</div></div><div><h3>Methods</h3><div>One hundred cataract surgery videos performed by a single resident from their sixth to 760th case were collected. Advanced motion tracking software (Computer Vision Annotation Tool) was utilized to annotate and track the trajectories of 11 surgical instruments on a frame-by-frame basis. Monotonic trends were assessed using the Mann–Kendall test and Theil–Sen slope estimation, with Spearman correlation measuring the association between case number and performance metric values. Pettitt change-point analysis identified significant transitions in the resident’s skill progression.</div></div><div><h3>Main Outcome Measures</h3><div>Six key motion parameters, including total path length, average velocity, average acceleration, root mean square jerk, average angular change, and workspace coverage, were extracted for each instrument in each video.</div></div><div><h3>Results</h3><div>All 11 instruments demonstrated statistically significant reductions in ≥1 motion parameter. 
Path length consistently decreased across training, with the largest reductions seen in the cannula (–11.8%; 95% confidence interval [CI], –17.4% to –6.8%; <em>P</em> < 0.001), phacoemulsification handpiece (–11.5%; 95% CI, –14.1% to –8.7%; <em>P</em> < 0.001), and cystotome (–8.9%; 95% CI, –11.8% to –5.9%; <em>P</em> < 0.001). The intraocular lens inserter showed the greatest reduction in average angular change of 3.0% (–1.70°) (95% CI, –3.9% to –2.0%; <em>P</em> < 0.001). Pettitt analysis demonstrated significant shifts in surgical efficiency at around case 300 for most instruments, although improvements in certain advanced tasks (e.g., lens implantation) emerged later.</div></div><div><h3>Conclusions</h3><div>This large-scale, frame-by-frame motion tracking study revealed distinct instrument- and task-specific learning curves in cataract surgery, highlighting progressive changes in motion metrics over time. A significant shift at approximately case 300 marked a milestone in the resident’s instrument use patterns. 
These findings underscore the potential of objective, video-based motion tracking analytics to provide data-driven resident feedback, guiding targeted instruction and standardizing cataract surgery training.</div></div><div><h3>Financial Disclosure(s)</h3><div>Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.</div></div>","PeriodicalId":74363,"journal":{"name":"Ophthalmology science","volume":"6 2","pages":"Article 101014"},"PeriodicalIF":4.6,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145884503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}