Statistical Evaluation of Smartphone-Based Automated Grading System for Ocular Redness Associated with Dry Eye Disease and Implications for Clinical Trials.
John D Rodriguez, Adam Hamm, Ethan Bensinger, Samantha J Kerti, Paul J Gomes, George W Ousler III, Palak Gupta, Carlos Gustavo De Moraes, Mark B Abelson
{"title":"Statistical Evaluation of Smartphone-Based Automated Grading System for Ocular Redness Associated with Dry Eye Disease and Implications for Clinical Trials.","authors":"John D Rodriguez, Adam Hamm, Ethan Bensinger, Samanatha J Kerti, Paul J Gomes, George W Ousler Iii, Palak Gupta, Carlos Gustavo De Moraes, Mark B Abelson","doi":"10.2147/OPTH.S506519","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>This study introduces a fully automated approach using deep learning-based segmentation to select the conjunctiva as the region of interest (ROI) for large-scale, multi-site clinical trials. By integrating a precise, objective grading system, we aim to minimize inter- and intra-grader variability due to perceptual biases. We evaluate the impact of adding a \"horizontality\" parameter to the grading system and assess this method's potential to enhance grading precision, reduce sample size, and improve clinical trial efficiency.</p><p><strong>Methods: </strong>We analyzed 29,640 images from 450 subjects in a multi-visit, multi-site clinical trial to assess the performance of an automated grading model compared to expert graders. Images were graded on a 0-4 scale, in 0.5 increments. The model utilizes the DeepLabV3 architecture for image segmentation, extracting two key features-horizontality and redness. The algorithm then uses these features to predict eye redness, validated by comparison with expert grader scores.</p><p><strong>Results: </strong>The bivariate model using both redness and horizontality performed best, with a Mean Absolute Error (MAE) of 0.450 points (SD=0.334) on the redness scale relative to expert scores. Expert graded scores were within one unit of the mean grade in over 85% cases, ensuring consistency and optimal training set for the predictive model. Models incorporating both features outperformed those using only redness, reducing MAE by 5-6%. The optimal generalized model improved predictive accuracy with horizontality such that 93.0% of images were predicted with an absolute error less than one unit difference in grading.</p><p><strong>Conclusion: </strong>This study demonstrates that fully automating image analysis allows thousands of images to be graded efficiently. The addition of the horizontality parameter enhances model performance, reduces error, and supports its relevance to specific Dry Eye manifestations. This automated method provides a continuous scale and greater sensitivity to treatment effects than standard clinical scales.</p>","PeriodicalId":93945,"journal":{"name":"Clinical ophthalmology (Auckland, N.Z.)","volume":"19 ","pages":"907-914"},"PeriodicalIF":0.0000,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11912931/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Clinical ophthalmology (Auckland, N.Z.)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2147/OPTH.S506519","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Purpose: This study introduces a fully automated approach using deep learning-based segmentation to select the conjunctiva as the region of interest (ROI) for large-scale, multi-site clinical trials. By integrating a precise, objective grading system, we aim to minimize inter- and intra-grader variability due to perceptual biases. We evaluate the impact of adding a "horizontality" parameter to the grading system and assess this method's potential to enhance grading precision, reduce sample size, and improve clinical trial efficiency.
Methods: We analyzed 29,640 images from 450 subjects in a multi-visit, multi-site clinical trial to assess the performance of an automated grading model compared to expert graders. Images were graded on a 0-4 scale in 0.5 increments. The model utilizes the DeepLabV3 architecture for image segmentation, extracting two key features: horizontality and redness. The algorithm then uses these features to predict eye redness, validated by comparison with expert grader scores.
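The abstract does not publish the segmentation weights or the exact feature definitions, so the sketch below is a hedged illustration of the kind of pipeline described: it assumes a DeepLabV3 model (ResNet-50 backbone) fine-tuned elsewhere to segment the conjunctiva, and it uses plausible stand-in definitions for redness (relative red-channel excess in the ROI) and horizontality (fraction of strong edges that run horizontally). All function names and feature formulas are assumptions, not the authors' implementation.

```python
import numpy as np
import torch
import torchvision

def load_segmenter(weights_path):
    # DeepLabV3 backbone named in the paper; assumed to have been fine-tuned
    # elsewhere to emit a two-class (background / conjunctiva) segmentation.
    model = torchvision.models.segmentation.deeplabv3_resnet50(
        weights=None, num_classes=2
    )
    model.load_state_dict(torch.load(weights_path, map_location="cpu"))
    return model.eval()

def conjunctiva_mask(model, image):
    # image: H x W x 3 float32 array in [0, 1]; normalization omitted for brevity.
    x = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0)
    with torch.no_grad():
        logits = model(x)["out"]            # shape 1 x 2 x H x W
    return logits.argmax(dim=1)[0].numpy().astype(bool)

def redness_feature(image, mask):
    # Illustrative definition: mean excess of the red channel over the average
    # of the green and blue channels within the conjunctival ROI.
    r, g, b = image[..., 0][mask], image[..., 1][mask], image[..., 2][mask]
    return float(np.mean(r - (g + b) / 2.0))

def horizontality_feature(image, mask):
    # Illustrative proxy: fraction of strong-edge pixels in the ROI whose
    # intensity gradient is mostly vertical, i.e. the edge itself (a vessel)
    # runs horizontally.
    gray = image.mean(axis=2)
    gy, gx = np.gradient(gray)
    magnitude = np.hypot(gx, gy)
    strong = mask & (magnitude > np.percentile(magnitude[mask], 75))
    if not strong.any():
        return 0.0
    return float(np.mean(np.abs(gy[strong]) > np.abs(gx[strong])))
```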
Results: The bivariate model using both redness and horizontality performed best, with a Mean Absolute Error (MAE) of 0.450 points (SD=0.334) on the redness scale relative to expert scores. Expert-graded scores were within one unit of the mean grade in over 85% of cases, ensuring consistency and an optimal training set for the predictive model. Models incorporating both features outperformed those using only redness, reducing MAE by 5-6%. The optimal generalized model improved predictive accuracy with the addition of horizontality, such that 93.0% of images were predicted with an absolute error of less than one grading unit.
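The functional form of the bivariate model is not specified in the abstract, so the following sketch assumes an ordinary linear regression from the two features to the 0-4 expert grade, evaluated on a held-out split. The metric names mirror those reported above (MAE, SD of the absolute error, fraction of images within one grading unit); the values in the comments are the reported results, not outputs of this code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def evaluate_bivariate_model(features, expert_grades):
    # features: N x 2 array of (redness, horizontality) per image;
    # expert_grades: N expert scores on the 0-4 scale (0.5 increments).
    X_train, X_test, y_train, y_test = train_test_split(
        features, expert_grades, test_size=0.2, random_state=0
    )
    model = LinearRegression().fit(X_train, y_train)
    pred = np.clip(model.predict(X_test), 0.0, 4.0)  # keep predictions on scale
    abs_err = np.abs(pred - y_test)
    return {
        "mae": float(abs_err.mean()),                    # reported above: 0.450
        "sd_abs_err": float(abs_err.std(ddof=1)),        # reported above: 0.334
        "frac_within_one_unit": float((abs_err < 1.0).mean()),  # reported: 93.0%
    }
```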
Conclusion: This study demonstrates that fully automating image analysis allows thousands of images to be graded efficiently. The addition of the horizontality parameter enhances model performance, reduces error, and supports its relevance to specific manifestations of dry eye disease. This automated method provides a continuous scale and greater sensitivity to treatment effects than standard clinical scales.
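As a rough illustration of the sample-size implication raised in the Purpose, the standard two-sample calculation below shows how a smaller measurement SD reduces the number of subjects needed per arm to detect the same treatment effect. The effect size and SD values are hypothetical placeholders, not figures from the study.

```python
import math
from scipy.stats import norm

def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    # Standard two-sided, two-sample formula for detecting a mean difference
    # `delta` when the outcome has standard deviation `sigma`.
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return math.ceil(2 * (z * sigma / delta) ** 2)

# Hypothetical numbers: the same 0.5-unit treatment effect, two measurement SDs.
print(n_per_arm(delta=0.5, sigma=0.8))  # noisier grading -> larger trial
print(n_per_arm(delta=0.5, sigma=0.6))  # less noisy grading -> smaller trial
```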