Evaluating the Impact of Retinal Vessel Segmentation Metrics on Retest Reliability in a Clinical Setting: A Comparative Analysis Using AutoMorph.

Investigative Ophthalmology & Visual Science; Impact Factor 4.7; JCR Q1 (Ophthalmology); Zone 2 (Medicine); Pub Date: 2024-11-04; DOI: 10.1167/iovs.65.13.24

Samuel D Giesser, Ferhat Turgut, Amr Saad, Jay R Zoellin, Chiara Sommer, Yukun Zhou, Siegfried K Wagner, Pearse A Keane, Matthias Becker, Delia Cabrera DeBuc, Gábor Márk Somfai

Investigative Ophthalmology & Visual Science, volume 65, issue 13, article 24. Publication type: Journal Article. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11572755/pdf/

Citations: 0

Abstract

Purpose: Current research on artificial intelligence-based fundus photography biomarkers has demonstrated inconsistent results. Consequently, we aimed to evaluate and predict the test-retest reliability of retinal parameters extracted from fundus photography.

Methods: Two groups of patients were recruited: an intervisit group (n = 28) to assess retest reliability over a period of 1 to 14 days, and an intravisit group (n = 44) to assess retest reliability within a single session. Using AutoMorph, we generated test and retest vessel segmentation maps; measured segmentation map agreement via accuracy, sensitivity, F1 score, and Jaccard index; and calculated 76 metrics from each fundus image. The retest reliability of each metric was analyzed using the Spearman correlation coefficient, the intraclass correlation coefficient (ICC), and the relative percentage change. A linear model was developed to predict, from image-quality metrics, the median percentage difference on retest per image; its input variables, contrast-to-noise ratio and fractal dimension, were chosen by a P-value-based backward selection process. This model was trained on the intravisit dataset and validated on the intervisit dataset.
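The four agreement measures named in the Methods can all be derived from a pixel-wise confusion matrix between the test and retest vessel maps. The following is a minimal sketch, not AutoMorph's own code: it assumes both maps are binary masks of equal size given as nested lists, and it treats the test map as the reference, which is an assumption the abstract does not state. The function name `agreement_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def agreement_metrics(
    test_map: Sequence[Sequence[int]],
    retest_map: Sequence[Sequence[int]],
) -> Dict[str, float]:
    """Pixel-wise agreement between two binary vessel masks.

    Assumption: the test map plays the role of the reference, so a vessel
    pixel present only in the retest map counts as a false positive.
    """
    tp = fp = fn = tn = 0
    for row_test, row_retest in zip(test_map, retest_map):
        for a, b in zip(row_test, row_retest):
            if a and b:
                tp += 1          # vessel in both maps
            elif a:
                fn += 1          # vessel in test only
            elif b:
                fp += 1          # vessel in retest only
            else:
                tn += 1          # background in both maps
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "f1": 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else nan,
        "jaccard": tp / (tp + fp + fn) if (tp + fp + fn) else nan,
    }
```

Because most pixels in a fundus image are background, accuracy is dominated by true negatives, whereas F1 and the Jaccard index ignore them. This is one plausible reading of why accuracy can stay high while the F1 score is noticeably lower.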

Results: In the intervisit group, retest reliability ranged from 0.34 to 0.99 for the Spearman correlation coefficient, from 0.31 to 0.99 for the ICC, and from 0.96% to 223.67% for the mean absolute percentage difference. Similarly, in the intravisit group, it ranged from 0.55 to 0.96 for the Spearman correlation coefficient, from 0.40 to 0.97 for the ICC, and from 0.49% to 371.23% for the mean percentage difference. Segmentation map accuracy between test and retest never dropped below 97%; the mean F1 scores were 0.85 for the intravisit dataset and 0.82 for the intervisit dataset. In both datasets, disc width had the highest Spearman correlation coefficient on retest. The lowest Spearman correlation coefficients were found for tortuosity density in the intervisit group and artery tortuosity density in the intravisit group. The intravisit group exhibited better retest reliability than the intervisit group (P < 0.001). Our linear model, with the two independent variables contrast-to-noise ratio and fractal dimension, predicted the median retest reliability per image on its validation dataset, the intervisit group, with an R² of 0.53 (P < 0.001).
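The three per-metric reliability measures reported above can be sketched as follows for one metric measured once at test and once at retest. Two points are assumptions rather than details from the abstract: the ICC form used here is ICC(2,1) (two-way random effects, absolute agreement, single measurement), since the abstract does not say which form was used, and the percentage difference is taken relative to the test value. All function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the tie
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(test: Sequence[float], retest: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(test), average_ranks(retest))


def icc_2_1(test: Sequence[float], retest: Sequence[float]) -> float:
    """ICC(2,1) for two measurements per image (assumed form, see lead-in)."""
    n, k = len(test), 2
    grand = (sum(test) + sum(retest)) / (n * k)
    row_means = [(a + b) / 2 for a, b in zip(test, retest)]
    col_means = [sum(test) / n, sum(retest) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in list(test) + list(retest))
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def mean_abs_pct_diff(test: Sequence[float], retest: Sequence[float]) -> float:
    """Mean absolute percentage difference, relative to the test value (assumed)."""
    return sum(abs(b - a) / abs(a) * 100 for a, b in zip(test, retest)) / len(test)
```

A percentage difference taken relative to the test value grows without bound as that value approaches zero, which may help explain how a metric can show differences above 100% while its rank correlation remains moderate.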

Conclusions: Our findings highlight considerable volatility in the reliability of some retinal biomarkers. Improving retest reliability could allow disease progression modeling in smaller datasets or an individualized treatment approach. Image quality is moderately predictive of retest reliability, and further work is warranted to better understand the reasons behind our observations and thus ensure consistent retest results.
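The train-on-one-group, validate-on-the-other design behind the image-quality model can be illustrated with an ordinary least squares fit on two predictors. This is a sketch under stated assumptions, not the authors' implementation: it assumes a plain linear model with an intercept, fitted by solving the normal equations, and it omits the P-value-based backward selection step entirely. The function names and the idea of passing each image as a `(contrast_to_noise_ratio, fractal_dimension)` pair are hypothetical.

```python
from typing import List, Sequence, Tuple

Features = Sequence[Tuple[float, float]]  # (contrast-to-noise ratio, fractal dimension)


def fit_ols(X: Features, y: Sequence[float]) -> List[float]:
    """Fit y ~ b0 + b1*x1 + b2*x2 by solving the normal equations."""
    rows = [[1.0, *x] for x in X]          # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def predict(coef: Sequence[float], X: Features) -> List[float]:
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], x)) for x in X]


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination on whichever dataset is passed in."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

In this setup, `fit_ols` would be called on the intravisit images and `r_squared` on predictions for the intervisit images, so the reported fit reflects data the model did not see during training.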




Source journal

Investigative Ophthalmology & Visual Science (Medicine: Ophthalmology)

CiteScore: 6.90
Self-citation rate: 4.50%
Articles published: 339
Review time: 1 month

About the journal: Investigative Ophthalmology & Visual Science (IOVS), published as ready online, is a peer-reviewed academic journal of the Association for Research in Vision and Ophthalmology (ARVO). IOVS features original research, mostly pertaining to clinical and laboratory ophthalmology and vision research in general.