Artificial intelligence for diagnosing exudative age-related macular degeneration.

Cochrane Database of Systematic Reviews (IF 8.8; Region 2, Medicine; JCR Q1, Medicine, General & Internal). Pub Date: 2024-10-17. DOI: 10.1002/14651858.CD015522.pub2

Chaerim Kang, Jui-En Lo, Helen Zhang, Sueko M Ng, John C Lin, Ingrid U Scott, Jayashree Kalpathy-Cramer, Su-Hsun Alison Liu, Paul B Greenberg

Journal: Cochrane Database of Systematic Reviews, volume 10, CD015522. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11483348/pdf/


Abstract

Background: Age-related macular degeneration (AMD) is a retinal disorder characterized by central retinal (macular) damage. Approximately 10% to 20% of non-exudative AMD cases progress to the exudative form, which may result in rapid deterioration of central vision. Individuals with exudative AMD (eAMD) need prompt consultation with retinal specialists to minimize the risk and extent of vision loss. Traditional methods of diagnosing ophthalmic disease rely on clinical evaluation and multiple imaging techniques, which can be resource-consuming. Tests leveraging artificial intelligence (AI) hold the promise of automatically identifying and categorizing pathological features, enabling the timely diagnosis and treatment of eAMD.

Objectives: To determine the diagnostic accuracy of artificial intelligence (AI) as a triaging tool for exudative age-related macular degeneration (eAMD).

Search methods: We searched CENTRAL, MEDLINE, Embase, three clinical trials registries, and Data Archiving and Networked Services (DANS) for gray literature. We did not restrict searches by language or publication date. The date of the last search was April 2024.

Selection criteria: Included studies compared the test performance of algorithms with that of human readers to detect eAMD on retinal images collected from people with AMD who were evaluated at eye clinics in community or academic medical centers, and who were not receiving treatment for eAMD when the images were taken. We included algorithms that were either internally or externally validated or both.

Data collection and analysis: Pairs of review authors independently extracted data and assessed study quality using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool with revised signaling questions. For studies that reported more than one set of performance results, we extracted only one set of diagnostic accuracy data per study based on the last development stage or the optimal algorithm as indicated by the study authors. For two-class algorithms, we collected data from the 2x2 table whenever feasible. For multi-class algorithms, we first consolidated data from all classes other than eAMD before constructing the corresponding 2x2 tables. Assuming a common positivity threshold applied by the included studies, we chose random-effects, bivariate logistic models to estimate summary sensitivity and specificity as the primary performance metrics.
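The consolidation step for multi-class algorithms can be sketched as follows. This is a minimal illustration, not the review authors' code: the function names, the class labels, and all counts are hypothetical, and the matrix is assumed to have reference (human-grader) classes as rows and algorithm-predicted classes as columns.

```python
from typing import Dict, Sequence


def collapse_to_2x2(
    matrix: Sequence[Sequence[int]], labels: Sequence[str], target: str
) -> Dict[str, int]:
    """Collapse a multi-class confusion matrix into a 2x2 table for one class.

    matrix[i][j] is the number of images whose reference class is labels[i]
    and whose predicted class is labels[j]. Every class other than `target`
    is merged into a single "not target" category.
    """
    t = labels.index(target)
    n = len(labels)
    tp = matrix[t][t]
    fn = sum(matrix[t][j] for j in range(n) if j != t)
    fp = sum(matrix[i][t] for i in range(n) if i != t)
    tn = sum(
        matrix[i][j] for i in range(n) for j in range(n) if i != t and j != t
    )
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def sensitivity(table: Dict[str, int]) -> float:
    """Proportion of reference-positive images that the algorithm flags."""
    return table["tp"] / (table["tp"] + table["fn"])


def specificity(table: Dict[str, int]) -> float:
    """Proportion of reference-negative images that the algorithm clears."""
    return table["tn"] / (table["tn"] + table["fp"])


# Hypothetical three-class results (made-up counts, for illustration only).
labels = ["normal", "non-exudative AMD", "eAMD"]
matrix = [
    [90, 8, 2],    # reference: normal
    [10, 70, 20],  # reference: non-exudative AMD
    [1, 4, 95],    # reference: eAMD
]

table = collapse_to_2x2(matrix, labels, "eAMD")
print(table, sensitivity(table), specificity(table))
```

Note that misclassifications among the non-eAMD classes (for example, normal images labeled as non-exudative AMD) count as true negatives after consolidation, because only the eAMD versus not-eAMD distinction is retained.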

Main results: We identified 36 eligible studies that reported 40 sets of algorithm performance data, encompassing over 16,000 participants and 62,000 images. We included 28 studies (78%) that reported 31 algorithms with performance data in the meta-analysis. The remaining nine studies (25%) reported eight algorithms that lacked usable performance data; we reported them in the qualitative synthesis.

Study characteristics and risk of bias: Most studies were conducted in Asia, followed by Europe, the USA, and collaborative efforts spanning multiple countries. Most studies identified study participants from the hospital setting, while others used retinal images from public repositories; a few studies did not specify image sources. Based on four of the 36 studies reporting demographic information, the age of the study participants ranged from 62 to 82 years. The included algorithms used various retinal image types as model input, such as optical coherence tomography (OCT) images (N = 15), fundus images (N = 6), and multimodal imaging (N = 7). The predominant core method used was deep neural networks. All studies that reported externally validated algorithms were at high risk of bias, mainly due to potential selection bias from either a two-gate design or the inappropriate exclusion of potentially eligible retinal images (or participants).

Findings: Only three of the 40 included algorithms were externally validated (7.5%, 3/40). The summary sensitivity and specificity were 0.94 (95% confidence interval (CI) 0.90 to 0.97) and 0.99 (95% CI 0.76 to 1.00), respectively, when compared to human graders (3 studies; 27,872 images; low-certainty evidence). The prevalence of images with eAMD ranged from 0.3% to 49%. Twenty-eight algorithms were reportedly either internally validated (20%, 8/40) or tested on a development set (50%, 20/40); the pooled sensitivity and specificity were 0.93 (95% CI 0.89 to 0.96) and 0.96 (95% CI 0.94 to 0.98), respectively, when compared to human graders (28 studies; 33,409 images; low-certainty evidence). We did not identify significant sources of heterogeneity among these 28 algorithms. Although algorithms using OCT images appeared more homogeneous and had the highest summary specificity (0.97, 95% CI 0.93 to 0.98), they were not superior to algorithms using fundus images alone (0.94, 95% CI 0.89 to 0.97) or multimodal imaging (0.96, 95% CI 0.88 to 0.99; P for meta-regression = 0.239). The median prevalence of images with eAMD was 30% (interquartile range [IQR] 22% to 39%). We did not include eight studies that described nine algorithms (one study reported two sets of algorithm results) to distinguish eAMD from normal images, images of other AMD, or other non-AMD retinal lesions in the meta-analysis. Five of these algorithms were generally based on smaller datasets (range 21 to 218 participants per study) yet with a higher prevalence of eAMD images (range 33% to 66%). Relative to human graders, the reported sensitivity in these studies ranged from 0.95 to 0.97, while the specificity ranged from 0.94 to 0.99. Similarly, using small datasets (range 46 to 106), an additional four algorithms for detecting eAMD from other retinal lesions showed high sensitivity (range 0.96 to 1.00) and specificity (range 0.77 to 1.00).

Authors' conclusions: Low- to very low-certainty evidence suggests that an algorithm-based test may correctly identify most individuals with eAMD without increasing unnecessary referrals (false positives) in either the primary or the specialty care settings. There were significant concerns for applying the review findings due to variations in the eAMD prevalence in the included studies. In addition, among the included algorithm-based tests, diagnostic accuracy estimates were at risk of bias due to study participants not reflecting real-world characteristics, inadequate model validation, and the likelihood of selective results reporting. Limited quality and quantity of externally validated algorithms highlighted the need for high-certainty evidence. This evidence will require a standardized definition for eAMD on different imaging modalities and external validation of the algorithm to assess generalizability.
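One way to see why variation in eAMD prevalence matters for applicability is to work out how the share of positive results that are true positives (the positive predictive value, PPV) shifts with prevalence. The abstract does not report PPV; the sketch below is only an illustration that applies Bayes' rule to the summary point estimates for externally validated algorithms (sensitivity 0.94, specificity 0.99) at the two ends of the reported prevalence range (0.3% and 49%). It assumes that sensitivity and specificity stay fixed across settings, which may not hold in practice, and the function name is hypothetical.

```python
def positive_predictive_value(
    sensitivity: float, specificity: float, prevalence: float
) -> float:
    """Probability that a positive test result is a true positive (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Summary point estimates quoted in the abstract (external validation).
SENS, SPEC = 0.94, 0.99

# Lowest and highest image-level eAMD prevalence reported for those studies.
ppv_low = positive_predictive_value(SENS, SPEC, 0.003)
ppv_high = positive_predictive_value(SENS, SPEC, 0.49)

print(f"PPV at 0.3% prevalence: {ppv_low:.2f}")
print(f"PPV at 49% prevalence:  {ppv_high:.2f}")
```

Under these assumptions the PPV is roughly 0.22 at 0.3% prevalence and about 0.99 at 49% prevalence, so the same algorithm could generate mostly false-positive referrals in a low-prevalence screening population while performing well in a specialty clinic.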




Source journal: Cochrane Database of Systematic Reviews (Medicine: Internal Medicine)

CiteScore: 10.60
Self-citation rate: 2.40%
Articles published: 173
Review time: 1-2 weeks

About the journal: The Cochrane Database of Systematic Reviews (CDSR) stands as the premier database for systematic reviews in healthcare. It comprises Cochrane Reviews, along with protocols for these reviews, editorials, and supplements. Owned and operated by Cochrane, a worldwide independent network of healthcare stakeholders, the CDSR (ISSN 1469-493X) encompasses a broad spectrum of health-related topics, including health services.