Test accuracy of artificial intelligence-based grading of fundus images in diabetic retinopathy screening: A systematic review.

Journal of Medical Screening (IF 2.6; CAS Zone 4, Medicine; JCR Q2: Public, Environmental & Occupational Health). Pub Date: 2023-09-01. DOI: 10.1177/09691413221144382
Zhivko Zhelev, Jaime Peters, Morwenna Rogers, Michael Allen, Goda Kijauskaite, Farah Seedat, Elizabeth Wilkinson, Christopher Hyde
Journal of Medical Screening, volume 30, issue 3, pages 97-112. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10399100/pdf/
Citations: 1

Abstract

Objectives: To systematically review the accuracy of artificial intelligence (AI)-based systems for grading of fundus images in diabetic retinopathy (DR) screening.

Methods: We searched MEDLINE, EMBASE, the Cochrane Library and ClinicalTrials.gov from 1st January 2000 to 27th August 2021. Accuracy studies published in English were included if they met the pre-specified inclusion criteria. Selection of studies for inclusion, data extraction and quality assessment were conducted by one author with a second reviewer independently screening and checking 20% of titles. Results were analysed narratively.

Results: Forty-three studies evaluating 15 deep learning (DL) and 4 machine learning (ML) systems were included. Nine systems were evaluated in a single study each. Most studies were judged to be at high or unclear risk of bias in at least one QUADAS-2 domain. Sensitivity for referable DR and higher grades was ≥85% while specificity varied and was <80% for all ML systems and in 6/31 studies evaluating DL systems. Studies reported high accuracy for detection of ungradable images, but the latter were analysed and reported inconsistently. Seven studies reported that AI was more sensitive but less specific than human graders.

Conclusions: AI-based systems are more sensitive than human graders and could be safe to use in clinical practice but have variable specificity. However, for many systems evidence is limited, at high risk of bias and may not generalise across settings. Therefore, pre-implementation assessment in the target clinical pathway is essential to obtain reliable and applicable accuracy estimates.
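
The sensitivity and specificity figures quoted in the Results are proportions taken from a 2x2 table that compares the AI grade with the reference standard. The sketch below shows that arithmetic only; the class name, function names and counts are hypothetical illustrations and are not data or code from any study in the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 counts of an index test (AI grader) against a reference standard."""
    true_pos: int   # referable DR present, flagged by the AI
    false_neg: int  # referable DR present, missed by the AI
    true_neg: int   # referable DR absent, passed by the AI
    false_pos: int  # referable DR absent, flagged by the AI


def sensitivity(c: ConfusionCounts) -> float:
    """Share of diseased cases the test flags: TP / (TP + FN)."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    """Share of non-diseased cases the test passes: TN / (TN + FP)."""
    return c.true_neg / (c.true_neg + c.false_pos)


# Hypothetical counts chosen for illustration, not taken from the review.
example = ConfusionCounts(true_pos=90, false_neg=10, true_neg=700, false_pos=200)
print(f"sensitivity = {sensitivity(example):.2f}")  # 0.90
print(f"specificity = {specificity(example):.2f}")  # 0.78
```

With these made-up counts the grader would meet the ≥85% sensitivity level while falling below 80% specificity, which is the pattern the abstract describes for some systems.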

Source journal
Journal of Medical Screening (Medicine: Public, Environmental & Occupational Health)
CiteScore: 4.90
Self-citation rate: 3.40%
Articles published: 40
Review time: >12 weeks
Journal description: Journal of Medical Screening, a fully peer-reviewed journal, is concerned with all aspects of medical screening, particularly the publication of research that advances screening theory and practice. The journal aims to increase awareness of the principles of screening (quantitative and statistical aspects), screening techniques and procedures and methodologies from all specialties. An essential subscription for physicians, clinicians and academics with an interest in screening, epidemiology and public health.
Latest articles from this journal
Age-specific differences in tumour characteristics between screen-detected and non-screen-detected breast cancers in women aged 40-74 at diagnosis in Sweden from 2008 to 2017.
Association between time to colonoscopy after positive fecal testing and colorectal cancer outcomes in Alberta, Canada.
Cancer screening programs in Japan: Progress and challenges.
Strong association between reduction of late-stage cancers and reduction of cancer-specific mortality in meta-regression of randomized screening trials across multiple cancer types.
Factors associated with private or public breast cancer screening attendance in Queensland, Australia: A retrospective cross-sectional study.