Zichen Ye, Daqian Zhang, Yuankai Zhao, Mingyang Chen, Huike Wang, Samuel Seery, Yimin Qu, Peng Xue, Yu Jiang
Deep learning algorithms for melanoma detection using dermoscopic images: A systematic review and meta-analysis
Artificial Intelligence in Medicine, volume 155, Article 102934. Published 2024-07-25.
DOI: 10.1016/j.artmed.2024.102934
https://www.sciencedirect.com/science/article/pii/S0933365724001763
Abstract
Background
Melanoma is a serious risk to human health and early identification is vital for treatment success. Deep learning (DL) has the potential to detect cancer using imaging technologies and many studies provide evidence that DL algorithms can achieve high accuracy in melanoma diagnostics.
Objectives
To critically assess the performance of different DL algorithms in diagnosing melanoma from dermatoscopic images, and to discuss the relationship between dermatologists and DL.
Methods
Ovid-Medline, Embase, IEEE Xplore, and the Cochrane Library were systematically searched from inception until 7th December 2021. Studies that reported diagnostic DL model performances in detecting melanoma using dermatoscopic images were included if they had specific outcomes and histopathologic confirmation. Binary diagnostic accuracy data and contingency tables were extracted to analyze outcomes of interest, which included sensitivity (SEN), specificity (SPE), and area under the curve (AUC). Subgroup analyses were performed according to human-machine comparison and cooperation. The study was registered in PROSPERO, CRD42022367824.
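As a minimal sketch of how SEN and SPE follow from one study's 2×2 contingency table, the snippet below computes both from true/false positive and negative counts. The class name `ContingencyTable` and the counts are hypothetical and chosen only for illustration; they are not taken from any included study, and the sketch does not reproduce the pooling model used in the meta-analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """2x2 table for a single study, with histopathology as the reference.

    Assumes at least one melanoma and one non-melanoma case, so that
    neither denominator below is zero.
    """
    tp: int  # melanomas classified as melanoma
    fp: int  # non-melanomas classified as melanoma
    fn: int  # melanomas classified as non-melanoma
    tn: int  # non-melanomas classified as non-melanoma

    def sensitivity(self) -> float:
        # Proportion of confirmed melanomas that were detected.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of confirmed non-melanomas that were correctly cleared.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for illustration only.
example = ContingencyTable(tp=45, fp=10, fn=5, tn=90)
print(f"SEN = {example.sensitivity():.2f}, SPE = {example.specificity():.2f}")
```

Per-study values like these are the inputs to a pooled estimate. AUC, by contrast, is a property of the whole ROC curve and cannot be recovered from a single table at one threshold.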
Results
A total of 2309 records were initially retrieved, of which 37 studies met our inclusion criteria and 27 provided sufficient data for meta-analytical synthesis. The pooled SEN was 82% (range 77–86), the pooled SPE was 87% (range 84–90), and the AUC was 0.92 (range 0.89–0.94). In the human-machine comparison, pooled AUCs were 0.87 (0.84–0.90) for DL and 0.83 (0.79–0.86) for dermatologists. When dermatologists were split by seniority, pooled AUCs were 0.90 (0.87–0.93) for DL, 0.80 (0.76–0.83) for junior dermatologists, and 0.88 (0.85–0.91) for senior dermatologists. In analyses of human-machine cooperation, the values were 0.88 (0.85–0.91) for DL, 0.76 (0.72–0.79) for unassisted dermatologists, and 0.87 (0.84–0.90) for DL-assisted dermatologists.
Conclusions
Evidence suggests that DL algorithms are as accurate as senior dermatologists in melanoma diagnostics. Therefore, DL could be used to support dermatologists in diagnostic decision-making. However, further high-quality, large-scale multicenter studies are required to address the specific challenges associated with medical AI-based diagnostics.
About the journal:
Artificial Intelligence in Medicine publishes original articles from a wide variety of interdisciplinary perspectives concerning the theory and practice of artificial intelligence (AI) in medicine, medically-oriented human biology, and health care.
Artificial intelligence in medicine may be characterized as the scientific discipline pertaining to research studies, projects, and applications that aim at supporting decision-based medical tasks through knowledge- and/or data-intensive computer-based solutions that ultimately support and improve the performance of a human care provider.