Lukas Heinlein, Roman C. Maron, Achim Hekler, Sarah Haggenmüller, Christoph Wies, Jochen S. Utikal, Friedegund Meier, Sarah Hobelsberger, Frank F. Gellrich, Mildred Sergon, Axel Hauschild, Lars E. French, Lucie Heinzerling, Justin G. Schlager, Kamran Ghoreschi, Max Schlaak, Franz J. Hilke, Gabriela Poch, Sören Korsing, Carola Berking, Markus V. Heppt, Michael Erdmann, Sebastian Haferkamp, Konstantin Drexler, Dirk Schadendorf, Wiebke Sondermann, Matthias Goebeler, Bastian Schilling, Eva Krieghoff-Henning, Titus J. Brinker
Prospective multicenter study using artificial intelligence to improve dermoscopic melanoma diagnosis in patient care
Communications Medicine, journal article, published 2024-09-11. DOI: 10.1038/s43856-024-00598-5
https://www.nature.com/articles/s43856-024-00598-5
Abstract
Early detection of melanoma, a potentially lethal type of skin cancer with high prevalence worldwide, improves patient prognosis. In retrospective studies, artificial intelligence (AI) has proven helpful for enhancing melanoma detection. However, few prospective studies confirm these promising results. Existing studies are limited by small sample sizes, overly homogeneous datasets, or a lack of rare melanoma subtypes, which prevents a fair and thorough evaluation of AI and its generalizability, a crucial aspect for its application in the clinical setting. Therefore, we assessed “All Data are Ext” (ADAE), an established open-source ensemble algorithm for detecting melanomas, by comparing its diagnostic accuracy to that of dermatologists on a prospectively collected, external, heterogeneous test set comprising eight distinct hospitals, four different camera setups, rare melanoma subtypes, and special anatomical sites. We extended the algorithm with real test-time augmentation (R-TTA, i.e., providing real photographs of lesions taken from multiple angles and averaging the predictions), and evaluated its generalization capabilities. Overall, the AI shows higher balanced accuracy than dermatologists (0.798, 95% confidence interval (CI) 0.779–0.814 vs. 0.781, 95% CI 0.760–0.802; p = 4.0e−145), obtaining a higher sensitivity (0.921, 95% CI 0.900–0.942 vs. 0.734, 95% CI 0.701–0.770; p = 3.3e−165) at the cost of a lower specificity (0.673, 95% CI 0.641–0.702 vs. 0.828, 95% CI 0.804–0.852; p = 3.3e−165). As the algorithm exhibits a significant performance advantage on our heterogeneous dataset exclusively comprising melanoma-suspicious lesions, AI may offer the potential to support dermatologists, particularly in diagnosing challenging cases.

Melanoma is a type of skin cancer that can spread to other parts of the body, often resulting in death. Early detection improves survival rates. Computational tools that use artificial intelligence (AI) can be used to detect melanoma. However, few studies have checked how well the AI works on real-world data obtained from patients. We tested a previously developed AI tool on data obtained from eight different hospitals that used different types of cameras. The data also included images of rare melanoma types and of lesions on a range of different parts of the body. The AI tool was more likely than dermatologists to correctly identify melanoma. This AI tool could be used to help dermatologists diagnose melanoma, particularly melanomas that are difficult for dermatologists to diagnose.

Heinlein, Maron, Hekler et al. evaluate an AI algorithm for detecting melanoma and compare its performance to that of dermatologists on a prospectively collected, external, heterogeneous dataset. The AI exhibits a significant performance advantage, especially in diagnosing challenging cases.
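
The two computations named in the abstract, averaging predictions over several real photographs of a lesion (R-TTA) and scoring with balanced accuracy, can be illustrated with a minimal Python sketch. This is not the ADAE code: the function names, the per-photograph scores, and the 0.5 decision threshold are assumptions made for illustration. Balanced accuracy is taken here in its usual binary form, the mean of sensitivity and specificity. The reported point estimates are roughly consistent with that form: (0.921 + 0.673) / 2 ≈ 0.797 against the reported 0.798, a gap that presumably comes from rounding.

```python
from statistics import mean


def r_tta_score(view_scores):
    """Average the melanoma probabilities predicted for several
    real photographs of the same lesion (hypothetical helper)."""
    if not view_scores:
        raise ValueError("at least one photograph per lesion is required")
    return mean(view_scores)


def balanced_accuracy(y_true, y_pred):
    """Return (balanced accuracy, sensitivity, specificity)
    for binary labels, where 1 = melanoma and 0 = benign.
    Assumes both classes occur at least once in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2, sensitivity, specificity


# Made-up per-photograph scores for four lesions (not data from the study).
views = {
    "lesion_1": [0.9, 0.7, 0.8],  # melanoma
    "lesion_2": [0.6, 0.2, 0.3],  # melanoma, missed after averaging
    "lesion_3": [0.1, 0.2, 0.3],  # benign
    "lesion_4": [0.4, 0.3, 0.2],  # benign
}
labels = [1, 1, 0, 0]
THRESHOLD = 0.5  # assumed cut-off, not taken from the paper

# One averaged score per lesion, then a binary call per lesion.
preds = [int(r_tta_score(scores) >= THRESHOLD) for scores in views.values()]
print(balanced_accuracy(labels, preds))
```

Averaging before thresholding means a single confident view cannot decide the call on its own, as `lesion_2` shows. Whether that helps or hurts in practice depends on the model and the images.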