Dyllan Edson Similié, Jakob K H Andersen, Sebastian Dinesen, Thiusius R Savarimuthu, Jakob Grauslund
{"title":"使用预分割深度学习分类模型对糖尿病视网膜病变进行分级:验证自动算法。","authors":"Dyllan Edson Similié, Jakob K H Andersen, Sebastian Dinesen, Thiusius R Savarimuthu, Jakob Grauslund","doi":"10.1111/aos.16781","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>To validate the performance of autonomous diabetic retinopathy (DR) grading by comparing a human grader and a self-developed deep-learning (DL) algorithm with gold-standard evaluation.</p><p><strong>Methods: </strong>We included 500, 6-field retinal images graded by an expert ophthalmologist (gold standard) according to the International Clinical Diabetic Retinopathy Disease Severity Scale as represented with DR levels 0-4 (97, 100, 100, 103, 100, respectively). Weighted kappa was calculated to measure the DR classification agreement for (1) a certified human grader without, and (2) with assistance from a DL algorithm and (3) the DL operating autonomously. Using any DR (level 0 vs. 1-4) as a cutoff, we calculated sensitivity, specificity, as well as positive and negative predictive values (PPV and NPV). Finally, we assessed lesion discrepancies between Model 3 and the gold standard.</p><p><strong>Results: </strong>As compared to the gold standard, weighted kappa for Models 1-3 was 0.88, 0.89 and 0.72, sensitivities were 95%, 94% and 78% and specificities were 82%, 84% and 81%. Extrapolating to a real-world DR prevalence of 23.8%, the PPV were 63%, 64% and 57% and the NPV were 98%, 98% and 92%. 
Discrepancies between the gold standard and Model 3 were mainly incorrect detection of artefacts (n = 49), missed microaneurysms (n = 26) and inconsistencies between the segmentation and classification (n = 51).</p><p><strong>Conclusion: </strong>While the autonomous DL algorithm for DR classification only performed on par with a human grader for some measures in a high-risk population, extrapolations to a real-world population demonstrated an excellent 92% NPV, which could make it clinically feasible to use autonomously to identify non-DR patients.</p>","PeriodicalId":6915,"journal":{"name":"Acta Ophthalmologica","volume":" ","pages":""},"PeriodicalIF":3.0000,"publicationDate":"2024-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Grading of diabetic retinopathy using a pre-segmenting deep learning classification model: Validation of an automated algorithm.\",\"authors\":\"Dyllan Edson Similié, Jakob K H Andersen, Sebastian Dinesen, Thiusius R Savarimuthu, Jakob Grauslund\",\"doi\":\"10.1111/aos.16781\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Purpose: </strong>To validate the performance of autonomous diabetic retinopathy (DR) grading by comparing a human grader and a self-developed deep-learning (DL) algorithm with gold-standard evaluation.</p><p><strong>Methods: </strong>We included 500, 6-field retinal images graded by an expert ophthalmologist (gold standard) according to the International Clinical Diabetic Retinopathy Disease Severity Scale as represented with DR levels 0-4 (97, 100, 100, 103, 100, respectively). Weighted kappa was calculated to measure the DR classification agreement for (1) a certified human grader without, and (2) with assistance from a DL algorithm and (3) the DL operating autonomously. Using any DR (level 0 vs. 
1-4) as a cutoff, we calculated sensitivity, specificity, as well as positive and negative predictive values (PPV and NPV). Finally, we assessed lesion discrepancies between Model 3 and the gold standard.</p><p><strong>Results: </strong>As compared to the gold standard, weighted kappa for Models 1-3 was 0.88, 0.89 and 0.72, sensitivities were 95%, 94% and 78% and specificities were 82%, 84% and 81%. Extrapolating to a real-world DR prevalence of 23.8%, the PPV were 63%, 64% and 57% and the NPV were 98%, 98% and 92%. Discrepancies between the gold standard and Model 3 were mainly incorrect detection of artefacts (n = 49), missed microaneurysms (n = 26) and inconsistencies between the segmentation and classification (n = 51).</p><p><strong>Conclusion: </strong>While the autonomous DL algorithm for DR classification only performed on par with a human grader for some measures in a high-risk population, extrapolations to a real-world population demonstrated an excellent 92% NPV, which could make it clinically feasible to use autonomously to identify non-DR patients.</p>\",\"PeriodicalId\":6915,\"journal\":{\"name\":\"Acta Ophthalmologica\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2024-10-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Acta Ophthalmologica\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1111/aos.16781\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"OPHTHALMOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Acta 
Ophthalmologica","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1111/aos.16781","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"OPHTHALMOLOGY","Score":null,"Total":0}
Grading of diabetic retinopathy using a pre-segmenting deep learning classification model: Validation of an automated algorithm.
Purpose: To validate the performance of autonomous diabetic retinopathy (DR) grading by comparing a human grader and a self-developed deep-learning (DL) algorithm with gold-standard evaluation.
Methods: We included 500 six-field retinal images graded by an expert ophthalmologist (gold standard) according to the International Clinical Diabetic Retinopathy Disease Severity Scale, represented as DR levels 0-4 (97, 100, 100, 103 and 100 images, respectively). Weighted kappa was calculated to measure DR classification agreement with the gold standard for (1) a certified human grader without assistance, (2) the same grader assisted by a DL algorithm and (3) the DL algorithm operating autonomously. Using any DR (level 0 vs. levels 1-4) as a cutoff, we calculated sensitivity, specificity, and positive and negative predictive values (PPV and NPV). Finally, we assessed lesion discrepancies between Model 3 and the gold standard.
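Weighted kappa rewards near-misses on an ordinal scale such as DR levels 0-4 more than plain agreement does. The abstract does not state which weighting scheme was used; as a minimal sketch assuming quadratic weights (a common choice for ordinal grading), the statistic can be computed as:

```python
def quadratic_weighted_kappa(y_true, y_pred, n_levels=5):
    """Weighted kappa for ordinal labels 0..n_levels-1, with quadratic
    disagreement weights w_ij = (i - j)^2 / (n_levels - 1)^2."""
    n = len(y_true)
    # Observed confusion matrix: rows = gold standard, columns = grader/model.
    obs = [[0] * n_levels for _ in range(n_levels)]
    for t, p in zip(y_true, y_pred):
        obs[t][p] += 1
    # Marginal totals give the expected matrix under chance agreement.
    row = [sum(obs[i]) for i in range(n_levels)]
    col = [sum(obs[i][j] for i in range(n_levels)) for j in range(n_levels)]
    num = den = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            w = (i - j) ** 2 / (n_levels - 1) ** 2
            num += w * obs[i][j]                # observed weighted disagreement
            den += w * row[i] * col[j] / n      # expected weighted disagreement
    return 1.0 - num / den

# Perfect agreement yields kappa = 1; chance-level agreement yields kappa = 0.
print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4]))  # 1.0
```

Equivalently, `sklearn.metrics.cohen_kappa_score(y_true, y_pred, weights="quadratic")` computes the same statistic.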
Results: Compared with the gold standard, weighted kappa for Models 1-3 was 0.88, 0.89 and 0.72, sensitivities were 95%, 94% and 78%, and specificities were 82%, 84% and 81%. Extrapolating to a real-world DR prevalence of 23.8%, PPVs were 63%, 64% and 57%, and NPVs were 98%, 98% and 92%. Discrepancies between the gold standard and Model 3 were mainly incorrect detection of artefacts (n = 49), missed microaneurysms (n = 26) and inconsistencies between segmentation and classification (n = 51).
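The extrapolated PPV and NPV follow from Bayes' theorem applied to sensitivity and specificity at the target prevalence. The paper does not publish its exact computation; a sketch of the standard formulas, using Model 3's figures, is:

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV at a given disease prevalence, via Bayes' theorem."""
    ppv = (sensitivity * prevalence) / (
        sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    )
    npv = (specificity * (1 - prevalence)) / (
        specificity * (1 - prevalence) + (1 - sensitivity) * prevalence
    )
    return ppv, npv

# Model 3 (autonomous DL): 78% sensitivity, 81% specificity,
# extrapolated to a real-world DR prevalence of 23.8%.
ppv, npv = predictive_values(0.78, 0.81, 0.238)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

With these rounded inputs the NPV reproduces the reported 92%, and the PPV lands close to the reported 57% (small differences arise from the sensitivity and specificity being rounded to whole percentages).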
Conclusion: While the autonomous DL algorithm for DR classification matched a human grader on only some measures in a high-risk population, extrapolation to a real-world population demonstrated an excellent 92% NPV, which could make it clinically feasible to use the algorithm autonomously to identify patients without DR.
Journal description:
Acta Ophthalmologica is published on behalf of the Acta Ophthalmologica Scandinavica Foundation and is the official scientific publication of the following societies: the Danish Ophthalmological Society, the Finnish Ophthalmological Society, the Icelandic Ophthalmological Society, the Norwegian Ophthalmological Society and the Swedish Ophthalmological Society, as well as the European Association for Vision and Eye Research (EVER).
Acta Ophthalmologica publishes clinical and experimental original articles, reviews, editorials, educational photo essays (Diagnosis and Therapy in Ophthalmology), case reports and case series, letters to the editor and doctoral theses.