An Inherently Interpretable AI model improves Screening Speed and Accuracy for Early Diabetic Retinopathy

Kerol R. Djoumessi Donteu, Ziwei Huang, Laura Kuehlewein, Annekatrin Rickmann, Natalia Simon, Lisa M. Koch, Philipp Berens

medRxiv - Ophthalmology, 2024-06-27. DOI: 10.1101/2024.06.27.24309574
Abstract
Background: Diabetic retinopathy (DR) is a frequent complication of diabetes, affecting millions of people worldwide. Screening for DR from fundus images was among the first successful applications of modern artificial intelligence in medicine. However, current state-of-the-art systems typically rely on black-box models for referral decisions and therefore require post-hoc explanation methods for AI-human interaction.
Methods: In this retrospective reader study, we evaluated an inherently interpretable deep learning model for early DR screening; the model explicitly represents the local evidence of DR as part of its network architecture. We trained the network on 34,350 high-quality fundus images from a publicly available dataset and validated its state-of-the-art performance on ten external datasets. To test whether the model's class evidence maps highlight clinically relevant information, we obtained detailed lesion annotations from ophthalmologists on 65 images. Finally, we assessed the clinical usefulness of the model in a reader study comparing DR screening without AI support to screening with AI support, with and without AI explanations.
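The abstract does not spell out the architecture. A minimal sketch of the underlying idea, a network whose image-level prediction aggregates spatially local class evidence so that the pre-pooling evidence map itself serves as the explanation, might look like the following (PyTorch; the class name LocalEvidenceNet, the layer sizes, and the receptive field are illustrative assumptions, not the authors' implementation):

```python
# Minimal sketch of an inherently interpretable classifier: a
# small-receptive-field CNN produces a spatial map of per-location
# class evidence; average pooling turns the map into the image-level
# logit, so the pre-pooling map is the model's own explanation.
import torch
import torch.nn as nn

class LocalEvidenceNet(nn.Module):  # hypothetical name, not the paper's model
    def __init__(self, n_classes: int = 2):
        super().__init__()
        # Backbone with a deliberately small receptive field, so each
        # spatial position in the output only "sees" a local patch.
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3), nn.ReLU(),
            nn.Conv2d(128, 256, kernel_size=3), nn.ReLU(),
        )
        # 1x1 conv maps local features to per-location class evidence.
        self.evidence = nn.Conv2d(256, n_classes, kernel_size=1)

    def forward(self, x):
        feats = self.features(x)
        evidence_map = self.evidence(feats)      # (B, n_classes, H', W')
        logits = evidence_map.mean(dim=(2, 3))   # global average pooling
        return logits, evidence_map

model = LocalEvidenceNet()
x = torch.randn(1, 3, 512, 512)                  # one fundus image
logits, evidence_map = model(x)
# evidence_map can be upsampled and overlaid on the fundus image: high
# values mark regions the model counts as evidence for the DR class.
```

Because the classification logit is, by construction, the average of the evidence map, the highlighted regions are the decision rather than a post-hoc approximation of it.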
Results: The inherently interpretable deep learning model reached an accuracy of .906 [.900-.913] (95% confidence interval) and an AUC of .904 [.894-.913] on the internal test set, with similar performance on the external datasets. High-evidence regions extracted directly from the model contained clinically relevant lesions such as microaneurysms or haemorrhages with a precision of .960 [.941-.976]. Decision support in which the model highlighted high-evidence regions in the image improved screening accuracy for difficult cases and increased screening speed.
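For illustration, one plausible way to compute such a region-level precision, i.e. the fraction of highlighted regions that overlap at least one annotated lesion, is sketched below (the threshold and the connected-component definition of a "region" are assumptions; the paper's exact matching protocol is not given in the abstract):

```python
# Hedged sketch: score whether high-evidence regions contain annotated
# lesions. Precision = highlighted regions overlapping a lesion,
# divided by all highlighted regions.
import numpy as np
from scipy import ndimage

def region_precision(evidence_map: np.ndarray,
                     lesion_mask: np.ndarray,
                     threshold: float = 0.5) -> float:
    """evidence_map: (H, W) model evidence; lesion_mask: (H, W) binary
    ophthalmologist annotation resampled to the same grid. The 0.5
    threshold is an illustrative choice, not the paper's."""
    highlighted = evidence_map > threshold
    labels, n_regions = ndimage.label(highlighted)  # connected components
    if n_regions == 0:
        return float("nan")                          # nothing highlighted
    hits = sum(
        lesion_mask[labels == region_id].any()
        for region_id in range(1, n_regions + 1)
    )
    return hits / n_regions
```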
Interpretation: Inherently interpretable deep learning models can reach state-of-the-art performance and support screening for early DR by improving human-AI collaboration.