Reproducibility and Explainability of Deep Learning in Mammography: A Systematic Review of Literature

Indian Journal of Radiology and Imaging (IF 0.9, Q4: Radiology, Nuclear Medicine & Medical Imaging) · Publication date: 2023-10-10 · DOI: 10.1055/s-0043-1775737
Deeksha Bhalla, Krithika Rangarajan, Tany Chandra, Subhashis Banerjee, Chetan Arora
Citations: 0

Abstract

Background: Although abundant literature is currently available on the use of deep learning for breast cancer detection in mammography, its quality is widely variable.

Purpose: To evaluate the published literature on breast cancer detection in mammography for reproducibility and to ascertain best practices for model design.

Methods: The PubMed and Scopus databases were searched to identify records describing the use of deep learning to detect lesions or to classify images as cancer or noncancer. A modified Quality Assessment of Diagnostic Accuracy Studies (mQUADAS-2) tool was developed for this review and applied to the included studies. The reported results (area under the receiver operating characteristic [ROC] curve [AUC], sensitivity, and specificity) were recorded.

Results: A total of 12,123 records were screened, of which 107 met the inclusion criteria. Training and test datasets, the key idea behind each model architecture, and results were recorded for these studies. On mQUADAS-2 assessment, 103 studies had a high risk of bias due to nonrepresentative patient selection. Four studies were of adequate quality; of these, three trained their own model and one used a commercial network, with ensemble models used in two. Common model-training strategies included patch classifiers, image classification networks (ResNet in 67%), and object detection networks (RetinaNet in 67%). The highest reported AUC was 0.927 ± 0.008 on a screening dataset, rising to 0.945 (0.919–0.968) on an enriched subset. Higher AUC (0.955) and specificity (98.5%) were achieved when radiologist and artificial intelligence (AI) readings were combined than with either alone. None of the studies provided explainability beyond localization accuracy, and none studied the interaction between AI and radiologists in a real-world setting.
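As a concrete illustration of the metrics the review records for each study, the sketch below computes AUC (via the Mann-Whitney rank formulation), sensitivity, and specificity from model scores. The scores and the operating threshold are illustrative assumptions, not data from any reviewed study.

```python
# Hedged sketch of the metrics recorded in the review: AUC, sensitivity,
# and specificity. All scores below are illustrative, not study data.

def auc(scores_pos, scores_neg):
    """AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen cancer case scores higher than a noncancer case."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(scores_pos) * len(scores_neg))

def sensitivity_specificity(scores_pos, scores_neg, threshold):
    """Sensitivity and specificity at one operating threshold."""
    tp = sum(s >= threshold for s in scores_pos)
    tn = sum(s < threshold for s in scores_neg)
    return tp / len(scores_pos), tn / len(scores_neg)

# Illustrative model scores (higher = more suspicious for cancer).
cancer = [0.9, 0.8, 0.75, 0.6]
benign = [0.4, 0.65, 0.3, 0.2]

print(auc(cancer, benign))                                   # 0.9375
print(sensitivity_specificity(cancer, benign, 0.5))          # (1.0, 0.75)
```

Note that AUC summarizes performance over all thresholds, while sensitivity and specificity depend on the single operating point a study chooses, which is one reason the review records all three.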
Conclusion: While deep learning holds much promise in mammography interpretation, evaluation in reproducible clinical settings and explainable networks are the need of the hour.
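The higher specificity reported for combined radiologist-plus-AI readings is consistent with consensus-style decision rules. The AND-rule sketch below is an illustrative assumption only; the reviewed studies' actual combination schemes vary and are not specified in the abstract.

```python
# Hedged sketch: a consensus (AND) combination of a radiologist's recall
# decision and an AI suspicion score. Rule and threshold are illustrative
# assumptions, not the method of any specific reviewed study.

def consensus_recall(ai_score, radiologist_recalls, ai_threshold=0.5):
    """Recall a case only when both readers agree it is suspicious.
    Requiring agreement raises specificity (fewer false recalls),
    potentially at some cost in sensitivity."""
    return radiologist_recalls and ai_score >= ai_threshold

print(consensus_recall(0.9, True))   # True  -- both readers agree
print(consensus_recall(0.8, False))  # False -- radiologist does not recall
print(consensus_recall(0.3, True))   # False -- AI score below threshold
```

An OR-style rule (recall if either reader flags the case) would instead favor sensitivity over specificity; which trade-off is appropriate depends on the screening context.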
Source Journal
Indian Journal of Radiology and Imaging (Radiology, Nuclear Medicine & Medical Imaging)
CiteScore: 1.20
Self-citation rate: 0.00%
Articles per year: 115
Review turnaround: 45 weeks