Performance of an Artificial Intelligence System for Breast Cancer Detection on Screening Mammograms from BreastScreen Norway.
Marthe Larsen, Camilla F Olstad, Christoph I Lee, Tone Hovda, Solveig R Hoff, Marit A Martiniussen, Karl Øyvind Mikalsen, Håkon Lund-Hanssen, Helene S Solli, Marko Silberhorn, Åse Ø Sulheim, Steinar Auensen, Jan F Nygård, Solveig Hofvind
Radiology: Artificial Intelligence, e230375. Published 2024-05-01. DOI: 10.1148/ryai.230375 (https://doi.org/10.1148/ryai.230375). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11140504/pdf/
Abstract
Purpose: To explore the stand-alone breast cancer detection performance, at different risk score thresholds, of a commercially available artificial intelligence (AI) system.

Materials and Methods: This retrospective study included information from 661 695 digital mammographic examinations performed among 242 629 female individuals screened as a part of BreastScreen Norway, 2004-2018. The study sample included 3807 screen-detected cancers and 1110 interval breast cancers. A continuous examination-level risk score by the AI system was used to measure performance as the area under the receiver operating characteristic curve (AUC) with 95% CIs and cancer detection at different AI risk score thresholds.

Results: The AUC of the AI system was 0.93 (95% CI: 0.92, 0.93) for screen-detected cancers and interval breast cancers combined and 0.97 (95% CI: 0.97, 0.97) for screen-detected cancers. In a setting where 10% of the examinations with the highest AI risk scores were defined as positive and 90% with the lowest scores as negative, 92.0% (3502 of 3807) of the screen-detected cancers and 44.6% (495 of 1110) of the interval breast cancers were identified with AI. In this scenario, 68.5% (10 987 of 16 040) of false-positive screening results (negative recall assessment) were considered negative by AI. When 50% was used as the cutoff, 99.3% (3781 of 3807) of the screen-detected cancers and 85.2% (946 of 1110) of the interval breast cancers were identified as positive by AI, whereas 17.0% (2725 of 16 040) of the false-positive results were considered negative.

Conclusion: The AI system showed high performance in detecting breast cancers within 2 years of screening mammography and a potential for use to triage low-risk mammograms to reduce radiologist workload.

Keywords: Mammography, Breast, Screening, Convolutional Neural Network (CNN), Deep Learning Algorithms

Supplemental material is available for this article. © RSNA, 2024. See also commentary by Bahl and Do in this issue.
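The threshold results in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the study's code: `percent` recomputes the reported percentages from the counts given in the abstract, and `flag_top_fraction` is a hypothetical helper that shows what "defining the top 10% (or 50%) of AI risk scores as positive" could mean operationally. The toy scores are invented for the example and are not study data.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return a proportion as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


def flag_top_fraction(scores: list[float], fraction: float) -> list[bool]:
    """Mark the given fraction of examinations with the highest scores as positive.

    Hypothetical helper: the abstract does not describe how ties or
    rounding were handled, so this uses simple rank-based selection.
    """
    n_positive = round(len(scores) * fraction)
    # Indices ordered from highest to lowest risk score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = [False] * len(scores)
    for i in order[:n_positive]:
        flagged[i] = True
    return flagged


# Counts copied from the abstract (top 10% of scores defined as positive).
print(percent(3502, 3807))    # screen-detected cancers flagged by AI
print(percent(495, 1110))     # interval cancers flagged by AI
print(percent(10987, 16040))  # false-positive recalls scored negative by AI

# Counts copied from the abstract (top 50% of scores defined as positive).
print(percent(3781, 3807))
print(percent(946, 1110))
print(percent(2725, 16040))

# Invented toy scores for ten examinations, NOT study data.
toy_scores = [0.05, 0.91, 0.12, 0.33, 0.78, 0.02, 0.41, 0.08, 0.25, 0.60]
print(flag_top_fraction(toy_scores, 0.10))
print(flag_top_fraction(toy_scores, 0.50))
```

Moving the cutoff from 10% to 50% flags more cancers (higher sensitivity) but leaves fewer of the false-positive recalls scored as negative, which is the trade-off the Results paragraph describes.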
"Just Accepted" papers have undergone peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that errors that could affect the content may be identified during production of the final copyedited article.