Performance of Two Deep Learning-based AI Models for Breast Cancer Detection and Localization on Screening Mammograms from BreastScreen Norway.
Marit A Martiniussen, Marthe Larsen, Tone Hovda, Merete U Kristiansen, Fredrik A Dahl, Line Eikvil, Olav Brautaset, Atle Bjørnerud, Vessela Kristensen, Marie B Bergan, Solveig Hofvind
Radiology: Artificial Intelligence, e240039. Journal article, published 2025-02-05. DOI: 10.1148/ryai.240039 (https://doi.org/10.1148/ryai.240039)
Abstract
"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content.
Purpose: To evaluate the cancer detection and marker placement accuracy of two artificial intelligence (AI) models developed for interpretation of screening mammograms.
Materials and Methods: This retrospective study included data from 129 434 screening examinations (all female, mean age 59.2 years, SD = 5.8) performed between January 2008 and December 2018 in BreastScreen Norway. Model A was commercially available and model B was an in-house model. Areas under the receiver operating characteristic curve (AUCs) with 95% confidence intervals (CIs) were calculated. Examinations were defined as positive if their AI scores were among the highest 3.2% (threshold 1) or the highest 11.1% (threshold 2). A radiologic review assessed the location of AI markings and classified interval cancers as true negative or false negative.
Results: The AUC was 0.93 (95% CI: 0.92-0.94) for both model A and model B when screen-detected and interval cancers were included. Model A identified 82.5% (611/741) of the screen-detected cancers at threshold 1 and 92.4% (685/741) at threshold 2. For model B, the corresponding figures were 81.8% (606/741) and 93.7% (694/741). The AI markings were correctly localized for all screen-detected cancers identified by both models, and for 82% (56/68) of the interval cancers for model A and 79% (54/68) for model B. At the review, 21.6% (45/208) of the interval cancers had been identified at the preceding screening by either or both models, were correctly localized, and were classified as false negative (n = 17) or as showing minimal signs of malignancy (n = 28).
Conclusion: Both AI models showed promising performance for cancer detection on screening mammograms. The AI markings corresponded well to the true cancer locations. ©RSNA, 2025.
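The percentile-based thresholds and the AUC mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. The score data below are synthetic, the function names (`cutoff_for_top_fraction`, `sensitivity_at_top_fraction`, `auc_from_ranks`, `percent`) are hypothetical, and the quantile-based cutoff is only one plausible way to flag the highest-scoring 3.2% or 11.1% of examinations. The rank-based AUC assumes there are no tied scores.

```python
import numpy as np


def cutoff_for_top_fraction(scores, fraction):
    """Score cutoff such that about `fraction` of exams score at or above it."""
    return float(np.quantile(scores, 1.0 - fraction))


def sensitivity_at_top_fraction(scores, is_cancer, fraction):
    """Share of cancer exams flagged when the top `fraction` of scores is positive."""
    flagged = scores >= cutoff_for_top_fraction(scores, fraction)
    return float(flagged[is_cancer].mean())


def auc_from_ranks(scores, is_cancer):
    """AUC via the Mann-Whitney U statistic (assumes no tied scores)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)  # rank 1 = lowest score
    n_pos = int(is_cancer.sum())
    n_neg = len(scores) - n_pos
    u_stat = ranks[is_cancer].sum() - n_pos * (n_pos + 1) / 2
    return float(u_stat / (n_pos * n_neg))


def percent(numerator, denominator):
    """Percentage rounded to one decimal, as reported in the abstract."""
    return round(100 * numerator / denominator, 1)


# Synthetic data for illustration only (NOT the study data): cancer exams
# are given higher scores on average than non-cancer exams.
rng = np.random.default_rng(0)
n_exams = 100_000
is_cancer = rng.random(n_exams) < 0.006
scores = rng.normal(loc=np.where(is_cancer, 2.5, 0.0), scale=1.0)

for name, fraction in [("threshold 1", 0.032), ("threshold 2", 0.111)]:
    sens = sensitivity_at_top_fraction(scores, is_cancer, fraction)
    print(f"{name}: top {fraction:.1%} flagged, sensitivity {sens:.3f}")
print(f"AUC on synthetic data: {auc_from_ranks(scores, is_cancer):.3f}")

# The fractions reported in the abstract can be recomputed directly.
print(percent(611, 741), percent(685, 741))  # model A, thresholds 1 and 2
print(percent(606, 741), percent(694, 741))  # model B, thresholds 1 and 2
```

Flagging a larger share of examinations (threshold 2 rather than threshold 1) can only keep or raise sensitivity, which matches the pattern in the reported figures for both models.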