Potential Impact of an Artificial Intelligence-based Mammography Triage Algorithm on Performance and Workload in a Population-based Screening Sample
Alyssa T Watanabe, Hoanh Vu, Chi Y Chim, Andrew W Litt, Tara Retson, Ray C Mayo
Journal of Breast Imaging. Published 2024-09-08. DOI: 10.1093/jbi/wbae056 (https://doi.org/10.1093/jbi/wbae056)
Abstract
Objective: To evaluate the potential impact of a commercial artificial intelligence (AI)-based triage device on screening mammography performance and workload in a population-based screening sample.
Methods: In this retrospective study, a sample of 2129 women who underwent screening mammography was evaluated. The performance of a commercial AI-based triage device was compared with radiologists' reports, actual outcomes, and national benchmarks using common mammography metrics. Up to 5 years of follow-up examination results were evaluated to establish benignity. The algorithm sorted cases into two groups, "suspicious" and "low suspicion." A theoretical workload reduction was calculated by subtracting cases triaged as "low suspicion" from the sample.
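The workload calculation described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `workload_reduction` and the four-case label list are hypothetical, and the labels are toy values rather than study data.

```python
def workload_reduction(triage_labels):
    """Fraction of cases removed from the reading list when every case
    triaged as 'low suspicion' is subtracted from the sample."""
    total = len(triage_labels)
    if total == 0:
        raise ValueError("sample must contain at least one case")
    low = sum(1 for label in triage_labels if label == "low suspicion")
    return low / total


# Hypothetical toy sample of 4 cases (NOT data from the study).
labels = ["suspicious", "low suspicion", "low suspicion", "low suspicion"]
reduction = workload_reduction(labels)
print(f"Theoretical workload reduction: {reduction:.1%}")
```

Applied to the study's 2129 cases, the same ratio would give the theoretical reductions quoted in the Results; the abstract does not state the per-group case counts.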
Results: At the default 93% sensitivity setting, the triage simulation showed significant improvement (P < .05) in the following mean performance measures compared with actual outcomes: 45.5% relative improvement in recall rate (13.4% to 7.3%; 95% CI, 6.2-8.3), 119% relative improvement in positive predictive value (PPV) 1 (5.3% to 11.6%; 95% CI, 9.96-13.4), 28.5% relative improvement in PPV2 (24.6% to 31.6%; 95% CI, 24.8-39.1), 20% relative improvement in sensitivity (83.3% to 100%; 95% CI, 100-100), and 7.2% relative improvement in specificity (87.2% to 93.5%; 95% CI, 92.4-94.5). A theoretical 62.5% workload reduction was possible. At the ultrahigh 99% sensitivity setting, a theoretical 27% workload reduction was possible. No cancers were missed by the algorithm at either sensitivity setting.
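The percentage improvements in the Results are relative changes between the actual and simulated values, not percentage-point differences. The short check below recomputes them from the before/after figures quoted above; the helper name `relative_change` and the dictionary layout are assumptions made for illustration.

```python
def relative_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# (actual outcome, triage simulation) pairs, in percent, as quoted in the Results.
reported = {
    "recall rate": (13.4, 7.3),
    "PPV1": (5.3, 11.6),
    "PPV2": (24.6, 31.6),
    "sensitivity": (83.3, 100.0),
    "specificity": (87.2, 93.5),
}

changes = {
    name: round(relative_change(before, after), 1)
    for name, (before, after) in reported.items()
}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

For recall rate the change is negative because a lower recall rate is the improvement; the other four measures improve by increasing.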
Conclusion: Artificial intelligence-based triage in this simulation demonstrated potential for significant improvement in mammography performance and predicted substantial theoretical workload reduction without any missed cancers.