Human-AI Interaction in the ScreenTrustCAD Trial: Recall Proportion and Positive Predictive Value Related to Screening Mammograms Flagged by AI CAD versus a Human Reader
Karin E Dembrower, Alessio Crippa, Martin Eklund, Fredrik Strand
Radiology, March 2025; 314(3): e242566. doi: 10.1148/radiol.242566
Abstract
Background: The ScreenTrustCAD trial was a prospective study that evaluated cancer detection rates for combinations of artificial intelligence (AI) computer-aided detection (CAD) and two radiologists. The results raised concerns that radiologists may agree with AI CAD too much (when AI CAD flags an examination erroneously) or too little (when AI CAD flags an examination correctly).

Purpose: To evaluate differences in recall proportion and positive predictive value (PPV) according to which readers (AI CAD and/or radiologists) flagged the mammogram for consensus discussion.

Materials and Methods: Participants were enrolled from April 2021 to June 2022. Each examination was interpreted by three independent readers (two radiologists and AI CAD), and positive findings were forwarded to consensus discussion. For each combination of readers flagging an examination, the recall proportion was calculated as the number of recalled examinations divided by the number of flagged examinations, and the PPV as the number of pathologically verified cancers divided by the number of recalled examinations.

Results: The study included 54 991 women (median age, 55 years [IQR, 46-65 years]), of whom 5489 were flagged for consensus discussion and 1348 were recalled. Among examinations flagged by a single reader, the recall proportion was higher after flagging by one radiologist (14.2% [263 of 1858]) than after flagging by AI CAD (4.6% [86 of 1886]) (P < .001), whereas the PPV for breast cancer was lower (3.4% [nine of 263] vs 22% [19 of 86]) (P < .001). Among examinations flagged by two readers, the recall proportion was higher after flagging by two radiologists (57.2% [360 of 629]) than after flagging by AI CAD and one radiologist (38.6% [244 of 632]) (P < .001), whereas the PPV was lower (2.5% [nine of 360] vs 25.0% [61 of 244]) (P < .001). For examinations flagged by all three readers, the recall proportion was 82.6% (400 of 484) and the PPV was 34.2% (137 of 400).

Conclusion: A larger proportion of participants were recalled after initial flagging by radiologists than after flagging by AI CAD, and a smaller proportion of those recalled had cancer.

ClinicalTrials.gov Identifier: NCT04778670. © RSNA, 2025. See also the editorial by Grimm in this issue.
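The two ratios in the abstract can be recomputed directly from the reported counts. The sketch below assumes recall proportion = recalled / flagged and PPV = verified cancers / recalled, which matches every fraction given in the Results. The abstract does not say which statistical test produced its P values, so the pooled two-proportion z-test here is an illustrative stand-in, not the authors' method; the names `ReaderGroup`, `GROUPS`, and `two_proportion_z_test` are hypothetical.

```python
"""Recall proportion and PPV per reader combination, from counts in the abstract.

Assumptions (not stated by the authors): recall proportion = recalled / flagged,
PPV = cancers / recalled. The pooled two-proportion z-test is a stand-in;
the abstract does not name the test actually used.
"""
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ReaderGroup:
    name: str
    flagged: int   # examinations forwarded to consensus discussion
    recalled: int  # examinations recalled after consensus
    cancers: int   # pathologically verified cancers among recalled examinations

    @property
    def recall_proportion(self) -> float:
        return self.recalled / self.flagged

    @property
    def ppv(self) -> float:
        return self.cancers / self.recalled


# Counts as reported in the abstract's Results section.
GROUPS = {
    "one_radiologist": ReaderGroup("One radiologist", 1858, 263, 9),
    "ai_cad_only": ReaderGroup("AI CAD only", 1886, 86, 19),
    "two_radiologists": ReaderGroup("Two radiologists", 629, 360, 9),
    "ai_cad_plus_one": ReaderGroup("AI CAD + one radiologist", 632, 244, 61),
    "all_three": ReaderGroup("All three readers", 484, 400, 137),
}


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-sided z-test for H0: p1 == p2. Returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    for g in GROUPS.values():
        print(f"{g.name:26s} recall {g.recall_proportion:6.1%}  PPV {g.ppv:6.1%}")

    # The two head-to-head comparisons reported in the abstract.
    for a, b in [("one_radiologist", "ai_cad_only"), ("two_radiologists", "ai_cad_plus_one")]:
        ga, gb = GROUPS[a], GROUPS[b]
        _, p_recall = two_proportion_z_test(ga.recalled, ga.flagged, gb.recalled, gb.flagged)
        _, p_ppv = two_proportion_z_test(ga.cancers, ga.recalled, gb.cancers, gb.recalled)
        print(f"{ga.name} vs {gb.name}: recall p={p_recall:.1e}, PPV p={p_ppv:.1e}")
```

The five groups together account for all 5489 flagged examinations. For the small PPV counts (for example, 19 of 86), an exact test such as Fisher's would be a more conservative choice than the normal approximation used here.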