Sociodemographic biases in a commercial AI model for intracranial hemorrhage detection
Annie Trang, Kristin Putman, Dharmam Savani, Devina Chatterjee, Jerry Zhao, Peter Kamel, Jean J Jeudy, Vishwa S Parekh, Paul H Yi
Emergency Radiology, pp. 713-723. Published 2024-10-01 (Epub 2024-07-22). DOI: 10.1007/s10140-024-02270-w (https://doi.org/10.1007/s10140-024-02270-w)
Abstract
Purpose: To evaluate whether a commercial AI tool for intracranial hemorrhage (ICH) detection on head CT exhibited sociodemographic biases.
Methods: This retrospective study reviewed 9736 consecutive adult non-contrast head CT scans performed between November 2021 and February 2022 in a single healthcare system. Each CT scan was evaluated by a commercial ICH AI tool and a board-certified neuroradiologist; ground truth was defined as the final radiologist determination of ICH presence or absence. After the AI tool's aggregate diagnostic performance was evaluated, sub-analyses by sociodemographic group (age, sex, race, ethnicity, insurance status, and Area Deprivation Index [ADI] scores) assessed for biases. χ² or Fisher's exact tests were used to evaluate statistical significance, defined as p ≤ 0.05.
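As an illustration of the subgroup comparison described in the Methods, the following is a minimal sketch of a Pearson χ² test on a 2×2 table, using the race-subgroup PPV counts reported in the Results (454/681 vs. 686/937). The function name `chi2_2x2` and the optional `yates` flag are assumptions made for this example; the abstract does not say which software was used or whether a continuity correction was applied.

```python
import math


def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square test (1 degree of freedom) for the table [[a, b], [c, d]].

    Returns (statistic, p_value). The yates flag is a hypothetical option
    for this sketch; the source does not state whether it was applied.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates continuity correction, floored at zero
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: subgroup; columns: true positives, false positives among AI-positive scans.
# Counts taken from the Results: Black 454/681, comparison group 686/937.
tp_black, fp_black = 454, 681 - 454
tp_other, fp_other = 686, 937 - 686

stat, p_value = chi2_2x2(tp_black, fp_black, tp_other, fp_other)
print(f"chi2 = {stat:.2f}, p = {p_value:.4f}")
```

The p-value from this sketch may differ slightly from the published one depending on whether a continuity correction is used.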
Results: Our patient population was 50% female (mean age 60 ± 19 years). The AI tool had an aggregate accuracy of 93% [9060/9736], sensitivity of 85% [1140/1338], specificity of 94% [7920/8398], positive predictive value (PPV) of 71% [1140/1618], and negative predictive value (NPV) of 98% [7920/8118]. Sociodemographic biases were identified, including lower PPV for patients who were female (67.3% [441/656] vs. 72.7% [699/962], p = 0.02), Black (66.7% [454/681] vs. 73.2% [686/937], p = 0.005), or non-Hispanic/non-Latino (69.7% [1038/1490] vs. 95.4% [417/437], p = 0.009), and for those who had Medicaid/Medicare (69.9% [754/1078]) or private (66.5% [228/343]) primary insurance (p = 0.003). Lower sensitivity was seen for patients in the third quartile of national (78.8% [241/306], p = 0.001) and state ADI scores (79.0% [22/287], p = 0.001).
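The aggregate metrics follow directly from the four confusion-matrix cells implied by the bracketed counts. The sketch below recomputes them; the function name `diagnostic_metrics` and the variable names are illustrative and not taken from the study, and the cells are derived here by subtraction from the reported numerators and denominators.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic accuracy metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positives / all ICH-positive scans
        "specificity": tn / (tn + fp),   # true negatives / all ICH-negative scans
        "ppv": tp / (tp + fp),           # true positives / all AI-positive scans
        "npv": tn / (tn + fn),           # true negatives / all AI-negative scans
    }


# Cells derived from the reported counts:
# sensitivity 1140/1338 and specificity 7920/8398.
TP = 1140
FN = 1338 - 1140
TN = 7920
FP = 8398 - 7920

metrics = diagnostic_metrics(TP, FP, TN, FN)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The same function can be applied to any subgroup's counts to reproduce the per-group PPV or sensitivity comparisons.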
Conclusions: In our healthcare system, a commercial AI tool had lower performance for ICH detection than previously reported and demonstrated several sociodemographic biases.