Rikke Bachmann, Gozde Gunes, Stine Hangaard, Andreas Nexmann, P. Lisouski, Mikael Boesen, Michael Lundemann, Scott G Baginski
Improving Traumatic Fracture Detection on Radiographs with Artificial Intelligence Support: A Multi-Reader Study
BJR|Open, published 2024-04-25. DOI: 10.1093/bjro/tzae011 (https://doi.org/10.1093/bjro/tzae011)
Abstract
The aim of this study was to evaluate the diagnostic performance of non-specialist readers with and without the use of an AI support tool to detect traumatic fractures on radiographs of the appendicular skeleton.
The design was a retrospective, fully crossed, multi-reader, multi-case study on a balanced dataset of patients (≥2 years of age), with an AI tool as the diagnostic intervention. Fifteen readers assessed 340 radiographic exams with and without the AI tool in two separate sessions, and the time spent was recorded automatically. The reference standard was established by three consultant radiologists. Sensitivity, specificity, and false positives per patient were calculated.
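The three reported metrics can be illustrated with a minimal sketch. The abstract does not give the exact definitions used in the study, so the code assumes a common patient-wise convention: a patient counts as positive if the reference standard contains at least one fracture, and as detected if the reader flagged at least one fracture. The `PatientRead` record, its field names, and `patient_wise_metrics` are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass
class PatientRead:
    """One reader's assessment of one patient (hypothetical record layout)."""
    has_fracture: bool         # reference standard: at least one fracture present
    reader_flagged: bool       # reader reported at least one fracture
    false_positive_marks: int  # reader findings with no matching reference fracture


def patient_wise_metrics(reads):
    """Return (sensitivity, specificity, false positives per patient).

    Assumed definitions, not taken from the study:
      sensitivity    = flagged fracture patients / all fracture patients
      specificity    = unflagged fracture-free patients / all fracture-free patients
      FP per patient = total false-positive findings / all patients
    """
    positives = [r for r in reads if r.has_fracture]
    negatives = [r for r in reads if not r.has_fracture]
    if not positives or not negatives:
        raise ValueError("need at least one fracture and one fracture-free patient")

    sensitivity = sum(r.reader_flagged for r in positives) / len(positives)
    specificity = sum(not r.reader_flagged for r in negatives) / len(negatives)
    fp_per_patient = sum(r.false_positive_marks for r in reads) / len(reads)
    return sensitivity, specificity, fp_per_patient


# Toy data: two fracture patients (one missed), two fracture-free patients
# (one wrongly flagged with a single false-positive finding).
toy_reads = [
    PatientRead(True, True, 0),
    PatientRead(True, False, 0),
    PatientRead(False, False, 0),
    PatientRead(False, True, 1),
]
print(patient_wise_metrics(toy_reads))  # (0.5, 0.5, 0.25)
```

In a fully crossed design, every reader assesses every case under both conditions, so a function like this would be applied once per reader and condition before the results are averaged or modeled across readers.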
Patient-wise sensitivity increased from 72% to 80% (p < 0.05) and patient-wise specificity increased from 81% to 85% (p < 0.05) in exams aided by the AI tool compared to the unaided exams. The increase in sensitivity resulted in a relative reduction of missed fractures of 29%. The average rate of false positives per patient decreased from 0.16 to 0.14, corresponding to a relative reduction of 21%. There was no significant difference in average reading time spent per exam. The largest gain in fracture detection performance, with AI support, across all readers, was on non-obvious fractures with a significant increase in sensitivity of 11 percentage points (60% to 71%).
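The relative reduction in missed fractures follows from the rounded sensitivities above: the miss rate is the complement of sensitivity, so it falls from 28% to 20%. The short sketch below reproduces that arithmetic; `relative_reduction` is a hypothetical helper, and only the rounded values from the abstract are used.

```python
def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`, relative to `before`."""
    return (before - after) / before


# Miss rate = 100% - sensitivity, using the rounded percentages from the abstract.
missed_unaided = 100 - 72  # 28% of fracture patients missed without AI
missed_aided = 100 - 80    # 20% missed with AI
print(round(100 * relative_reduction(missed_unaided, missed_aided)))  # 29

# False positives per 100 patients, from the rounded rates 0.16 and 0.14.
print(100 * relative_reduction(16, 14))  # 12.5
```

The first result matches the reported 29%. The rounded false-positive rates give 12.5%, not the reported 21%, and the abstract does not state how the 21% figure was derived.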
The diagnostic performance for detection of traumatic fractures on radiographs of the appendicular skeleton improved among non-specialist readers when they were supported by an AI fracture detection tool. Improvement was seen in both sensitivity and specificity, without negatively affecting interpretation time.
The division and analysis of obvious and non-obvious fractures are novel in AI reader comparison studies like this.