Assessing the Performance of Models from the 2022 RSNA Cervical Spine Fracture Detection Competition at a Level I Trauma Center.
Zixuan Hu, Markand Patel, Robyn L Ball, Hui Ming Lin, Luciano M Prevedello, Mitra Naseri, Shobhit Mathur, Robert Moreland, Jefferson Wilson, Christopher Witiw, Kristen W Yeom, Qishen Ha, Darragh Hanley, Selim Seferbekov, Hao Chen, Philipp Singer, Christof Henkel, Pascal Pfeiffer, Ian Pan, Harshit Sheoran, Wuqi Li, Adam E Flanders, Felipe C Kitamura, Tyler Richards, Jason Talbott, Ervin Sejdić, Errol Colak
Radiology: Artificial Intelligence, e230550. Published 2024-11-01. DOI: 10.1148/ryai.230550 (https://doi.org/10.1148/ryai.230550). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11605142/pdf/
Abstract
Purpose: To evaluate the performance of the top models from the RSNA 2022 Cervical Spine Fracture Detection challenge on a clinical test dataset of both noncontrast and contrast-enhanced CT scans acquired at a level I trauma center.

Materials and Methods: Seven top-performing models in the RSNA 2022 Cervical Spine Fracture Detection challenge were retrospectively evaluated on a clinical test set of 1828 CT scans (from 1829 series: 130 positive for fracture, 1699 negative for fracture; 1308 noncontrast, 521 contrast enhanced) from 1779 patients (mean age, 55.8 years ± 22.1 [SD]; 1154 [64.9%] male patients). Scans were acquired without exclusion criteria over 1 year (January-December 2022) from the emergency department of a neurosurgical and level I trauma center. Model performance was assessed using area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. False-positive and false-negative cases were further analyzed by a neuroradiologist.

Results: Although all seven models performed worse on the clinical test set than on the challenge dataset, they maintained high performance. On noncontrast CT scans, the models achieved a mean AUC of 0.89 (range: 0.79-0.92), sensitivity of 67.0% (range: 30.9%-80.0%), and specificity of 92.9% (range: 82.1%-99.0%). On contrast-enhanced CT scans, the models had a mean AUC of 0.88 (range: 0.76-0.94), sensitivity of 81.9% (range: 42.7%-100.0%), and specificity of 72.1% (range: 16.4%-92.8%). The models identified 10 fractures missed by radiologists. False-positive cases were more common in contrast-enhanced scans and, on noncontrast scans, were observed in patients with degenerative changes; false-negative cases were often associated with degenerative changes and osteopenia.

Conclusion: The winning models from the 2022 RSNA AI Challenge demonstrated high performance for cervical spine fracture detection on a clinical test dataset, warranting further evaluation for their use as clinical support tools.

Keywords: Feature Detection, Supervised Learning, Convolutional Neural Network (CNN), Genetic Algorithms, CT, Spine, Technology Assessment, Head/Neck

Supplemental material is available for this article. © RSNA, 2024. See also commentary by Levi and Politi in this issue.
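
For readers less familiar with the three metrics named in Materials and Methods, the following is a minimal sketch of how sensitivity, specificity, and AUC are conventionally defined for a binary fracture/no-fracture task. It is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 threshold are illustrative assumptions, and the abstract does not state how each model's operating threshold was chosen.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when scores >= threshold count as positive.

    labels: 1 = fracture present, 0 = no fracture (hypothetical encoding).
    """
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    # Sensitivity: fraction of true fractures flagged.
    # Specificity: fraction of fracture-free scans correctly cleared.
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Threshold-free AUC: the probability that a randomly chosen positive
    scan scores higher than a randomly chosen negative scan (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data only; these are not values from the study.
    toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
    toy_scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
    sens, spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.5)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f} "
          f"AUC={auc(toy_labels, toy_scores):.3f}")
```

Because AUC is independent of the threshold while sensitivity and specificity are not, a model can keep a similar AUC across two subsets and still show very different sensitivity and specificity at a fixed cutoff, which is the pattern the Results report for noncontrast versus contrast-enhanced scans.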