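External Validation of a Winning Artificial Intelligence Algorithm from the RSNA 2022 Cervical Spine Fracture Detection Challenge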

James P Harper, Ghee R Lee, Ian Pan, Xuan V Nguyen, Nathan Quails, Luciano M Prevedello
Journal: AJNR. American journal of neuroradiology, pages 1852-1858
Publication date: 2025-09-02
Publication type: Journal Article
DOI: 10.3174/ajnr.A8715 (https://doi.org/10.3174/ajnr.A8715)
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453477/pdf/

Abstract

Abbreviations: AI = artificial intelligence; CNN = convolutional neural network; RSNA = Radiological Society of North America.
External Validation of a Winning Artificial Intelligence Algorithm from the RSNA 2022 Cervical Spine Fracture Detection Challenge.

Background and purpose: The Radiological Society of North America has actively promoted artificial intelligence (AI) challenges since 2017. Algorithms emerging from the recent RSNA 2022 Cervical Spine Fracture Detection Challenge demonstrated state-of-the-art performance in the competition's data set, surpassing results from prior publications. However, their performance in real-world clinical practice is not known. As an initial step toward the goal of assessing feasibility of these models in clinical practice, we conducted a generalizability test by using one of the leading algorithms of the competition.

Materials and methods: The deep learning algorithm was selected due to its performance, portability, and ease of use, and installed locally. One hundred examinations (50 consecutive cervical spine CT scans with at least 1 fracture present and 50 consecutive negative CT scans) from a level 1 trauma center not represented in the competition data set were processed at 6.4 seconds per examination. Ground truth was established based on the radiology report with retrospective confirmation of positive fracture cases. Sensitivity, specificity, F1 score, and area under the curve were calculated.
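
The four metrics named in the methods can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the six-case toy data are all hypothetical and chosen only to show the definitions. The area under the curve is computed here with the rank-based (Mann-Whitney) formulation, which is one common way to obtain it.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 means fracture present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true fracture cases that the model flags."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of negative cases that the model correctly clears."""
    return tn / (tn + fp)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and sensitivity."""
    return 2 * tp / (2 * tp + fp + fn)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive case outscores a random negative case.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three fracture cases, three negative cases.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]          # made-up model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f"sensitivity={sensitivity(tp, fn):.3f}")
print(f"specificity={specificity(tn, fp):.3f}")
print(f"F1={f1_score(tp, fp, fn):.3f}")
print(f"AUC={auc(y_true, scores):.3f}")
```

Sensitivity, specificity, and F1 depend on the chosen threshold, whereas the AUC is computed from the raw scores and is threshold-free.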

Results: The external validation data set comprised older patients in comparison to the competition set (53.5 ± 21.8 years versus 58 ± 22.0, respectively; P < .05). Sensitivity and specificity were 86% and 70% in the external validation group and 85% and 94% in the competition group, respectively. Fractures misclassified by the convolutional neural networks frequently had features of advanced degenerative disease, subtle nondisplaced fractures not easily identified on the axial plane, and malalignment.
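
The reported external-validation percentages can be turned back into approximate case counts with simple arithmetic. The sketch below assumes that the 86% sensitivity and 70% specificity apply exactly to the 50 positive and 50 negative examinations described in the methods. The resulting counts, precision, F1, and accuracy are inferred for illustration and are not figures quoted from the abstract.

```python
# Figures taken from the abstract.
N_POSITIVE = 50              # examinations with at least 1 fracture
N_NEGATIVE = 50              # negative examinations
REPORTED_SENSITIVITY = 0.86  # external validation group
REPORTED_SPECIFICITY = 0.70  # external validation group

# Inferred counts (assumption: percentages map exactly onto the 50/50 split).
tp = round(REPORTED_SENSITIVITY * N_POSITIVE)  # fractures flagged
fn = N_POSITIVE - tp                           # fractures missed
tn = round(REPORTED_SPECIFICITY * N_NEGATIVE)  # negatives correctly cleared
fp = N_NEGATIVE - tn                           # negatives flagged as fracture

precision = tp / (tp + fp)
f1 = 2 * tp / (2 * tp + fp + fn)
accuracy = (tp + tn) / (N_POSITIVE + N_NEGATIVE)

print(f"TP={tp} FN={fn} TN={tn} FP={fp}")
print(f"precision={precision:.3f} F1={f1:.3f} accuracy={accuracy:.2f}")
```

Because the sample was deliberately balanced at 50 positive and 50 negative examinations, the precision and accuracy derived this way describe this sample only and would differ at the fracture prevalence seen in routine practice.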

Conclusions: The model performed with similar sensitivity on the test and external data sets, suggesting that such a tool could potentially generalize as a triage tool in the emergency setting. Discordant factors such as age-associated comorbidities may affect the accuracy and specificity of AI models when used in certain populations. Further research should be encouraged to help elucidate the potential contributions and pitfalls of these algorithms in supporting clinical care.
