James P Harper, Ghee R Lee, Ian Pan, Xuan V Nguyen, Nathan Quails, Luciano M Prevedello
{"title":"RSNA 2022颈椎骨折检测挑战赛获奖ai算法的外部验证","authors":"James P Harper, Ghee R Lee, Ian Pan, Xuan V Nguyen, Nathan Quails, Luciano M Prevedello","doi":"10.3174/ajnr.A8715","DOIUrl":null,"url":null,"abstract":"<p><strong>Background and purpose: </strong>The Radiological Society of North America has actively promoted artificial intelligence (AI) challenges since 2017. Algorithms emerging from the recent RSNA 2022 Cervical Spine Fracture Detection Challenge demonstrated state-of-the-art performance in the competition's data set, surpassing results from prior publications. However, their performance in real-world clinical practice is not known. As an initial step toward the goal of assessing feasibility of these models in clinical practice, we conducted a generalizability test by using one of the leading algorithms of the competition.</p><p><strong>Materials and methods: </strong>The deep learning algorithm was selected due to its performance, portability, and ease of use, and installed locally. One hundred examinations (50 consecutive cervical spine CT scans with at least 1 fracture present and 50 consecutive negative CT scans) from a level 1 trauma center not represented in the competition data set were processed at 6.4 seconds per examination. Ground truth was established based on the radiology report with retrospective confirmation of positive fracture cases. Sensitivity, specificity, F1 score, and area under the curve were calculated.</p><p><strong>Results: </strong>The external validation data set comprised older patients in comparison to the competition set (53.5 ± 21.8 years versus 58 ± 22.0, respectively; <i>P</i> < .05). Sensitivity and specificity were 86% and 70% in the external validation group and 85% and 94% in the competition group, respectively. Fractures misclassified by the convolutional neural networks frequently had features of advanced degenerative disease, subtle nondisplaced fractures not easily identified on the axial plane, and malalignment.</p><p><strong>Conclusions: </strong>The model performed with a similar sensitivity on the test and external data set, suggesting that such a tool could be potentially generalizable as a triage tool in the emergency setting. Discordant factors such as age-associated comorbidities may affect accuracy and specificity of AI models when used in certain populations. Further research should be encouraged to help elucidate the potential contributions and pitfalls of these algorithms in supporting clinical care.</p>","PeriodicalId":93863,"journal":{"name":"AJNR. American journal of neuroradiology","volume":" ","pages":"1852-1858"},"PeriodicalIF":0.0000,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453477/pdf/","citationCount":"0","resultStr":"{\"title\":\"External Validation of a Winning Artificial Intelligence Algorithm from the RSNA 2022 Cervical Spine Fracture Detection Challenge.\",\"authors\":\"James P Harper, Ghee R Lee, Ian Pan, Xuan V Nguyen, Nathan Quails, Luciano M Prevedello\",\"doi\":\"10.3174/ajnr.A8715\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background and purpose: </strong>The Radiological Society of North America has actively promoted artificial intelligence (AI) challenges since 2017. Algorithms emerging from the recent RSNA 2022 Cervical Spine Fracture Detection Challenge demonstrated state-of-the-art performance in the competition's data set, surpassing results from prior publications. However, their performance in real-world clinical practice is not known. As an initial step toward the goal of assessing feasibility of these models in clinical practice, we conducted a generalizability test by using one of the leading algorithms of the competition.</p><p><strong>Materials and methods: </strong>The deep learning algorithm was selected due to its performance, portability, and ease of use, and installed locally. One hundred examinations (50 consecutive cervical spine CT scans with at least 1 fracture present and 50 consecutive negative CT scans) from a level 1 trauma center not represented in the competition data set were processed at 6.4 seconds per examination. Ground truth was established based on the radiology report with retrospective confirmation of positive fracture cases. Sensitivity, specificity, F1 score, and area under the curve were calculated.</p><p><strong>Results: </strong>The external validation data set comprised older patients in comparison to the competition set (53.5 ± 21.8 years versus 58 ± 22.0, respectively; <i>P</i> < .05). Sensitivity and specificity were 86% and 70% in the external validation group and 85% and 94% in the competition group, respectively. Fractures misclassified by the convolutional neural networks frequently had features of advanced degenerative disease, subtle nondisplaced fractures not easily identified on the axial plane, and malalignment.</p><p><strong>Conclusions: </strong>The model performed with a similar sensitivity on the test and external data set, suggesting that such a tool could be potentially generalizable as a triage tool in the emergency setting. Discordant factors such as age-associated comorbidities may affect accuracy and specificity of AI models when used in certain populations. Further research should be encouraged to help elucidate the potential contributions and pitfalls of these algorithms in supporting clinical care.</p>\",\"PeriodicalId\":93863,\"journal\":{\"name\":\"AJNR. American journal of neuroradiology\",\"volume\":\" \",\"pages\":\"1852-1858\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-09-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453477/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"AJNR. American journal of neuroradiology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3174/ajnr.A8715\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"AJNR. American journal of neuroradiology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3174/ajnr.A8715","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
External Validation of a Winning Artificial Intelligence Algorithm from the RSNA 2022 Cervical Spine Fracture Detection Challenge.
Background and purpose: The Radiological Society of North America has actively promoted artificial intelligence (AI) challenges since 2017. Algorithms emerging from the recent RSNA 2022 Cervical Spine Fracture Detection Challenge demonstrated state-of-the-art performance in the competition's data set, surpassing results from prior publications. However, their performance in real-world clinical practice is not known. As an initial step toward the goal of assessing feasibility of these models in clinical practice, we conducted a generalizability test by using one of the leading algorithms of the competition.
Materials and methods: The deep learning algorithm was selected due to its performance, portability, and ease of use, and installed locally. One hundred examinations (50 consecutive cervical spine CT scans with at least 1 fracture present and 50 consecutive negative CT scans) from a level 1 trauma center not represented in the competition data set were processed at 6.4 seconds per examination. Ground truth was established based on the radiology report with retrospective confirmation of positive fracture cases. Sensitivity, specificity, F1 score, and area under the curve were calculated.
Results: The external validation data set comprised older patients in comparison to the competition set (53.5 ± 21.8 years versus 58 ± 22.0, respectively; P < .05). Sensitivity and specificity were 86% and 70% in the external validation group and 85% and 94% in the competition group, respectively. Fractures misclassified by the convolutional neural networks frequently had features of advanced degenerative disease, subtle nondisplaced fractures not easily identified on the axial plane, and malalignment.
Conclusions: The model performed with a similar sensitivity on the test and external data set, suggesting that such a tool could be potentially generalizable as a triage tool in the emergency setting. Discordant factors such as age-associated comorbidities may affect accuracy and specificity of AI models when used in certain populations. Further research should be encouraged to help elucidate the potential contributions and pitfalls of these algorithms in supporting clinical care.