Interobserver Agreement and Performance of Concurrent AI Assistance for Radiographic Evaluation of Knee Osteoarthritis
Mathias W Brejnebøl, Anders Lenskjold, Katharina Ziegeler, Huib Ruitenbeek, Felix C Müller, Janus U Nybing, Jacob J Visser, Loes M Schiphouwer, Jorrit Jasper, Behschad Bashian, Haoyin Cao, Maximilian Muellner, Sebastian A Dahlmann, Dimitar I Radev, Ann Ganestam, Camilla T Nielsen, Carsten U Stroemmen, Edwin H G Oei, Kay-Geert A Hermann, Mikael Boesen
Radiology, published 2024-07-01. DOI: 10.1148/radiol.233341
Abstract
Background: Due to conflicting findings in the literature, there are concerns about a lack of objectivity in grading knee osteoarthritis (KOA) on radiographs.

Purpose: To examine how artificial intelligence (AI) assistance affects the performance and interobserver agreement of radiologists and orthopedists of various experience levels when evaluating KOA on radiographs according to the established Kellgren-Lawrence (KL) grading system.

Materials and Methods: In this retrospective observer performance study, consecutive standing knee radiographs from patients with suspected KOA were collected from three participating European centers between April 2019 and May 2022. Each center recruited four readers across radiology and orthopedic surgery at in-training and board-certified experience levels. KL grading (KL-0 = no KOA, KL-4 = severe KOA) on the frontal view was assessed by readers with and without assistance from a commercial AI tool. The majority vote of three musculoskeletal radiology consultants established the reference standard. The ordinal receiver operating characteristic (ROC) method was used to estimate grading performance, Light's kappa was used to estimate interrater agreement, and bootstrapped t statistics were used to compare groups.

Results: Seventy-five studies were included from each center, totaling 225 studies (mean patient age, 55 years ± 15 [SD]; 113 female patients). The KL grade distribution was KL-0, 24.0% (n = 54); KL-1, 28.0% (n = 63); KL-2, 21.8% (n = 49); KL-3, 18.7% (n = 42); and KL-4, 7.6% (n = 17). Eleven readers completed their readings. Three of the six junior readers showed higher KL grading performance with AI assistance than without (area under the ROC curve without vs with AI: 0.81 ± 0.017 [SEM] vs 0.88 ± 0.011 [P < .001]; 0.76 ± 0.018 vs 0.86 ± 0.013 [P < .001]; and 0.89 ± 0.011 vs 0.91 ± 0.009 [P = .008]). Interobserver agreement for KL grading among all readers was higher with AI assistance than without (κ without vs with AI: 0.77 ± 0.018 [SEM] vs 0.85 ± 0.013; P < .001). Board-certified radiologists achieved almost perfect agreement for KL grading when assisted by AI (κ = 0.90 ± 0.01), which was higher than that achieved by the reference readers independently (κ = 0.84 ± 0.017; P = .01).

Conclusion: AI assistance increased junior readers' radiographic KOA grading performance and increased interobserver agreement for osteoarthritis grading across all readers and experience levels.

Published under a CC BY 4.0 license. Supplemental material is available for this article.
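The abstract names two of the analysis components explicitly: Light's kappa for interrater agreement and bootstrapped t statistics for group comparisons. The article does not publish code, so the sketch below is only an illustrative Python reconstruction of how such an analysis could look. It assumes Light's kappa is computed as the mean of unweighted pairwise Cohen's kappas and uses a normal approximation for the bootstrapped t statistic; both are assumptions, not details confirmed by the study, and the ordinal ROC analysis is omitted.

```python
# Minimal sketch (not the study's actual analysis code).
# Assumptions: unweighted Cohen's kappa per reader pair, case-level bootstrap,
# normal approximation for the two-sided p-value. Data below are simulated.
from itertools import combinations

import numpy as np
from scipy.stats import norm
from sklearn.metrics import cohen_kappa_score


def lights_kappa(ratings: np.ndarray) -> float:
    """Light's kappa: average Cohen's kappa over all reader pairs.

    `ratings` has shape (n_readers, n_cases); entries are ordinal KL grades 0-4.
    """
    pairs = combinations(range(ratings.shape[0]), 2)
    return float(np.mean([cohen_kappa_score(ratings[i], ratings[j]) for i, j in pairs]))


def bootstrap_kappa_diff(unassisted: np.ndarray, assisted: np.ndarray,
                         n_boot: int = 1000, seed: int = 0) -> float:
    """Bootstrap the difference in Light's kappa (assisted minus unassisted)
    by resampling cases; returns a two-sided p-value against no difference."""
    rng = np.random.default_rng(seed)
    n_cases = unassisted.shape[1]
    observed = lights_kappa(assisted) - lights_kappa(unassisted)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_cases, n_cases)  # resample cases with replacement
        diffs.append(lights_kappa(assisted[:, idx]) - lights_kappa(unassisted[:, idx]))
    se = np.std(diffs, ddof=1)
    t_stat = observed / se  # bootstrapped t statistic
    return float(2 * norm.sf(abs(t_stat)))  # normal-approximation p-value


if __name__ == "__main__":
    # Illustrative usage with simulated KL grades for 11 readers and 225 cases.
    rng = np.random.default_rng(1)
    without_ai = rng.integers(0, 5, size=(11, 225))
    with_ai = rng.integers(0, 5, size=(11, 225))
    print(lights_kappa(with_ai), bootstrap_kappa_diff(without_ai, with_ai))
```

With random grades the example will show chance-level agreement; the point is only the structure of the computation, not the study's reported values.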
About the Journal
Published regularly since 1923 by the Radiological Society of North America (RSNA), Radiology has long been recognized as the authoritative reference for the most current, clinically relevant and highest quality research in the field of radiology. Each month the journal publishes approximately 240 pages of peer-reviewed original research, authoritative reviews, well-balanced commentary on significant articles, and expert opinion on new techniques and technologies.
Radiology publishes cutting-edge, impactful research in radiology and medical imaging in order to help improve human health.