Evaluating CNN Architectures for the Automated Detection and Grading of Modic Changes in MRI: A Comparative Study.

Impact factor: 2.1 | Zone 2 (Medicine) | JCR Q2 (Orthopedics) | Orthopaedic Surgery | Pub Date: 2025-01-01 | Epub Date: 2024-12-05 | DOI: 10.1111/os.14280

Li-Peng Xing, Gang Liu, Hao-Chen Zhang, Lei Wang, Shan Zhu, Man Du La Hua Bao, Yan-Ni Wang, Chao Chen, Zhi Wang, Xin-Yu Liu, Shuai Zhang, Qiang Yang

Pages: 233-243. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735353/pdf/

Citations: 0

Abstract

Objective: The Modic changes (MCs) classification system is the most widely used method in magnetic resonance imaging (MRI) for characterizing subchondral vertebral marrow changes. However, because of its semiquantitative nature, it is highly sensitive to variations in MRI. In 2021, the authors of this classification system proposed a quantitative and reliable MC grading method, but automated tools to grade MCs are still lacking. This study developed convolutional neural networks (CNNs) and investigated their performance in detecting and grading MCs based on their maximum vertical extent. To verify performance, we tested the CNNs' generalization performance, compared the performance of the CNN with that of a junior doctor, and assessed the consistency of the junior doctor's grading after AI assistance.

Methods: MRIs of 139 patients with MCs were retrospectively analyzed and annotated by a spine surgeon. Of the 139 patients, MRIs from 109 patients were acquired using Philips scanners from June 2020 to June 2021, constituting Dataset 1. The remaining 30 patients had MRIs obtained from both Philips and United Imaging scanners from June 2022 to March 2023, forming Dataset 2. YOLOv8 and YOLOv5 models were developed in PyCharm using the Python language and the PyTorch deep learning framework. Data augmentation and transfer learning were applied to enhance model generalization. Model performance was compared using precision, recall, F1 score, and mAP50. Generalizability was also tested on the second data set (Dataset 2) and compared with the junior doctor's performance on the same data. Post hoc, the junior doctor graded Dataset 2 with CNN assistance. In addition, regions of interest were visualized using class activation mapping heat maps.
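The detection metrics named in the Methods can be illustrated with a minimal sketch. It assumes a single class, axis-aligned boxes given as (x_min, y_min, x_max, y_max), and greedy one-to-one matching at an IoU threshold of 0.5 (the "50" in mAP50). The function names and the matching strategy are illustrative assumptions, not the authors' evaluation code, and the sketch stops short of a full mAP50, which additionally averages precision over recall levels for each class.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(
    preds: Sequence[Box], truths: Sequence[Box], iou_thr: float = 0.5
) -> Tuple[int, int, int]:
    """Greedy one-to-one matching (hypothetical helper).

    `preds` is assumed to be sorted by descending confidence.
    Returns (true positives, false positives, false negatives).
    """
    unmatched: List[Box] = list(truths)
    tp = 0
    for p in preds:
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= iou_thr:
            unmatched.remove(best)  # each ground-truth box is matched once
            tp += 1
    return tp, len(preds) - tp, len(unmatched)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and their harmonic mean (F1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up boxes for illustration only; not data from the study.
example_preds = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
example_truths = [(0, 0, 10, 10), (21, 21, 31, 31), (100, 100, 110, 110)]
example_counts = match_detections(example_preds, example_truths)
example_metrics = precision_recall_f1(*example_counts)
```

In this toy case two predictions overlap a ground-truth box with IoU of at least 0.5, one prediction is a false positive, and one ground-truth box is missed, so precision, recall, and F1 all equal 2/3.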

Results: On the unseen test set, the YOLOv8 and YOLOv5 models achieved precision of 81.60% and 61.59%, recall of 80.90% and 67.16%, mAP50 of 84.40% and 68.88%, and F1 of 0.81 and 0.60, respectively. On Dataset 2, YOLOv8 and the junior doctor achieved precision of 95.1% and 72.5% and recall of 68.3% and 60.6%, respectively. In the AI-assisted experiment, agreement between the junior doctor and the senior spine surgeon improved significantly, with Cohen's kappa rising from 0.368 to 0.681.
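Cohen's kappa, used above to quantify agreement between the junior doctor and the senior spine surgeon, corrects the observed agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch of the unweighted statistic follows. The grade labels and ratings are hypothetical placeholders rather than study data, and the abstract does not state whether the authors used the unweighted or a weighted variant.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical grades for eight lesions; not data from the study.
senior = ["A", "A", "B", "B", "C", "C", "A", "B"]
junior = ["A", "A", "B", "C", "C", "C", "A", "A"]
example_kappa = cohens_kappa(senior, junior)
```

For these placeholder ratings the raters agree on 6 of 8 items (p_o = 0.75) while chance agreement is 21/64, giving a kappa of 27/43, or about 0.63.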

Conclusions: The performance of YOLOv8 in detecting and grading MCs was significantly superior to that of YOLOv5. YOLOv8 also outperformed the junior doctor, and it can enhance the capabilities of junior doctors and improve the reliability of diagnoses.

Source journal

Orthopaedic Surgery (Orthopedics)

CiteScore: 3.40

Self-citation rate: 14.30%

Articles published: 374

Review time: 20 weeks

Journal description: Orthopaedic Surgery (OS) is the official journal of the Chinese Orthopaedic Association, focusing on all aspects of orthopaedic technique and surgery. The journal publishes peer-reviewed articles in the following categories: Original Articles, Clinical Articles, Review Articles, Guidelines, Editorials, Commentaries, Surgical Techniques, Case Reports and Meeting Reports.