Deep learning-based detection and classification of lumbar disc herniation on magnetic resonance images

JOR Spine; IF 3.4; Zone 3 (Medicine); JCR Q1 (Orthopedics); Pub Date: 2023-08-14; DOI: 10.1002/jsp2.1276

Weicong Zhang, Ziyang Chen, Zhihai Su, Zhengyan Wang, Jinjin Hai, Chengjie Huang, Yuhan Wang, Bin Yan, Hai Lu



Abstract

Background

The severity assessment of lumbar disc herniation (LDH) on MR images is crucial for selecting suitable surgical candidates. However, the interpretation of MR images is time-consuming and requires repetitive work. This study aims to develop and evaluate a deep learning-based diagnostic model for automated LDH detection and classification on lumbar axial T2-weighted MR images.

Methods

A total of 1115 patients were analyzed in this retrospective study, split into a development dataset (1015 patients, 15,249 images) and an external test dataset (100 patients, 1273 images). Experts labeled all images by consensus according to the Michigan State University (MSU) classification criterion, and the final labels were regarded as the reference standard. The automated diagnostic model comprised Faster R-CNN as the detection network and ResNeXt101 as the classification network. Diagnostic performance was evaluated by calculating mean intersection over union (IoU), accuracy, precision, sensitivity, specificity, F1 score, the area under the receiver operating characteristic curve (AUC), and the intraclass correlation coefficient (ICC), with 95% confidence intervals (CIs).
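
As a rough illustration of two of the metric families named above, the following minimal Python sketch computes IoU for a pair of bounding boxes and one-vs-rest precision, sensitivity, specificity, and F1 for a single class. It is not the authors' evaluation code: the box format `(x_min, y_min, x_max, y_max)`, the function names `box_iou` and `one_vs_rest_metrics`, and the placeholder class labels are all assumptions made for this example.

```python
from typing import Dict, Hashable, Sequence, Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def one_vs_rest_metrics(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive: Hashable,
) -> Dict[str, float]:
    """Precision, sensitivity, specificity, and F1 for one class vs. the rest."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    # Each ratio falls back to 0.0 when its denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


if __name__ == "__main__":
    # Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7.
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
    # Placeholder labels, not the study's actual class names.
    truth = ["grade_a", "grade_a", "grade_b", "grade_b", "grade_a"]
    preds = ["grade_a", "grade_b", "grade_b", "grade_a", "grade_a"]
    print(one_vs_rest_metrics(truth, preds, positive="grade_a"))
```

Mean IoU would then be the average of `box_iou` over matched predicted and reference boxes; how the study matched boxes and averaged per-class metrics is not stated in the abstract.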

Results

High detection consistency was obtained in the internal test dataset (mean IoU = 0.82, precision = 98.4%, sensitivity = 99.4%) and the external test dataset (mean IoU = 0.70, precision = 96.3%, sensitivity = 97.8%). Overall accuracy for LDH classification was 87.70% (95% CI: 86.59%–88.86%) and 74.23% (95% CI: 71.83%–76.75%) in the internal and external test datasets, respectively. For internal testing, the proposed model achieved high agreement with the reference standard in classification (ICC = 0.87, 95% CI: 0.86–0.88, P < 0.001), which was higher than in external testing (ICC = 0.79, 95% CI: 0.76–0.81, P < 0.001). The AUC for model classification was 0.965 (95% CI: 0.962–0.968) and 0.916 (95% CI: 0.908–0.925) in the internal and external test datasets, respectively.
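
The abstract does not say how the 95% confidence intervals were obtained. One common option, shown here purely as an assumption, is a percentile bootstrap over per-image correctness. The function name `bootstrap_accuracy_ci`, its parameters, and the toy data are hypothetical and are not taken from the study.

```python
import random
from typing import Sequence, Tuple


def bootstrap_accuracy_ci(
    correct: Sequence[bool],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (accuracy, lower, upper) using a percentile bootstrap.

    `correct[i]` is True when prediction i matches the reference label.
    """
    n = len(correct)
    if n == 0:
        raise ValueError("need at least one prediction")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    items = list(correct)
    point = sum(items) / n
    stats = []
    for _ in range(n_resamples):
        # Resample n outcomes with replacement and recompute accuracy.
        sample = rng.choices(items, k=n)
        stats.append(sum(sample) / n)
    stats.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return point, stats[lo_idx], stats[hi_idx]


if __name__ == "__main__":
    # Toy data: 80 correct and 20 incorrect predictions (not study data).
    outcomes = [True] * 80 + [False] * 20
    acc, lower, upper = bootstrap_accuracy_ci(outcomes)
    print(f"accuracy={acc:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

Resampling individual images treats them as independent; when several images come from the same patient, resampling at the patient level is a reasonable alternative.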

Conclusions

The automated diagnostic model achieved high performance in detecting and classifying LDH and exhibited considerable consistency with experts' classification.


