Leveraging Pretrained Transformers for Efficient Segmentation and Lesion Detection in Cone-Beam Computed Tomography Scans
Journal of Endodontics, published 2024-10-01. DOI: 10.1016/j.joen.2024.07.012
https://www.sciencedirect.com/science/article/pii/S0099239924004084
Abstract
Introduction
Cone-beam computed tomography (CBCT) is widely used to detect jaw lesions, although CBCT interpretation is time-consuming and challenging. Artificial intelligence for CBCT segmentation may improve lesion detection accuracy. However, consistent automated lesion detection remains difficult, especially with limited training data. This study aimed to assess the applicability of pretrained transformer-based architectures for semantic segmentation of CBCT volumes when applied to periapical lesion detection.
Methods
CBCT volumes (n = 138) were collected and annotated by expert clinicians using 5 labels: "lesion," "restorative material," "bone," "tooth structure," and "background." U-Net (convolutional neural network-based) and Swin-UNETR (transformer-based) models were trained with subsets of the annotated CBCTs. Swin-UNETR was trained both pretrained (Swin-UNETR-PRETRAIN) and from scratch (Swin-UNETR-SCRATCH). The models were then evaluated on three points:
- semantic segmentation performance, using the Sørensen–Dice coefficient (DICE);
- lesion detection performance, using sensitivity and specificity;
- training sample size requirements, by comparing models trained with 20, 40, 60, or 103 samples.
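The per-label DICE scores use the standard Sørensen–Dice definition, 2|A∩B| / (|A| + |B|). Here A and B are the sets of voxels assigned a given label in the prediction and in the expert annotation. The pure-Python snippet below is a minimal sketch over flattened label volumes. Several details are illustrative assumptions, not taken from the study: the integer codes in `LABELS`, the function name `dice_per_label`, and the convention of scoring a label that is absent from both volumes as 1.0.

```python
from typing import Dict, Sequence

# Hypothetical integer codes: the study names the five classes but not how they are encoded.
LABELS: Dict[int, str] = {
    0: "background",
    1: "lesion",
    2: "restorative material",
    3: "bone",
    4: "tooth structure",
}


def dice_per_label(pred: Sequence[int], truth: Sequence[int],
                   labels: Dict[int, str] = LABELS) -> Dict[str, float]:
    """Sørensen–Dice coefficient 2|A∩B| / (|A| + |B|) for each label.

    `pred` and `truth` are flattened voxel label volumes of equal length.
    Assumption: a label absent from both volumes scores 1.0 (agreement on emptiness).
    """
    if len(pred) != len(truth):
        raise ValueError("prediction and ground truth must have the same number of voxels")
    scores: Dict[str, float] = {}
    for code, name in labels.items():
        pred_count = truth_count = overlap = 0
        for p, t in zip(pred, truth):
            in_pred = p == code
            in_truth = t == code
            if in_pred:
                pred_count += 1
            if in_truth:
                truth_count += 1
            if in_pred and in_truth:
                overlap += 1  # voxel carries this label in both volumes
        denom = pred_count + truth_count
        scores[name] = 1.0 if denom == 0 else 2.0 * overlap / denom
    return scores
```

For example, `dice_per_label([0, 1, 1, 3, 4, 0], [0, 1, 3, 3, 4, 0])` gives 2/3 for "lesion" and "bone" and 1.0 for the other three labels. On full CBCT volumes a vectorized array library would be far faster, but the arithmetic is the same.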
Results
Trained with 103 samples, Swin-UNETR-PRETRAIN achieved the following DICE scores: 0.8512 for "lesion," 0.8282 for "restorative material," 0.9178 for "bone," 0.9029 for "tooth structure," and 0.9901 for "background." "Lesion" DICE did not differ significantly between Swin-UNETR-PRETRAIN trained with 103 and with 60 images (P > .05). The 60-image model achieved 1.00 sensitivity and 0.94 specificity in lesion detection. With small training sets, Swin-UNETR-PRETRAIN outperformed Swin-UNETR-SCRATCH in DICE across all labels (P < .001 [n = 20], P < .001 [n = 40]). It also outperformed U-Net in lesion detection specificity (P = .006 [n = 20], P = .031 [n = 40]).
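The abstract reports lesion detection as sensitivity and specificity, but it does not say how voxel-level segmentations were turned into per-scan detection decisions. The sketch below assumes one simple rule: a scan is called lesion-positive when the predicted mask contains at least `min_voxels` voxels labeled "lesion". It then computes sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The `LESION` code, the threshold rule, and both function names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple

LESION = 1  # hypothetical integer code for the "lesion" label


def scan_is_positive(pred_volume: Sequence[int], min_voxels: int = 1) -> bool:
    """Call a scan lesion-positive if its predicted mask has >= `min_voxels` lesion voxels.

    Illustrative assumption only: the study's actual detection criterion is not stated.
    """
    return sum(1 for v in pred_volume if v == LESION) >= min_voxels


def sensitivity_specificity(predicted: Iterable[bool],
                            actual: Iterable[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) = (TP / (TP + FN), TN / (TN + FP)).

    A ratio whose denominator is zero is returned as NaN.
    """
    tp = fn = tn = fp = 0
    for p, a in zip(predicted, actual):
        if a and p:
            tp += 1  # lesion present and detected
        elif a:
            fn += 1  # lesion present but missed
        elif p:
            fp += 1  # no lesion, but flagged
        else:
            tn += 1  # no lesion, correctly cleared
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec
```

Consider four hypothetical scans. Two are truly lesion-positive and both are detected, and one of the two lesion-free scans is wrongly flagged. In that case, `sensitivity_specificity([True, True, False, True], [True, True, False, False])` returns `(1.0, 0.5)`.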
Conclusions
Transformer-based Swin-UNETR architectures allowed for excellent semantic segmentation and periapical lesion detection. When pretrained, Swin-UNETR may offer an alternative to classic U-Net architectures when training datasets are small.
About the journal:
The Journal of Endodontics, the official journal of the American Association of Endodontists, publishes scientific articles, case reports and comparison studies evaluating materials and methods of pulp conservation and endodontic treatment. Endodontists and general dentists can learn about new concepts in root canal treatment and the latest advances in techniques and instrumentation in the one journal that helps them keep pace with rapid changes in this field.