Preparing for downstream tasks in AI for dental radiology: a baseline performance comparison of deep learning models
Fara A Fernandes, Mouzhi Ge, Georgi Chaltikyan, Martin W Gerdes, Christian W Omlin
Dentomaxillofacial Radiology (journal article), published 2024-11-19
DOI: 10.1093/dmfr/twae056 (https://doi.org/10.1093/dmfr/twae056)
Impact factor: 2.9; JCR: Q1 (Dentistry, Oral Surgery & Medicine)
Citations: 0
Abstract
Objectives: To compare the performance of the convolutional neural network (CNN) with the vision transformer (ViT) and the gated multilayer perceptron (gMLP) in the classification of radiographic images of dental structures.
Methods: Retrospectively collected 2-dimensional images derived from cone beam computed tomographic volumes were used to train CNN, ViT and gMLP architectures as classifiers for 4 different cases. The cases selected for training were the classification of the radiographic appearance of the maxillary sinuses, the classification of maxillary and mandibular incisors, the presence or absence of the mental foramen, and the positional relationship of the mandibular third molar to the inferior alveolar nerve canal. The performance metrics (sensitivity, specificity, precision, accuracy and F1-score) and the area under the curve (AUC) for both the receiver operating characteristic and precision-recall curves were calculated.
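The metrics named in the Methods can be illustrated with a minimal Python sketch for the binary case. This is not the authors' code: the function names `binary_metrics` and `roc_auc` and the toy labels and scores are hypothetical, and the study's multi-class tasks would need per-class or averaged variants. The ROC AUC here is computed with the rank-based (Mann-Whitney) formulation, which is one of several equivalent ways to obtain it.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute confusion-matrix metrics for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # also called recall
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative sample")
    # Count pairwise wins; ties between a positive and a negative count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data for illustration only; these are not values from the study.
    labels = [1, 1, 1, 0, 0, 0, 1, 0]
    predictions = [1, 1, 0, 0, 0, 1, 1, 0]
    probabilities = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
    print(binary_metrics(labels, predictions))
    print(roc_auc(labels, probabilities))
```

The thresholded metrics depend on the chosen decision threshold, whereas the AUC summarizes ranking quality across all thresholds, which is presumably why both kinds of measure are reported together.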
Results: The ViT, with an accuracy of 0.74-0.98, performed on par with the CNN model (accuracy 0.71-0.99) in all tasks. The gMLP displayed marginally lower performance (accuracy 0.65-0.98) than the CNN and ViT. For certain tasks, the ViT outperformed the CNN. Across the 4 cases, the AUCs ranged from 0.77 to 1.00 (CNN), 0.80 to 1.00 (ViT) and 0.73 to 1.00 (gMLP).
Conclusions: The difference in performance between the ViT, the gMLP and the CNN (the current state-of-the-art) was significant in certain tasks. This task-dependent difference in model performance indicates that the capabilities of different architectures may be leveraged.
Advances in knowledge: The vision transformer and, to a lesser extent, the gated multilayer perceptron are deep learning models that perform comparably to the convolutional neural network in the classification of dental radiographic images.
About the journal:
Dentomaxillofacial Radiology (DMFR) is the journal of the International Association of Dentomaxillofacial Radiology (IADMFR) and covers the closely related fields of oral radiology and head and neck imaging.
Established in 1972, DMFR is a key resource that keeps dentists, radiologists, clinicians and scientists with an interest in head and neck imaging abreast of important research and developments in oral and maxillofacial radiology.
The DMFR editorial board features a panel of international experts including Editor-in-Chief Professor Ralf Schulze. Our editorial board provide their expertise and guidance in shaping the content and direction of the journal.
Quick Facts:
- 2015 Impact Factor - 1.919
- Receipt to first decision - average of 3 weeks
- Acceptance to online publication - average of 3 weeks
- Open access option
- ISSN: 0250-832X
- eISSN: 1476-542X