CrossViT with ECAP: Enhanced deep learning for jaw lesion classification
Authors: Wannakamon Panyarak, Wattanapong Suttapak, Phattaranant Mahasantipiya, Arnon Charuakkra, Nattanit Boonsong, Kittichai Wantanajittikul, Anak Iamaroon
Journal: International Journal of Medical Informatics, volume 193, Article 105666 (published 2024-10-28)
Impact factor: 3.7; JCR: Q2 (Computer Science, Information Systems)
DOI: 10.1016/j.ijmedinf.2024.105666
URL: https://www.sciencedirect.com/science/article/pii/S1386505624003290
Background
Radiolucent jaw lesions such as ameloblastoma (AM), dentigerous cyst (DC), odontogenic keratocyst (OKC), and radicular cyst (RC) often share similar characteristics, making diagnosis challenging. In 2021, CrossViT, a novel deep learning approach using multi-scale vision transformers (ViT) with cross-attention, emerged for accurate image classification. Additionally, we introduced Extended Cropping and Padding (ECAP), a method that expands training data by iteratively cropping smaller images while preserving context. However, the application of these methods to dental radiographic classification remains unexplored. This study investigates the effectiveness of CrossViTs and ECAP against ResNets for classifying common radiolucent jaw lesions.
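The abstract does not spell out the ECAP procedure, so the following is only a rough sketch of the general idea of producing several context-preserving crops around one lesion. The function name `context_crop_boxes`, the parameters `start_pad` and `step`, and the shrinking-margin schedule are all assumptions made for illustration and should not be read as the authors' algorithm.

```python
from typing import List, Tuple

# (left, top, right, bottom) in pixels; right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def context_crop_boxes(lesion: Box, image_size: Tuple[int, int],
                       start_pad: int, step: int) -> List[Box]:
    """Return crop boxes around `lesion`, from widest context to tightest.

    Hypothetical illustration only: each iteration shrinks the padding
    margin by `step` pixels, and every box is clipped to the image bounds.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    width, height = image_size
    left, top, right, bottom = lesion
    boxes: List[Box] = []
    pad = start_pad
    while pad >= 0:
        box = (max(0, left - pad), max(0, top - pad),
               min(width, right + pad), min(height, bottom + pad))
        if box not in boxes:  # clipping can make consecutive boxes identical
            boxes.append(box)
        pad -= step
    return boxes


# Made-up lesion box on a made-up 640 x 320 image.
boxes = context_crop_boxes(lesion=(100, 80, 200, 160), image_size=(640, 320),
                           start_pad=100, step=50)
```

Each returned box could then be cropped from the radiograph and resized to the network input size, so that one annotated lesion contributes several training images with different amounts of surrounding anatomy.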
Methods
We conducted a retrospective study of 208 common radiolucent jaw lesions (49 AMs, 59 DCs, 48 OKCs, and 54 RCs) observed in panoramic radiographs, or orthopantomograms (OPGs), with confirmed histological diagnoses. Three experienced oral radiologists provided consensus annotations. We applied horizontal flipping and the ECAP technique with CrossViT-15 and -18 and ResNet-50, -101, and -152. Four-fold cross-validation was employed. Model performance was assessed by accuracy, specificity, precision, recall (sensitivity), F1-score, and area under the receiver operating characteristic curve (AUC).
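For a four-class problem, the listed metrics are commonly derived per class from a confusion matrix in a one-vs-rest fashion. The sketch below shows that arithmetic. The abstract does not state how the study aggregated metrics across classes or folds, so the helper `one_vs_rest_metrics` is hypothetical and the counts in `toy_confusion` are invented for illustration, not study results.

```python
from typing import Dict, List


def one_vs_rest_metrics(confusion: List[List[int]], cls: int) -> Dict[str, float]:
    """Per-class metrics; confusion[i][j] counts true class i predicted as j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp                        # missed cases of cls
    fp = sum(confusion[i][cls] for i in range(n)) - tp   # others called cls
    tn = total - tp - fn - fp

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, total),
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Invented counts; rows and columns are ordered AM, DC, OKC, RC.
toy_confusion = [
    [8, 1, 2, 1],
    [1, 12, 1, 1],
    [3, 1, 7, 1],
    [0, 1, 1, 11],
]
am_metrics = one_vs_rest_metrics(toy_confusion, cls=0)
```

Averaging the per-class values (macro averaging) is one common way to report a single figure per model, though other aggregation choices are possible.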
Results
Models using the ECAP technique generally achieved better results, with ResNet-152 showing a statistically significant increase in F1-score. CrossViT models consistently achieved higher accuracy, precision, recall, and F1-score compared to ResNet models, regardless of ECAP usage. CrossViT-18 achieved the best overall performance. While all models showed positive ability to differentiate lesions, DC had the highest AUCs (0.89–0.90) and OKC the lowest (0.72–0.81). Only CrossViT-15 achieved AUCs above 0.80 for all four lesion types.
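A per-lesion AUC can be read as the probability that a randomly chosen case of that lesion type receives a higher class score than a randomly chosen case of any other type. The sketch below computes it that way, assuming a one-vs-rest setup that the abstract does not confirm. The function name `auc_one_vs_rest` is hypothetical and the scores are made up.

```python
from typing import Sequence


def auc_one_vs_rest(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up model scores for one lesion type against all other types.
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.5, 0.2]
toy_labels = [True, True, True, False, False, False]
toy_auc = auc_one_vs_rest(toy_scores, toy_labels)
```

In this toy case 6 of the 9 positive/negative pairs are ranked correctly, giving an AUC of about 0.67. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.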
Conclusion
ECAP, a targeted padding-based technique for expanding training data, improves deep learning model performance for radiolucent jaw lesion classification. This context-preserving approach is beneficial for tasks requiring an understanding of the lesion’s surroundings. Combined with CrossViT models, ECAP shows promise for accurate classification, particularly for rare lesions with limited data.
About the journal:
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of the journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.;
Educational computer-based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.