CrossViT with ECAP: Enhanced deep learning for jaw lesion classification

IF 3.7 | Zone 2 (Medicine) | JCR Q2: Computer Science, Information Systems | International Journal of Medical Informatics | Pub Date: 2024-10-28 | DOI: 10.1016/j.ijmedinf.2024.105666

Wannakamon Panyarak, Wattanapong Suttapak, Phattaranant Mahasantipiya, Arnon Charuakkra, Nattanit Boonsong, Kittichai Wantanajittikul, Anak Iamaroon


CrossViT with ECAP: Enhanced deep learning for jaw lesion classification

Background

Radiolucent jaw lesions like ameloblastoma (AM), dentigerous cyst (DC), odontogenic keratocyst (OKC), and radicular cyst (RC) often share similar characteristics, making diagnosis challenging. In 2021, CrossViT, a novel deep learning approach using multi-scale vision transformers (ViT) with cross-attention, emerged for accurate image classification. Additionally, we introduced Extended Cropping and Padding (ECAP), a method to expand training data by iteratively cropping smaller images while preserving context. However, its application in dental radiographic classification remains unexplored. This study investigates the effectiveness of CrossViTs and ECAP against ResNets for classifying common radiolucent jaw lesions.
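The abstract does not spell out the ECAP procedure, so the following is only a rough Python sketch of one way to read "iteratively cropping smaller images while preserving context". It assumes each lesion has a rectangular annotation, generates a series of crops whose margin around the lesion shrinks step by step, and pads each crop onto a fixed-size canvas. The function names, the margin schedule, and the constant-fill padding are all hypothetical choices made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

# (left, top, right, bottom); right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def shrinking_crop_boxes(lesion: Box, image_size: Tuple[int, int],
                         start_margin: int, step: int) -> List[Box]:
    """Return crop boxes around `lesion`, from widest context to tightest.

    Hypothetical schedule: the margin shrinks by `step` pixels per iteration,
    so each crop is smaller than the last but still contains the whole lesion.
    Boxes are clipped to the image bounds.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    width, height = image_size
    left, top, right, bottom = lesion
    boxes: List[Box] = []
    margin = start_margin
    while margin >= 0:
        box = (max(0, left - margin), max(0, top - margin),
               min(width, right + margin), min(height, bottom + margin))
        if box not in boxes:  # clipping can make consecutive boxes identical
            boxes.append(box)
        margin -= step
    return boxes


def crop_and_pad(image: List[List[int]], box: Box, out_size: int,
                 fill: int = 0) -> List[List[int]]:
    """Crop `box` from a row-major image and centre it on a square canvas."""
    left, top, right, bottom = box
    crop = [row[left:right] for row in image[top:bottom]]
    crop_h, crop_w = len(crop), len(crop[0])
    if crop_h > out_size or crop_w > out_size:
        raise ValueError("crop is larger than the output canvas")
    off_y = (out_size - crop_h) // 2
    off_x = (out_size - crop_w) // 2
    canvas = [[fill] * out_size for _ in range(out_size)]
    for y, row in enumerate(crop):
        canvas[off_y + y][off_x:off_x + crop_w] = row
    return canvas


if __name__ == "__main__":
    # Toy 10x10 "radiograph" whose pixel value encodes its position.
    toy = [[y * 10 + x for x in range(10)] for y in range(10)]
    for b in shrinking_crop_boxes((4, 4, 6, 6), (10, 10), start_margin=2, step=1):
        print(b, len(crop_and_pad(toy, b, out_size=8)))
```

Under these assumptions one annotated lesion yields several training images, each keeping some of the surrounding anatomy, which is the property the abstract attributes to ECAP.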

Methods

We conducted a retrospective study involving 208 prevalent radiolucent jaw lesions (49 AMs, 59 DCs, 48 OKCs, and 54 RCs) observed in panoramic radiographs or orthopantomograms (OPGs) with confirmed histological diagnoses. Three experienced oral radiologists provided annotations by consensus. We applied horizontal flipping and the ECAP technique with CrossViT-15, CrossViT-18, ResNet-50, ResNet-101, and ResNet-152. A four-fold cross-validation approach was employed. The models’ performance was assessed through accuracy, specificity, precision, recall (sensitivity), F1-score, and area under the receiver operating characteristic curve (AUC).
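As an illustration of the evaluation protocol described above, here is a minimal Python sketch of a stratified four-fold split and of the one-vs-rest metrics named in the text. It uses the per-class counts as listed (which add up to 210 entries). Whether the authors stratified their folds, and how they averaged metrics across classes, is not stated in the abstract. The function names and the round-robin assignment are assumptions made for this example.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[str], k: int = 4, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds with near-equal class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def one_vs_rest_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                        positive: str) -> Dict[str, float]:
    """Accuracy, specificity, precision, recall and F1 for one class vs the rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Per-class counts as listed in the Methods text.
    labels = ["AM"] * 49 + ["DC"] * 59 + ["OKC"] * 48 + ["RC"] * 54
    for i, fold in enumerate(stratified_folds(labels, k=4)):
        counts = {c: sum(labels[j] == c for j in fold) for c in ("AM", "DC", "OKC", "RC")}
        print(f"fold {i}: {len(fold)} samples, {counts}")
```

In a four-fold run each fold serves once as the held-out set while the other three are used for training, and the per-fold metrics are then summarized.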

Results

Models using the ECAP technique generally achieved better results, with ResNet-152 showing a statistically significant increase in F1-score. CrossViT models consistently achieved higher accuracy, precision, recall, and F1-score compared to ResNet models, regardless of ECAP usage. CrossViT-18 achieved the best overall performance. While all models showed positive ability to differentiate lesions, DC had the highest AUCs (0.89–0.90) and OKC the lowest (0.72–0.81). Only CrossViT-15 achieved AUCs above 0.80 for all four lesion types.
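Per-lesion AUCs such as those quoted above are commonly obtained by treating each class in turn as "positive" against all others. The sketch below shows that one-vs-rest computation in plain Python, using the rank interpretation of AUC: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The abstract does not say how the authors computed their AUCs, so this is an assumed, generic formulation with hypothetical function names.

```python
from typing import Dict, List, Sequence


def binary_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(probabilities: List[Dict[str, float]],
                    y_true: Sequence[str]) -> Dict[str, float]:
    """Compute one AUC per class from per-sample class probabilities."""
    classes = sorted(set(y_true))
    return {
        c: binary_auc([p[c] for p in probabilities], [t == c for t in y_true])
        for c in classes
    }


if __name__ == "__main__":
    # Made-up probabilities for two classes, purely to exercise the functions.
    probs = [{"DC": 0.9, "OKC": 0.1}, {"DC": 0.8, "OKC": 0.2},
             {"DC": 0.3, "OKC": 0.7}, {"DC": 0.1, "OKC": 0.9}]
    truth = ["DC", "OKC", "DC", "OKC"]
    print(one_vs_rest_auc(probs, truth))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a dataset of a few hundred lesions. Larger datasets would normally use a sort-based implementation.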

Conclusion

ECAP, a targeted, padding-based technique for expanding training data, improves deep learning model performance for radiolucent jaw lesion classification. This context-preserving approach is beneficial for tasks requiring an understanding of the lesion’s surroundings. Combined with CrossViT models, ECAP shows promise for accurate classification, particularly for rare lesions with limited data.

Source journal

International Journal of Medical Informatics (Medicine; Computer Science: Information Systems)

CiteScore: 8.90
Self-citation rate: 4.10%
Articles published: 217
Review time: 42 days

About the journal: International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings. The scope of the journal covers: Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration etc.; Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.; Educational computer-based programs pertaining to medical informatics or medicine in general; Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.