Jin L Tan, Dileepa Pitawela, Mohamed A Chinnaratha, Andrawus Beany, Enrik J Aguila, Hsiang-Ting Chen, Gustavo Carneiro, Rajvinder Singh
{"title":"Exploring vision transformers for classifying early Barrett's dysplasia in endoscopic images: A pilot study on white-light and narrow-band imaging","authors":"Jin L Tan, Dileepa Pitawela, Mohamed A Chinnaratha, Andrawus Beany, Enrik J Aguila, Hsiang-Ting Chen, Gustavo Carneiro, Rajvinder Singh","doi":"10.1002/jgh3.70030","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Background and Aim</h3>\n \n <p>Various deep learning models, based on convolutional neural network (CNN), have been shown to improve the detection of early esophageal neoplasia in Barrett's esophagus. Vision transformer (ViT), derived from natural language processing, has emerged as the new state-of-the-art for image recognition, outperforming predecessors such as CNN. This pilot study explores the use of ViT to classify the presence or absence of early esophageal neoplasia in endoscopic images of Barrett's esophagus.</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>A BO dataset of 1918 images of Barrett's esophagus from 267 unique patients was used. The images were classified as dysplastic (D-BO) or non-dysplastic (ND-BO). A pretrained vision transformer model, ViTBase16, was used to develop our classifier models. Three ViT models were developed for comparison based on imaging modality: white-light imaging (WLI), narrow-band imaging (NBI), and combined modalities. Performance of each model was evaluated based on accuracy, sensitivity, specificity, confusion matrices, and receiver operating characteristic curves.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>The ViT models demonstrated the following performance: WLI-ViT (Accuracy: 92%, Sensitivity: 82%, Specificity: 95%), NBI-ViT (Accuracy: 99%, Sensitivity: 97%, Specificity: 99%), and combined modalities-ViT (Accuracy: 93%, Sensitivity: 87%, Specificity: 95%). Combined modalities-ViT showed greater accuracy (94% <i>vs</i> 90%) and sensitivity (80% <i>vs</i> 70%) compared with WLI-ViT when classifying WLI images on a subgroup testing set.</p>\n </section>\n \n <section>\n \n <h3> Conclusion</h3>\n \n <p>ViT exhibited high accuracy in classifying the presence or absence of EON in endoscopic images of Barrett's esophagus. ViT has the potential to be widely applicable to other endoscopic diagnoses of gastrointestinal diseases.</p>\n </section>\n </div>","PeriodicalId":45861,"journal":{"name":"JGH Open","volume":"8 9","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/jgh3.70030","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"JGH Open","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/jgh3.70030","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"GASTROENTEROLOGY & HEPATOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Background and Aim
Various deep learning models based on convolutional neural networks (CNNs) have been shown to improve the detection of early esophageal neoplasia (EON) in Barrett's esophagus. The vision transformer (ViT), an architecture derived from natural language processing, has emerged as the new state of the art for image recognition, outperforming predecessors such as CNNs. This pilot study explores the use of ViT to classify the presence or absence of EON in endoscopic images of Barrett's esophagus.
Methods
A dataset of 1918 endoscopic images of Barrett's esophagus (BO) from 267 unique patients was used. The images were classified as dysplastic (D-BO) or non-dysplastic (ND-BO). A pretrained vision transformer, ViTBase16, was used to develop our classifier models. Three ViT models were developed for comparison based on imaging modality: white-light imaging (WLI), narrow-band imaging (NBI), and combined modalities. The performance of each model was evaluated based on accuracy, sensitivity, specificity, confusion matrices, and receiver operating characteristic (ROC) curves.
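To make the modeling step concrete, the sketch below shows one way a pretrained ViT-Base/16 could be fine-tuned as a binary D-BO vs ND-BO classifier. The paper does not specify its training framework or hyperparameters, so PyTorch, the timm library, the loss choice, and the learning rate here are all assumptions for illustration only.

```python
# A minimal sketch (assumed setup, not the authors' actual pipeline):
# fine-tune an ImageNet-pretrained ViT-Base/16 for binary BO classification.
import timm
import torch
import torch.nn as nn

# Load ViT-Base/16 and replace the head with a single-logit binary classifier.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=1)

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # illustrative lr

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step: images are (B, 3, 224, 224) tensors,
    labels are 0 (ND-BO) or 1 (D-BO)."""
    model.train()
    optimizer.zero_grad()
    logits = model(images).squeeze(1)
    loss = criterion(logits, labels.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```

Separate WLI, NBI, and combined-modality models would then follow from training the same architecture on the corresponding image subsets.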
Results
The ViT models demonstrated the following performance: WLI-ViT (accuracy 92%, sensitivity 82%, specificity 95%), NBI-ViT (accuracy 99%, sensitivity 97%, specificity 99%), and combined-modalities ViT (accuracy 93%, sensitivity 87%, specificity 95%). When classifying WLI images on a subgroup testing set, the combined-modalities ViT showed greater accuracy (94% vs 90%) and sensitivity (80% vs 70%) than WLI-ViT.
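For reference, the reported metrics follow directly from confusion-matrix counts. The sketch below shows a standard way to compute them, assuming scikit-learn and hypothetical `y_true`/`probs` arrays; the study's actual predictions are not public.

```python
# Illustrative metric computation under assumed data (1 = D-BO, 0 = ND-BO).
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

y_true = np.array([1, 0, 1, 1, 0, 0])              # hypothetical labels
probs = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.3])   # hypothetical model probabilities

y_pred = (probs >= 0.5).astype(int)  # illustrative 0.5 decision threshold
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)  # true-positive rate: dysplasia correctly flagged
specificity = tn / (tn + fp)  # true-negative rate: non-dysplastic correctly cleared
fpr, tpr, _ = roc_curve(y_true, probs)   # points tracing the ROC curve
auc = roc_auc_score(y_true, probs)       # area under that curve
```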
Conclusion
ViT exhibited high accuracy in classifying the presence or absence of EON in endoscopic images of Barrett's esophagus. ViT has the potential to be widely applicable to the endoscopic diagnosis of other gastrointestinal diseases.