On the differences between CNNs and vision transformers for COVID-19 diagnosis using CT and chest x-ray mono- and multimodality

IF 1.7 · CAS Tier 4 (Computer Science) · JCR Q3 (COMPUTER SCIENCE, INFORMATION SYSTEMS) · Data Technologies and Applications · Pub Date: 2024-01-10 · DOI: 10.1108/dta-01-2023-0005
Sara El-Ateif, Ali Idri, José Luis Fernández-Alemán
{"title":"On the differences between CNNs and vision transformers for COVID-19 diagnosis using CT and chest x-ray mono- and multimodality","authors":"Sara El-Ateif, Ali Idri, José Luis Fernández-Alemán","doi":"10.1108/dta-01-2023-0005","DOIUrl":null,"url":null,"abstract":"<h3>Purpose</h3>\n<p>COVID-19 continues to spread, and cause increasing deaths. Physicians diagnose COVID-19 using not only real-time polymerase chain reaction but also the computed tomography (CT) and chest x-ray (CXR) modalities, depending on the stage of infection. However, with so many patients and so few doctors, it has become difficult to keep abreast of the disease. Deep learning models have been developed in order to assist in this respect, and vision transformers are currently state-of-the-art methods, but most techniques currently focus only on one modality (CXR).</p><!--/ Abstract__block -->\n<h3>Design/methodology/approach</h3>\n<p>This work aims to leverage the benefits of both CT and CXR to improve COVID-19 diagnosis. This paper studies the differences between using convolutional MobileNetV2, ViT DeiT and Swin Transformer models when training from scratch and pretraining on the MedNIST medical dataset rather than the ImageNet dataset of natural images. The comparison is made by reporting six performance metrics, the Scott–Knott Effect Size Difference, Wilcoxon statistical test and the Borda Count method. We also use the Grad-CAM algorithm to study the model's interpretability. Finally, the model's robustness is tested by evaluating it on Gaussian noised images.</p><!--/ Abstract__block -->\n<h3>Findings</h3>\n<p>Although pretrained MobileNetV2 was the best model in terms of performance, the best model in terms of performance, interpretability, and robustness to noise is the trained from scratch Swin Transformer using the CXR (accuracy = 93.21 per cent) and CT (accuracy = 94.14 per cent) modalities.</p><!--/ Abstract__block -->\n<h3>Originality/value</h3>\n<p>Models compared are pretrained on MedNIST and leverage both the CT and CXR modalities.</p><!--/ Abstract__block -->","PeriodicalId":56156,"journal":{"name":"Data Technologies and Applications","volume":null,"pages":null},"PeriodicalIF":1.7000,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Technologies and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1108/dta-01-2023-0005","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Purpose

COVID-19 continues to spread and cause increasing numbers of deaths. Physicians diagnose COVID-19 using not only real-time polymerase chain reaction (RT-PCR) but also the computed tomography (CT) and chest x-ray (CXR) modalities, depending on the stage of infection. However, with so many patients and so few doctors, it has become difficult to keep pace with the disease. Deep learning models have been developed to assist in this respect, and vision transformers are currently state-of-the-art methods, but most techniques currently focus on only one modality (CXR).

Design/methodology/approach

This work aims to leverage the benefits of both CT and CXR to improve COVID-19 diagnosis. It studies the differences between the convolutional MobileNetV2 model and the ViT-based DeiT and Swin Transformer models when trained from scratch and when pretrained on the MedNIST medical dataset rather than on the ImageNet dataset of natural images. The comparison is made by reporting six performance metrics, the Scott–Knott Effect Size Difference test, the Wilcoxon statistical test and the Borda Count method. We also use the Grad-CAM algorithm to study the models' interpretability. Finally, robustness is tested by evaluating the models on Gaussian-noised images.
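The abstract names the model families and training regimes but not the implementation. As a minimal, hypothetical sketch only, such a comparison could be set up in PyTorch with the timm library as below; the specific model variants, the two-class label set and the checkpoint path are assumptions, not the authors' configuration.

```python
# Minimal sketch (not the authors' code): instantiating the three compared
# architectures with timm, either from scratch or with pretrained weights.
# Model variants, class count and checkpoint path are assumptions.
import timm
import torch

NUM_CLASSES = 2  # hypothetical label set, e.g. COVID-19 vs. non-COVID

# Trained from scratch: pretrained=False leaves weights randomly initialized.
swin = timm.create_model("swin_tiny_patch4_window7_224",
                         pretrained=False, num_classes=NUM_CLASSES)
deit = timm.create_model("deit_small_patch16_224",
                         pretrained=False, num_classes=NUM_CLASSES)

# ImageNet-pretrained baseline: pretrained=True downloads ImageNet weights.
mobilenet = timm.create_model("mobilenetv2_100",
                              pretrained=True, num_classes=NUM_CLASSES)

# MedNIST weights are not distributed with timm; after pretraining on
# MedNIST yourself, you would restore them from your own checkpoint:
# mobilenet.load_state_dict(torch.load("mednist_mobilenetv2.pt"), strict=False)

# Robustness probe: evaluate each model on Gaussian-noised copies of the
# test images (sigma is a free parameter; the paper's value is not given here).
def add_gaussian_noise(x: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    return (x + sigma * torch.randn_like(x)).clamp(0.0, 1.0)
```

Under this sketch, toggling the pretrained flag and the optional checkpoint load covers the from-scratch, ImageNet-pretrained and MedNIST-pretrained regimes the paper compares.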

Findings

Although the pretrained MobileNetV2 was the best model in terms of raw performance, the best model in terms of performance, interpretability and robustness to noise combined is the Swin Transformer trained from scratch, using the CXR (accuracy = 93.21 per cent) and CT (accuracy = 94.14 per cent) modalities.
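How interpretability was judged is not spelled out beyond naming Grad-CAM, so the following is an illustrative, manual Grad-CAM over a torchvision MobileNetV2; the weights, target layer and dummy input are assumptions, not the authors' setup.

```python
# Illustrative Grad-CAM (assumed setup, not the authors' code): capture the
# last convolutional feature map, weight its channels by the pooled gradient
# of the predicted class score, and upsample the result to image size.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.mobilenet_v2(weights="IMAGENET1K_V1").eval()

acts = {}
model.features[-1].register_forward_hook(lambda m, i, o: acts.update(a=o))

x = torch.randn(1, 3, 224, 224)           # stand-in for a preprocessed CXR/CT image
logits = model(x)
score = logits[0, logits[0].argmax()]     # score of the predicted class
grads = torch.autograd.grad(score, acts["a"])[0]

w = grads.mean(dim=(2, 3), keepdim=True)  # channel weights: pooled gradients
cam = F.relu((w * acts["a"]).sum(dim=1))  # weighted sum of activation maps
cam = F.interpolate(cam[None], size=x.shape[-2:], mode="bilinear")[0, 0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # heatmap in [0, 1]
```

For a Swin Transformer the same idea applies, but the token grid must be reshaped back to a 2-D map before pooling; libraries such as pytorch-grad-cam handle this via a reshape_transform argument.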

Originality/value

The models compared are pretrained on MedNIST and leverage both the CT and CXR modalities.

Source journal
Data Technologies and Applications (Social Sciences: Library and Information Sciences)
CiteScore: 3.80
Self-citation rate: 6.20%
Articles published: 29
About the journal: Previously published as: Program. Online from: 2018. Subject area: Information & Knowledge Management, Library Studies.