Enhanced tuberculosis detection using Vision Transformers and explainable AI with a Grad-CAM approach on chest X-rays.

IF 3.2 · Zone 3 (Medicine) · Q2 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING · BMC Medical Imaging · Pub Date: 2025-03-24 · DOI: 10.1186/s12880-025-01630-3

K Vanitha, T R Mahesh, V Vinoth Kumar, Suresh Guluwadi

BMC Medical Imaging 25(1):96. Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11934573/pdf/

Abstract

Tuberculosis (TB), caused by Mycobacterium tuberculosis, remains a leading global health challenge, especially in low-resource settings. Accurate diagnosis from chest X-rays is critical yet difficult because TB manifestations can be subtle, particularly in the early stages. Traditional computational methods, primarily basic convolutional neural networks (CNNs), often require extensive pre-processing and struggle to generalize across diverse clinical environments. This study introduces a novel Vision Transformer (ViT) model augmented with Gradient-weighted Class Activation Mapping (Grad-CAM) to enhance both diagnostic accuracy and interpretability. The ViT model uses self-attention to capture long-range dependencies and complex patterns directly from raw pixel information, while Grad-CAM provides visual explanations of model decisions by highlighting the regions of the X-ray that most influence them. The model consists of a Conv2D stem for initial feature extraction followed by multiple transformer encoder blocks, which lets it learn discriminative features without any pre-processing. On the validation set, the model achieved an accuracy of 0.97, a recall of 0.99, and an F1-score of 0.98 for TB patients. On the test set, it achieved an accuracy of 0.98, a recall of 0.97, and an F1-score of 0.98, which the authors report as better than existing methods. The Grad-CAM visualizations not only improve the transparency of the model but also help radiologists assess and verify AI-driven diagnoses. These results indicate the model's higher diagnostic precision and its potential for clinical application in real-world settings, representing a substantial improvement in the automated detection of TB.
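
To make the Grad-CAM step mentioned in the abstract more concrete, the following is a minimal NumPy sketch of how a Grad-CAM heatmap is typically assembled from one layer's feature maps and the gradients of the class score with respect to them. It is not the authors' implementation: the function name grad_cam, the array shapes, and the toy values are assumptions made for illustration, and the abstract does not say which layer of the ViT (for example the Conv2D stem output or the patch tokens reshaped to a grid) the heatmaps were taken from.

```python
import numpy as np


def grad_cam(feature_maps, gradients):
    """Combine feature maps and class-score gradients into a Grad-CAM heatmap.

    feature_maps: array of shape (K, H, W), activations of one layer for one image.
    gradients:    array of shape (K, H, W), d(class score) / d(feature_maps).
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    feature_maps = np.asarray(feature_maps, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    if feature_maps.ndim != 3 or feature_maps.shape != gradients.shape:
        raise ValueError("expected two arrays of identical shape (K, H, W)")

    # Channel weights: global-average-pool the gradients over H and W.
    weights = gradients.mean(axis=(1, 2))  # shape (K,)

    # Weighted sum over channels, then ReLU to keep only positive evidence.
    cam = np.maximum(np.tensordot(weights, feature_maps, axes=1), 0.0)

    # Scale to [0, 1] so the map can be overlaid on the X-ray.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy input (hypothetical values, not data from the paper):
# channel 0 supports the class, channel 1 counts against it.
maps = np.zeros((2, 4, 4))
maps[0, 1, 2] = 5.0
maps[1, 3, 0] = 5.0
grads = np.stack([np.full((4, 4), 0.5), np.full((4, 4), -0.5)])

heatmap = grad_cam(maps, grads)
```

In this toy case only the location activated by the positively weighted channel survives the ReLU, which is the behaviour that lets a radiologist see which regions pushed the model toward a TB prediction. In practice the gradients would come from a framework's automatic differentiation, and the heatmap would be upsampled to the image size before overlay.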

Source journal

BMC Medical Imaging (RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING)

CiteScore: 4.60
Self-citation rate: 3.70%
Articles published: 198
Review time: 27 weeks

Journal description: BMC Medical Imaging is an open access journal publishing original peer-reviewed research articles in the development, evaluation, and use of imaging techniques and image processing tools to diagnose and manage disease.