Enhanced tuberculosis detection using Vision Transformers and explainable AI with a Grad-CAM approach on chest X-rays.

BMC Medical Imaging (IF 3.2; Zone 3, Medicine; JCR Q2, Radiology, Nuclear Medicine & Medical Imaging). Pub Date: 2025-03-24. DOI: 10.1186/s12880-025-01630-3
K Vanitha, T R Mahesh, V Vinoth Kumar, Suresh Guluwadi
Volume 25, issue 1, page 96. Publication type: Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11934573/pdf/

Abstract

Tuberculosis (TB), caused by Mycobacterium tuberculosis, remains a leading global health challenge, especially in low-resource settings. Accurate diagnosis from chest X-rays is critical yet difficult because the manifestations of TB are subtle, particularly in its early stages. Traditional computational methods, primarily basic convolutional neural networks (CNNs), often require extensive pre-processing and generalize poorly across diverse clinical environments. This study introduces a novel Vision Transformer (ViT) model augmented with Gradient-weighted Class Activation Mapping (Grad-CAM) to improve both diagnostic accuracy and interpretability. The ViT model uses self-attention to capture long-range dependencies and complex patterns directly from raw pixel information, while Grad-CAM provides visual explanations of model decisions by highlighting the regions of the X-ray that most influence them. The model consists of a Conv2D stem for initial feature extraction followed by multiple transformer encoder blocks, which lets it learn discriminative features without any pre-processing. On the validation set, the model achieved an accuracy of 0.97, a recall of 0.99, and an F1-score of 0.98 for TB patients. On the test set, it achieved an accuracy of 0.98, a recall of 0.97, and an F1-score of 0.98, which the authors report as better than existing methods. The Grad-CAM visualizations improve the transparency of the model and help radiologists assess and verify AI-driven diagnoses. These results demonstrate the model's higher diagnostic precision and its potential for clinical application in real-world settings, which the authors present as a substantial improvement in the automated detection of TB.
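
The Grad-CAM step described above can be sketched in a few lines. This is a minimal illustration of the generic Grad-CAM combination rule (average each feature map's gradients to get a channel weight, take the weighted sum of the maps, apply ReLU), not the authors' implementation. The function name `grad_cam`, the nested-list data layout, and the toy inputs are assumptions made for the example. The abstract does not say which layer supplies the feature maps, so the activations and gradients are taken as given.

```python
from typing import List

Grid = List[List[float]]  # one H x W feature map (hypothetical layout)


def grad_cam(activations: List[Grid], gradients: List[Grid]) -> Grid:
    """Combine K feature maps into a single Grad-CAM heatmap.

    activations[k][i][j]: value of feature map k at position (i, j).
    gradients[k][i][j]:   d(class score) / d(activations[k][i][j]).
    Both are assumed to come from a layer chosen by the user.
    """
    height, width = len(activations[0]), len(activations[0][0])

    # Channel weight: global average of the gradients of each feature map.
    weights = [
        sum(sum(row) for row in grad) / (height * width)
        for grad in gradients
    ]

    # Weighted sum over channels, then ReLU to keep positive evidence only.
    cam = [
        [
            max(0.0, sum(w * act[i][j] for w, act in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy input: two 2 x 2 feature maps with constant gradients of +1 and -1.
demo_activations = [[[1.0, 2.0], [3.0, 4.0]], [[4.0, 3.0], [2.0, 1.0]]]
demo_gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
demo_heatmap = grad_cam(demo_activations, demo_gradients)
```

In the toy input the first map supports the class and the second opposes it, so only the bottom row of the heatmap stays positive after the ReLU. In practice the heatmap would be upsampled to the X-ray resolution and overlaid on the image.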

Source journal: BMC Medical Imaging (Radiology, Nuclear Medicine & Medical Imaging)
CiteScore: 4.60
Self-citation rate: 3.70%
Articles published: 198
Review time: 27 weeks
About the journal: BMC Medical Imaging is an open access journal publishing original peer-reviewed research articles in the development, evaluation, and use of imaging techniques and image processing tools to diagnose and manage disease.