{"title":"基于视觉变换器(ViT)和深度卷积神经网络(D-CNN)的模型,在可解释人工智能(XAI)的支持下用于核磁共振成像脑原发性肿瘤图像的多重分类","authors":"Hiba Mzoughi, Ines Njeh, Mohamed BenSlima, Nouha Farhat, Chokri Mhiri","doi":"10.1007/s00371-024-03524-x","DOIUrl":null,"url":null,"abstract":"<p>The manual classification of primary brain tumors through Magnetic Resonance Imaging (MRI) is considered as a critical task during the clinical routines that requires highly qualified neuroradiologists. Deep Learning (DL)-based computer-aided diagnosis tools are established to support the neurosurgeons’ opinion during the diagnosis. However, the black-box nature and the lack of transparency and interpretability of such DL-based models make their implementation, especially in critical and sensitive medical applications, very difficult. The explainable artificial intelligence techniques help to gain clinicians’ confidence and to provide explanations about the models' predictions. Typical and existing Convolutional Neural Network (CNN)-based architectures could not capture long-range global information and feature from pathology MRI scans. Recently, Vision Transformer (ViT) networks have been introduced to solve the issue of long-range dependency in CNN-based architecture by introducing a self-attention mechanism to analyze images, allowing the network to capture deep long-range reliance between pixels. The purpose of the proposed study is to provide efficient CAD tool for MRI brain tumor classification. At the same, we aim to enhance the neuroradiologists' confidence when using DL in clinical and medical standards. In this paper, we investigated a deep ViT architecture trained from scratch for the multi-classification task of common primary tumors (gliomas, meningiomas, and pituitary brain tumors), using T1-weighted contrast-enhanced MRI sequences. 
Several XAI techniques have been adopted: Gradient-weighted Class Activation Mapping (Grad-CAM), Local Interpretable Model-agnostic Explanations (LIME), and SHapley Additive exPlanations (SHAP), to visualize the most significant and distinguishing features related to the model prediction results. A publicly available benchmark dataset has been used for the evaluation task. The comparative study confirms the efficiency of ViT architecture compared to the CNN model using the testing dataset. The test accuracy of 83.37% for the Convolutional Neural Network (CNN) and 91.61% for the Vision Transformer (ViT) indicates that the ViT model outperformed the CNN model in the classification task. Based on the experimental results, we could confirm that the proposed ViT model presents a competitive performance outperforming the multi-classification state-of-the-art models using MRI sequences. Further, the proposed models present an exact and correct interpretation. Thus, we could confirm that the proposed CAD could be established during the clinical diagnosis routines.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Vision transformers (ViT) and deep convolutional neural network (D-CNN)-based models for MRI brain primary tumors images multi-classification supported by explainable artificial intelligence (XAI)\",\"authors\":\"Hiba Mzoughi, Ines Njeh, Mohamed BenSlima, Nouha Farhat, Chokri Mhiri\",\"doi\":\"10.1007/s00371-024-03524-x\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>The manual classification of primary brain tumors through Magnetic Resonance Imaging (MRI) is considered as a critical task during the clinical routines that requires highly qualified neuroradiologists. 
Deep Learning (DL)-based computer-aided diagnosis tools are established to support the neurosurgeons’ opinion during the diagnosis. However, the black-box nature and the lack of transparency and interpretability of such DL-based models make their implementation, especially in critical and sensitive medical applications, very difficult. The explainable artificial intelligence techniques help to gain clinicians’ confidence and to provide explanations about the models' predictions. Typical and existing Convolutional Neural Network (CNN)-based architectures could not capture long-range global information and feature from pathology MRI scans. Recently, Vision Transformer (ViT) networks have been introduced to solve the issue of long-range dependency in CNN-based architecture by introducing a self-attention mechanism to analyze images, allowing the network to capture deep long-range reliance between pixels. The purpose of the proposed study is to provide efficient CAD tool for MRI brain tumor classification. At the same, we aim to enhance the neuroradiologists' confidence when using DL in clinical and medical standards. In this paper, we investigated a deep ViT architecture trained from scratch for the multi-classification task of common primary tumors (gliomas, meningiomas, and pituitary brain tumors), using T1-weighted contrast-enhanced MRI sequences. Several XAI techniques have been adopted: Gradient-weighted Class Activation Mapping (Grad-CAM), Local Interpretable Model-agnostic Explanations (LIME), and SHapley Additive exPlanations (SHAP), to visualize the most significant and distinguishing features related to the model prediction results. A publicly available benchmark dataset has been used for the evaluation task. The comparative study confirms the efficiency of ViT architecture compared to the CNN model using the testing dataset. 
The test accuracy of 83.37% for the Convolutional Neural Network (CNN) and 91.61% for the Vision Transformer (ViT) indicates that the ViT model outperformed the CNN model in the classification task. Based on the experimental results, we could confirm that the proposed ViT model presents a competitive performance outperforming the multi-classification state-of-the-art models using MRI sequences. Further, the proposed models present an exact and correct interpretation. Thus, we could confirm that the proposed CAD could be established during the clinical diagnosis routines.</p>\",\"PeriodicalId\":501186,\"journal\":{\"name\":\"The Visual Computer\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-06-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The Visual Computer\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1007/s00371-024-03524-x\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Visual Computer","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00371-024-03524-x","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
The manual classification of primary brain tumors from Magnetic Resonance Imaging (MRI) is a critical task in clinical routine that requires highly qualified neuroradiologists. Deep Learning (DL)-based computer-aided diagnosis (CAD) tools have been developed to support neurosurgeons' opinions during diagnosis. However, the black-box nature of such DL-based models, and their lack of transparency and interpretability, make their deployment very difficult, especially in critical and sensitive medical applications. Explainable artificial intelligence (XAI) techniques help to gain clinicians' confidence by providing explanations of the models' predictions. Typical existing Convolutional Neural Network (CNN)-based architectures cannot capture long-range global information and features from pathological MRI scans. Recently, Vision Transformer (ViT) networks have been introduced to address this limitation: a self-attention mechanism analyzes the image as a sequence of patches, allowing the network to capture long-range dependencies between pixels. The purpose of this study is to provide an efficient CAD tool for MRI brain tumor classification and, at the same time, to enhance neuroradiologists' confidence in using DL within clinical and medical standards. In this paper, we investigate a deep ViT architecture trained from scratch for the multi-class classification of common primary tumors (gliomas, meningiomas, and pituitary tumors) using T1-weighted contrast-enhanced MRI sequences. Several XAI techniques are adopted, Gradient-weighted Class Activation Mapping (Grad-CAM), Local Interpretable Model-agnostic Explanations (LIME), and SHapley Additive exPlanations (SHAP), to visualize the most significant and distinguishing features driving the model's predictions. A publicly available benchmark dataset is used for evaluation.
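The patch-and-attend mechanism described above can be illustrated with a minimal NumPy sketch: a single attention head over flattened image patches with random weights. This is purely illustrative of how every patch attends to every other patch; the patch size, dimensions, and the omission of positional embeddings, multi-head projections, and MLP blocks are simplifications, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(img, patch):
    """Split an H x W image into flattened, non-overlapping patch tokens."""
    h, w = img.shape
    patches = [img[i:i + patch, j:j + patch].ravel()
               for i in range(0, h, patch)
               for j in range(0, w, patch)]
    return np.stack(patches)                      # (num_patches, patch*patch)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over patch tokens."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])       # every patch vs. every patch
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax: rows sum to 1
    return attn @ v, attn

img = rng.standard_normal((64, 64))               # stand-in for one MRI slice
tokens = patchify(img, 16)                        # 16 tokens of 256 features
d = tokens.shape[1]
wq, wk, wv = (0.02 * rng.standard_normal((d, d)) for _ in range(3))
out, attn = self_attention(tokens, wq, wk, wv)
print(out.shape, attn.shape)                      # (16, 256) (16, 16)
```

Because the attention matrix couples all patch pairs, distant tumor and context regions can influence each other within a single layer, the long-range behavior that CNN receptive fields only reach after many stacked convolutions.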
The comparative study on the test set confirms the efficiency of the ViT architecture relative to the CNN model: a test accuracy of 83.37% for the CNN versus 91.61% for the ViT shows that the ViT outperformed the CNN on the classification task. Based on the experimental results, the proposed ViT model achieves competitive performance, outperforming state-of-the-art multi-class classification models on MRI sequences. Further, the proposed models yield accurate and coherent interpretations. We therefore conclude that the proposed CAD tool could be adopted in clinical diagnosis routines.
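As an illustration of the kind of saliency map Grad-CAM produces, the following sketch computes a class-activation heatmap for the special case of a network ending in global average pooling (GAP) plus a linear classifier, where Grad-CAM's gradient-derived channel weights reduce (up to a constant) to the class weights themselves. The feature maps, class count, and weights here are random placeholders, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder activations of a last conv layer: K feature maps of size H x W.
K, H, W = 8, 14, 14
feature_maps = rng.random((K, H, W))
class_weights = rng.standard_normal((3, K))  # 3 tumor classes, GAP+linear head

def grad_cam_gap_head(fmaps, w_class):
    """Grad-CAM heatmap for a GAP + linear classification head.

    With global average pooling, d(score)/d(A_k) = w_k / (H*W), so the
    channel weights are proportional to the class weights and the map is
    ReLU(sum_k w_k * A_k), rescaled to [0, 1] for display."""
    cam = np.tensordot(w_class, fmaps, axes=1)  # weighted sum -> (H, W)
    cam = np.maximum(cam, 0.0)                  # keep positive evidence only
    if cam.max() > 0:
        cam /= cam.max()
    return cam

heatmap = grad_cam_gap_head(feature_maps, class_weights[0])  # map for class 0
print(heatmap.shape)  # (14, 14)
```

In practice the heatmap is upsampled to the input resolution and overlaid on the MRI slice; LIME and SHAP, by contrast, perturb or attribute input regions and do not require access to internal gradients.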