A multimodal vision transformer for interpretable fusion of functional and structural neuroimaging data

Human Brain Mapping (IF 3.5; CAS Zone 2, Medicine; JCR Q1, Neuroimaging) | Pub Date: 2024-11-26 | DOI: 10.1002/hbm.26783

Yuda Bi, Anees Abrol, Zening Fu, Vince D. Calhoun

Human Brain Mapping, Volume 45, Issue 17. Journal Article. https://onlinelibrary.wiley.com/doi/10.1002/hbm.26783


Abstract

Multimodal neuroimaging is an emerging field that leverages multiple sources of information to diagnose specific brain disorders, especially when deep learning-based AI algorithms are applied. Combining different brain imaging modalities with deep learning remains a challenging yet crucial research topic. Integrating structural and functional modalities is particularly important for diagnosing brain disorders: structural information plays a crucial role in diseases such as Alzheimer's, while functional imaging is more critical for disorders such as schizophrenia. Combining the two, however, can provide a more comprehensive diagnosis. In this work, we present MultiViT, a novel diagnostic deep learning model that uses vision transformers and cross-attention mechanisms to fuse 3D gray matter maps derived from structural MRI with functional network connectivity matrices obtained from functional MRI using independent component analysis (ICA). MultiViT achieves an AUC of 0.833, outperforming both our unimodal and multimodal baselines and enabling more accurate classification and diagnosis of schizophrenia. In addition, by combining the vision transformer's attention maps with cross-attention mechanisms and brain function information, we identify critical brain regions in 3D gray matter space associated with the characteristics of schizophrenia. Our research not only significantly improves the accuracy of AI-based automated imaging diagnostics for schizophrenia, but also pioneers a rational data fusion approach that replaces complex, high-dimensional fMRI information with functional network connectivity, integrates it with representative structural data from 3D gray matter images, and provides interpretable biomarker localization in a 3D structural space.
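The cross-attention fusion mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' MultiViT implementation: the token counts, embedding width, random projection matrices, and the choice of functional tokens as queries over structural patch tokens are all assumptions made for illustration. It only shows how attention weights over 3D patches could be folded back into a volume-shaped map for inspection.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the maximum before exponentiating for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: every query token attends over all context tokens."""
    q = queries @ w_q                            # (n_queries, d)
    k = context @ w_k                            # (n_context, d)
    v = context @ w_v                            # (n_context, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])      # scaled dot products, (n_queries, n_context)
    weights = softmax(scores, axis=-1)           # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d_model = 32           # hypothetical embedding width
grid = (4, 4, 4)       # hypothetical grid of 3D gray matter patches
n_patches = 64         # 4 * 4 * 4 patches
n_components = 50      # hypothetical number of FNC-derived tokens

# Random stand-ins for learned embeddings of the two modalities.
smri_tokens = rng.standard_normal((n_patches, d_model))
fnc_tokens = rng.standard_normal((n_components, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                 for _ in range(3))

# Assumption: functional tokens query the structural patches.
fused, attn = cross_attention(fnc_tokens, smri_tokens, w_q, w_k, w_v)

# Average attention over the functional tokens to get one score per patch,
# then fold the scores back into the 3D patch grid.
patch_map = attn.mean(axis=0).reshape(grid)
print(fused.shape, attn.shape, patch_map.shape)
```

In a trained model the projections would be learned and the patch scores would typically be upsampled to voxel resolution before being overlaid on the gray matter map; neither step is shown here.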



Source journal

Human Brain Mapping (Medicine: Nuclear Medicine)

CiteScore: 8.30

Self-citation rate: 6.20%

Articles published: 401

Review time: 3-6 weeks

Journal description: Human Brain Mapping publishes peer-reviewed basic, clinical, technical, and theoretical research in the interdisciplinary and rapidly expanding field of human brain mapping. The journal features research derived from non-invasive brain imaging modalities used to explore the spatial and temporal organization of the neural systems supporting human behavior. Imaging modalities of interest include positron emission tomography, event-related potentials, electro- and magnetoencephalography, magnetic resonance imaging, and single-photon emission tomography. Brain mapping research in both normal and clinical populations is encouraged. Article formats include Research Articles, Review Articles, Clinical Case Studies, and Technique, as well as Technological Developments, Theoretical Articles, and Synthetic Reviews. Technical advances, such as novel brain imaging methods, analyses for detecting or localizing neural activity, synergistic uses of multiple imaging modalities, and strategies for the design of behavioral paradigms and neural-systems modeling, are of particular interest. The journal endorses the propagation of methodological standards and encourages database development in the field of human brain mapping.