NT-ViT: Neural Transcoding Vision Transformers for EEG-to-fMRI Synthesis
Romeo Lanzino, Federico Fontana, Luigi Cinque, Francesco Scarcello, Atsuto Maki
arXiv - EE - Image and Video Processing, 2024-09-18, DOI: arxiv-2409.11836
Abstract
This paper introduces the Neural Transcoding Vision Transformer (NT-ViT),
a generative model designed to estimate high-resolution functional Magnetic
Resonance Imaging (fMRI) samples from simultaneous Electroencephalography (EEG)
data. A key feature of NT-ViT is its Domain Matching (DM) sub-module, which
effectively aligns the latent EEG representations with those of fMRI volumes,
enhancing the model's accuracy and reliability. Unlike previous methods that
tend to struggle with fidelity and reproducibility of images, NT-ViT
addresses these challenges by ensuring methodological integrity and
higher-quality reconstructions, which we showcase through extensive evaluation
on two benchmark datasets. NT-ViT outperforms the current state-of-the-art
by a significant margin in both cases, e.g. achieving a 10× reduction in
RMSE and a 3.14× increase in SSIM on the Oddball dataset. An ablation
study also provides insights into the contribution of each component to the
model's overall effectiveness. This development is critical in offering a new
approach to lessen the time and financial constraints typically linked with
high-resolution brain imaging, thereby aiding in the swift and precise
diagnosis of neurological disorders. NT-ViT is not a replacement for
actual fMRI but rather a step towards making such imaging more accessible;
even so, we believe that it represents a pivotal advancement in clinical
practice and neuroscience research. Code is available at
https://github.com/rom42pla/ntvit.
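
The two metrics quoted in the abstract can be illustrated with a minimal sketch. It is not taken from the NT-ViT repository: the function names, the synthetic 8×8×8 volumes, and the noise levels are all hypothetical. The SSIM shown is a simplified single-window (global) variant, whereas standard SSIM averages over local windows. The sketch also assumes that an "N× reduction in RMSE" means baseline RMSE divided by the new RMSE, and that an "N× increase in SSIM" means the new SSIM divided by the baseline SSIM.

```python
import numpy as np


def rmse(pred, target):
    """Root mean squared error between two arrays of the same shape."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.sqrt(np.mean((pred - target) ** 2)))


def global_ssim(pred, target, data_range=1.0):
    """Simplified SSIM computed over the whole volume as a single window.

    Assumption: this is only an approximation of the usual windowed SSIM.
    """
    x = np.asarray(pred, dtype=np.float64)
    y = np.asarray(target, dtype=np.float64)
    # Standard stabilising constants (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = np.mean((x - mx) * (y - my))
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return float(numerator / denominator)


# Hypothetical data: a random "ground truth" volume and two reconstructions.
rng = np.random.default_rng(0)
target = rng.random((8, 8, 8))
baseline = np.clip(target + rng.normal(0.0, 0.3, target.shape), 0.0, 1.0)
improved = np.clip(target + rng.normal(0.0, 0.03, target.shape), 0.0, 1.0)

# Fold changes under the interpretation stated above.
rmse_reduction = rmse(baseline, target) / rmse(improved, target)
ssim_increase = global_ssim(improved, target) / global_ssim(baseline, target)

print(f"RMSE reduction: {rmse_reduction:.2f}x")
print(f"SSIM increase:  {ssim_increase:.2f}x")
```

For reported results, a windowed SSIM implementation such as the one in scikit-image would be the more usual choice; the global variant here is only meant to keep the example dependency-light.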