fMRI-3D: A Comprehensive Dataset for Enhancing fMRI-based 3D Reconstruction

Jianxiong Gao, Yuqian Fu, Yun Wang, Xuelin Qian, Jianfeng Feng, Yanwei Fu
{"title":"fMRI-3D: A Comprehensive Dataset for Enhancing fMRI-based 3D Reconstruction","authors":"Jianxiong Gao, Yuqian Fu, Yun Wang, Xuelin Qian, Jianfeng Feng, Yanwei Fu","doi":"arxiv-2409.11315","DOIUrl":null,"url":null,"abstract":"Reconstructing 3D visuals from functional Magnetic Resonance Imaging (fMRI)\ndata, introduced as Recon3DMind in our conference work, is of significant\ninterest to both cognitive neuroscience and computer vision. To advance this\ntask, we present the fMRI-3D dataset, which includes data from 15 participants\nand showcases a total of 4768 3D objects. The dataset comprises two components:\nfMRI-Shape, previously introduced and accessible at\nhttps://huggingface.co/datasets/Fudan-fMRI/fMRI-Shape, and fMRI-Objaverse,\nproposed in this paper and available at\nhttps://huggingface.co/datasets/Fudan-fMRI/fMRI-Objaverse. fMRI-Objaverse\nincludes data from 5 subjects, 4 of whom are also part of the Core set in\nfMRI-Shape, with each subject viewing 3142 3D objects across 117 categories,\nall accompanied by text captions. This significantly enhances the diversity and\npotential applications of the dataset. Additionally, we propose MinD-3D, a\nnovel framework designed to decode 3D visual information from fMRI signals. The\nframework first extracts and aggregates features from fMRI data using a\nneuro-fusion encoder, then employs a feature-bridge diffusion model to generate\nvisual features, and finally reconstructs the 3D object using a generative\ntransformer decoder. We establish new benchmarks by designing metrics at both\nsemantic and structural levels to evaluate model performance. Furthermore, we\nassess our model's effectiveness in an Out-of-Distribution setting and analyze\nthe attribution of the extracted features and the visual ROIs in fMRI signals.\nOur experiments demonstrate that MinD-3D not only reconstructs 3D objects with\nhigh semantic and spatial accuracy but also deepens our understanding of how\nhuman brain processes 3D visual information. Project page at:\nhttps://jianxgao.github.io/MinD-3D.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11315","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Reconstructing 3D visuals from functional Magnetic Resonance Imaging (fMRI) data, introduced as Recon3DMind in our conference work, is of significant interest to both cognitive neuroscience and computer vision. To advance this task, we present the fMRI-3D dataset, which includes data from 15 participants and showcases a total of 4768 3D objects. The dataset comprises two components: fMRI-Shape, previously introduced and accessible at https://huggingface.co/datasets/Fudan-fMRI/fMRI-Shape, and fMRI-Objaverse, proposed in this paper and available at https://huggingface.co/datasets/Fudan-fMRI/fMRI-Objaverse. fMRI-Objaverse includes data from 5 subjects, 4 of whom are also part of the Core set in fMRI-Shape, with each subject viewing 3142 3D objects across 117 categories, all accompanied by text captions. This significantly enhances the diversity and potential applications of the dataset. Additionally, we propose MinD-3D, a novel framework designed to decode 3D visual information from fMRI signals. The framework first extracts and aggregates features from fMRI data using a neuro-fusion encoder, then employs a feature-bridge diffusion model to generate visual features, and finally reconstructs the 3D object using a generative transformer decoder. We establish new benchmarks by designing metrics at both semantic and structural levels to evaluate model performance. Furthermore, we assess our model's effectiveness in an Out-of-Distribution setting and analyze the attribution of the extracted features and the visual ROIs in fMRI signals. Our experiments demonstrate that MinD-3D not only reconstructs 3D objects with high semantic and spatial accuracy but also deepens our understanding of how the human brain processes 3D visual information. Project page at: https://jianxgao.github.io/MinD-3D.
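The abstract names a three-stage pipeline (neuro-fusion encoder, feature-bridge diffusion model, generative transformer decoder) but gives no implementation details. The PyTorch sketch below only mirrors that stage structure so the data flow is concrete; every module design, tensor shape, token count, and the simplified sampling loop are illustrative assumptions, not the authors' released code.

import torch
import torch.nn as nn


class NeuroFusionEncoder(nn.Module):
    # Stage 1 (assumed form): project per-frame voxel vectors, mix them across
    # frames, and pool them into a fixed set of fMRI feature tokens.
    def __init__(self, n_voxels, d_model=768, n_tokens=256):
        super().__init__()
        self.proj = nn.Linear(n_voxels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.frame_mixer = nn.TransformerEncoder(layer, num_layers=4)
        self.queries = nn.Parameter(torch.randn(n_tokens, d_model))
        self.pool = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)

    def forward(self, fmri_frames):
        # fmri_frames: (batch, n_frames, n_voxels)
        x = self.frame_mixer(self.proj(fmri_frames))
        q = self.queries.unsqueeze(0).expand(x.size(0), -1, -1)
        fused, _ = self.pool(q, x, x)          # (batch, n_tokens, d_model)
        return fused


class FeatureBridgeDiffusion(nn.Module):
    # Stage 2 (assumed form): iteratively denoise visual feature tokens
    # conditioned on the fMRI tokens; the update rule is deliberately
    # simplified and stands in for a proper diffusion sampler.
    def __init__(self, d_model=768, steps=50):
        super().__init__()
        self.steps = steps
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.denoiser = nn.TransformerDecoder(layer, num_layers=6)

    @torch.no_grad()
    def sample(self, cond, n_tokens=256):
        x = torch.randn(cond.size(0), n_tokens, cond.size(-1), device=cond.device)
        for _ in range(self.steps):
            eps = self.denoiser(x, cond)       # predicted noise given the fMRI condition
            x = x - eps / self.steps           # toy update, not a real DDPM/DDIM schedule
        return x


class GenerativeTransformerDecoder(nn.Module):
    # Stage 3 (assumed form): autoregressively predict discrete shape codes
    # (tokens of some 3D representation) from the generated visual features.
    def __init__(self, d_model=768, vocab=8192, max_len=1024):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)
        self.pos = nn.Parameter(torch.randn(max_len, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, shape_tokens, visual_feats):
        # shape_tokens: (batch, seq_len) integer codes; visual_feats: (batch, n_tokens, d_model)
        x = self.tok(shape_tokens) + self.pos[: shape_tokens.size(1)]
        return self.head(self.decoder(x, visual_feats))


# Dummy end-to-end pass with random data, just to show how the stages chain.
encoder = NeuroFusionEncoder(n_voxels=4000)
bridge = FeatureBridgeDiffusion(steps=4)       # few steps, only to keep the demo fast
decoder3d = GenerativeTransformerDecoder()
fmri = torch.randn(2, 8, 4000)                 # (batch, frames, voxels)
visual = bridge.sample(encoder(fmri))
logits = decoder3d(torch.zeros(2, 1, dtype=torch.long), visual)
print(logits.shape)                            # torch.Size([2, 1, 8192])

Both dataset components are hosted on Hugging Face (Fudan-fMRI/fMRI-Shape and Fudan-fMRI/fMRI-Objaverse), so they can be fetched with standard tooling such as huggingface_hub.snapshot_download(repo_id="Fudan-fMRI/fMRI-Shape", repo_type="dataset"); the on-disk layout is not described in the abstract and should be checked against the repository cards.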