Multi-modal MRIs are widely used in neuroimaging applications since different MR sequences provide complementary information about brain structures. Recent works have suggested that multi-modal deep learning analysis can benefit from explicitly disentangling anatomical (shape) and modality (appearance) information into separate image representations. In this work, we challenge mainstream strategies by showing that they do not naturally lead to representation disentanglement, either in theory or in practice. To address this issue, we propose a margin loss that regularizes the similarity relationships of the representations across subjects and modalities. To enable robust training, we further use a conditional convolution to design a single model for encoding images of all modalities. Lastly, we propose a fusion function to combine the disentangled anatomical representations as a set of modality-invariant features for downstream tasks. We evaluate the proposed method on three multi-modal neuroimaging datasets. Experiments show that our proposed method achieves better disentangled representations than existing disentanglement strategies. Results also indicate that the fused anatomical representation has potential in the downstream tasks of zero-dose PET reconstruction and brain tumor segmentation.
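The abstract above does not state the exact form of the margin loss, so the following is only a minimal sketch of one plausible reading. It assumes a hinge-style penalty on cosine similarities: the anatomical representations of one subject across two modalities should be more similar than the representations of two different subjects in the same modality, by at least a margin. The function name `anatomy_margin_loss`, the array layout, and the default margin are hypothetical choices made for illustration.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def anatomy_margin_loss(z, margin=0.1):
    """Hinge-style margin loss over anatomical representations (assumed form).

    z has shape (n_subjects, n_modalities, dim), with at least two subjects
    and two modalities. For every subject i, other subject j, and modality
    pair (m, n), the loss penalizes cases where the same-subject,
    cross-modality similarity does not exceed the cross-subject,
    same-modality similarity by `margin`.
    """
    n_subjects, n_modalities, _ = z.shape
    terms = []
    for i in range(n_subjects):
        for j in range(n_subjects):
            if i == j:
                continue
            for m in range(n_modalities):
                for n in range(n_modalities):
                    if m == n:
                        continue
                    same_subject = cosine_similarity(z[i, m], z[i, n])
                    same_modality = cosine_similarity(z[i, m], z[j, m])
                    terms.append(max(0.0, margin + same_modality - same_subject))
    return float(np.mean(terms))
```

Under this assumed form, the loss is zero when representations cluster by subject and largest when they cluster by modality, which is the failure case the abstract attributes to representations that are not disentangled.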
Interpretability is a critical factor in applying complex deep learning models to advance the understanding of brain disorders in neuroimaging studies. To interpret the decision process of a trained classifier, existing techniques typically rely on saliency maps to quantify the voxel-wise or feature-level importance for classification through partial derivatives. Despite providing some level of localization, these maps are not human-understandable from the neuroscience perspective, as they often do not indicate the specific type of morphological changes linked to the brain disorder. Inspired by the image-to-image translation scheme, we propose to train simulator networks to inject (or remove) patterns of the disease into a given MRI based on a warping operation, such that the classifier increases (or decreases) its confidence in labeling the simulated MRI as diseased. To increase the robustness of training, we propose to couple the two simulators into a unified model based on conditional convolution. We apply our approach to interpreting classifiers trained on a synthetic dataset and two neuroimaging datasets to visualize the effect of Alzheimer's disease and alcohol dependence. Compared to the saliency maps generated by baseline approaches, our simulations and visualizations based on the Jacobian determinants of the warping field reveal meaningful and understandable patterns related to the diseases.
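The Jacobian determinant of a warping field, mentioned in the abstract above, can be illustrated with a small 2D sketch. It assumes the warp is given as a displacement field u on a unit-spaced pixel grid, so that the deformation is phi(x) = x + u(x), and it uses finite differences from `np.gradient`. The function name and the array layout are hypothetical; the paper works with 3D MRIs and may compute the quantity differently.

```python
import numpy as np


def jacobian_determinant_2d(disp):
    """Determinant of the Jacobian of phi(x) = x + u(x) on a 2-D grid.

    disp has shape (2, H, W): disp[0] is the displacement along axis 0
    (rows) and disp[1] the displacement along axis 1 (columns).
    Returns an (H, W) map. Values above 1 indicate local expansion,
    values below 1 local shrinkage, and 1 no volume change.
    """
    # np.gradient returns one derivative array per axis (unit grid spacing).
    du0_d0, du0_d1 = np.gradient(disp[0])
    du1_d0, du1_d1 = np.gradient(disp[1])
    # det(I + grad u) for a 2x2 matrix.
    return (1.0 + du0_d0) * (1.0 + du1_d1) - du0_d1 * du1_d0


# Usage: a uniform 10% stretch along both axes expands area by 1.1 * 1.1.
rows, cols = np.meshgrid(np.arange(8.0), np.arange(8.0), indexing="ij")
stretch = np.stack([0.1 * rows, 0.1 * cols])
det_map = jacobian_determinant_2d(stretch)
```

A map of this kind is easier to read anatomically than a saliency map because each value says whether tissue locally expands or shrinks under the simulated change.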
We present a rotation-equivariant self-supervised learning framework for the sparse deconvolution of non-negative scalar fields on the unit sphere. Spherical signals with multiple peaks naturally arise in diffusion MRI (dMRI), where each voxel consists of one or more signal sources corresponding to anisotropic tissue structure such as white matter. Due to spatial and spectral partial voluming, clinically feasible dMRI struggles to resolve crossing-fiber white matter configurations, leading to extensive development in spherical deconvolution methodology to recover underlying fiber directions. However, these methods are typically linear and struggle with small crossing angles and partial volume fraction estimation. In this work, we improve on current methodologies by nonlinearly estimating fiber structures via self-supervised spherical convolutional networks with guaranteed equivariance to spherical rotation. We perform validation via extensive single- and multi-shell synthetic benchmarks, demonstrating competitive performance against common baselines. We further show improved downstream performance on fiber tractography measures on the Tractometer benchmark dataset. Finally, we show downstream improvements in terms of tractography and partial volume estimation on a multi-shell dataset of human subjects.
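To make the crossing-fiber setting in the abstract above concrete, the sketch below simulates the signal that a deconvolution method would need to invert. It assumes an idealized "stick" response for each fiber and illustrative values for the b-value and diffusivity; neither is taken from the paper. The voxel signal is then a volume-fraction-weighted sum of single-fiber responses, and the function names are hypothetical.

```python
import numpy as np


def stick_response(gradients, fiber_dir, b=1000.0, d=1.7e-3):
    """Signal attenuation of one idealized fiber (stick model, an assumption).

    gradients: (N, 3) unit gradient directions on the sphere.
    fiber_dir: (3,) fiber orientation, normalized internally.
    b in s/mm^2 and d in mm^2/s are illustrative values only.
    """
    fiber_dir = np.asarray(fiber_dir, dtype=float)
    fiber_dir = fiber_dir / np.linalg.norm(fiber_dir)
    return np.exp(-b * d * (gradients @ fiber_dir) ** 2)


def voxel_signal(gradients, fiber_dirs, fractions, b=1000.0, d=1.7e-3):
    """Partial-volume mixture: weighted sum of single-fiber responses."""
    fractions = np.asarray(fractions, dtype=float)
    responses = np.stack(
        [stick_response(gradients, v, b, d) for v in fiber_dirs], axis=1
    )
    return responses @ fractions


# Usage: two fibers crossing at 90 degrees with equal volume fractions,
# sampled along the three coordinate axes.
axes = np.eye(3)
crossing = voxel_signal(axes, [[1, 0, 0], [0, 1, 0]], [0.5, 0.5])
```

Sparse deconvolution aims to recover the fiber directions and the non-negative fractions from such a signal. As the crossing angle shrinks, the single-fiber responses overlap more, which is the regime the abstract describes as difficult for linear methods.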