Multi-Scale Spatial-Temporal Attention Networks for Functional Connectome Classification
Pub Date: 2024-08-22 | DOI: 10.1109/TMI.2024.3448214
Youyong Kong, Xiaotong Zhang, Wenhan Wang, Yue Zhou, Yueying Li, Yonggui Yuan

Many neuropsychiatric disorders are thought to be associated with abnormalities in the brain's functional connectivity networks. Research on the classification of functional connectivity can therefore provide new perspectives for understanding the pathology of these disorders and contribute to early diagnosis and treatment. Functional connectivity changes dynamically over time; however, most existing methods cannot jointly capture its spatial topology and its time-varying characteristics. Furthermore, although a few spatial-temporal studies have tried to capture rich information across different spatial scales, they have not explored the temporal characteristics among those scales. To address these issues, we propose novel Multi-Scale Spatial-Temporal Attention Networks (MSSTAN) that exploit the multi-scale spatial-temporal information provided by the functional connectome for classification. To fully extract the spatial features of brain regions, we propose a Topology Enhanced Graph Transformer module that incorporates topology priors to guide the attention calculations when learning spatial features. A Multi-Scale Pooling Strategy is introduced to obtain representations of the brain connectome at various scales. Considering the temporal dynamics of the dynamic functional connectome, we employ Locality Sensitive Hashing (LSH) attention to capture long-term dependencies in the temporal dynamics across multiple scales while reducing the computational complexity of the original attention mechanism. Experiments on three brain fMRI datasets of MDD and ASD demonstrate the superiority of our approach. In addition, benefiting from the attention mechanism in the Transformer, our results are interpretable, which can contribute to the discovery of biomarkers. The code is available at https://github.com/LIST-KONG/MSSTAN.
Moment-Consistent Contrastive CycleGAN for Cross-Domain Pancreatic Image Segmentation
Pub Date: 2024-08-21 | DOI: 10.1109/TMI.2024.3447071
Zhongyu Chen, Yun Bian, Erwei Shen, Ligang Fan, Weifang Zhu, Fei Shi, Chengwei Shao, Xinjian Chen, Dehui Xiang

CT and MR are currently the most common imaging techniques for pancreatic cancer diagnosis. Accurate segmentation of the pancreas in CT and MR images can greatly aid the diagnosis and treatment of pancreatic cancer. Traditional supervised segmentation methods require large amounts of labeled CT and MR training data, which are usually time-consuming and laborious to obtain. Meanwhile, due to domain shift, traditional segmentation networks are difficult to deploy across datasets of different imaging modalities. Cross-domain segmentation can exploit labeled source-domain data to assist unlabeled target domains in solving these problems. In this paper, a cross-domain pancreas segmentation algorithm is proposed based on Moment-Consistent Contrastive Cycle Generative Adversarial Networks (MC-CCycleGAN). MC-CCycleGAN is a style transfer network in which the encoder of its generator extracts features from real and style-transferred images, constrains feature extraction through a contrastive loss, and fully extracts the structural features of input images during style transfer while eliminating redundant style features. Multi-order central moments of the pancreas are proposed to describe its anatomy in high dimensions, and a contrastive loss is also proposed to constrain moment consistency, so as to maintain the pancreatic structure and shape before and after style transfer. A multi-teacher knowledge distillation framework is proposed to transfer knowledge from multiple teachers to a single student, improving the robustness and performance of the student network. The experimental results demonstrate the superiority of our framework over state-of-the-art domain adaptation methods.
Unsupervised Non-rigid Histological Image Registration Guided by Keypoint Correspondences Based on Learnable Deep Features with Iterative Training
Pub Date: 2024-08-21 | DOI: 10.1109/TMI.2024.3447214
Xingyue Wei, Lin Ge, Lijie Huang, Jianwen Luo, Yan Xu
Histological image registration is a fundamental task in histological image analysis. It is challenging because of the substantial appearance differences caused by multiple staining. Keypoint correspondences, i.e., matched keypoint pairs, have been introduced to guide unsupervised deep learning (DL) based registration methods through such a task. This paper proposes an iterative keypoint correspondence-guided (IKCG) unsupervised network for non-rigid histological image registration. Fixed deep features and learnable deep features are introduced as keypoint descriptors to automatically establish keypoint correspondences, and the distance between matched pairs is used as a loss function to train the registration network. Fixed deep features, extracted from DL networks pre-trained on natural image datasets, are more discriminative than handcrafted ones, benefiting from the deep and hierarchical nature of DL networks. The intermediate layer outputs of the registration networks trained on histological image datasets are extracted as learnable deep features, which reveal information unique to histological images. An iterative training strategy is adopted to train the registration network and optimize the learnable deep features jointly. Benefiting from the excellent matching ability of the learnable deep features optimized with this strategy, the proposed method can solve the local non-rigid large-displacement problem, which is usually caused by mishandling, such as tears introduced when preparing tissue slices. The proposed method is evaluated on the Automatic Non-rigid Histology Image Registration (ANHIR) and AutomatiC Registration Of Breast cAncer Tissue (ACROBAT) challenge websites, ranking 1st on both as of August 6th, 2024.
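A minimal sketch of a correspondence-distance training loss, under assumed conventions (keypoints in normalized [-1, 1] (x, y) coordinates, a dense 2D displacement field in the same units); this is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def correspondence_loss(disp, kpts_moving, kpts_fixed):
    # disp: (B, 2, H, W) displacement field; kpts_*: (B, K, 2) matched keypoint pairs.
    grid = kpts_moving.unsqueeze(2)                        # (B, K, 1, 2) sampling grid
    d = F.grid_sample(disp, grid, align_corners=True)      # displacement at each keypoint
    warped = kpts_moving + d.squeeze(-1).permute(0, 2, 1)  # (B, K, 2) warped positions
    return (warped - kpts_fixed).norm(dim=-1).mean()       # mean matched-pair distance
```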
{"title":"Unsupervised Non-rigid Histological Image Registration Guided by Keypoint Correspondences Based on Learnable Deep Features with Iterative Training.","authors":"Xingyue Wei, Lin Ge, Lijie Huang, Jianwen Luo, Yan Xu","doi":"10.1109/TMI.2024.3447214","DOIUrl":"https://doi.org/10.1109/TMI.2024.3447214","url":null,"abstract":"<p><p>Histological image registration is a fundamental task in histological image analysis. It is challenging because of substantial appearance differences due to multiple staining. Keypoint correspondences, i.e., matched keypoint pairs, have been introduced to guide unsupervised deep learning (DL) based registration methods to handle such a registration task. This paper proposes an iterative keypoint correspondence-guided (IKCG) unsupervised network for non-rigid histological image registration. Fixed deep features and learnable deep features are introduced as keypoint descriptors to automatically establish keypoint correspondences, the distance between which is used as a loss function to train the registration network. Fixed deep features extracted from DL networks that are pre-trained on natural image datasets are more discriminative than handcrafted ones, benefiting from the deep and hierarchical nature of DL networks. The intermediate layer outputs of the registration networks trained on histological image datasets are extracted as learnable deep features, which reveal unique information for histological images. An iterative training strategy is adopted to train the registration network and optimize learnable deep features jointly. Benefiting from the excellent matching ability of learnable deep features optimized with the iterative training strategy, the proposed method can solve the local non-rigid large displacement problem, an inevitable problem usually caused by misoperation, such as tears in producing tissue slices. The proposed method is evaluated on the Automatic Non-rigid Histology Image Registration (ANHIR) website and AutomatiC Registration Of Breast cAncer Tissue (ACROBAT) website. It ranked 1st on both websites as of August 6th, 2024.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142019956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimized Excitation in Microwave-induced Thermoacoustic Imaging for Artifact Suppression
Pub Date: 2024-08-21 | DOI: 10.1109/TMI.2024.3447125
Qiang Liu, Weian Chao, Ruyi Wen, Yubin Gong, Lei Xi
Microwave-induced thermoacoustic imaging (M-TAI) allows the visualization of macroscopic and microscopic structures of biological tissues. However, it suffers from severe inherent artifacts that can misguide the subsequent diagnosis and treatment of diseases. To overcome this limitation, we propose an optimized excitation strategy that integrates a dynamically compounded specific absorption rate (SAR) with a co-planar configuration of the polarization state, the incident wave vector, and the imaging plane. Starting from a theoretical analysis, we interpret the underlying mechanism behind the superiority of the optimized excitation strategy: it achieves an effect equivalent to homogenizing the electromagnetic energy deposited in the tissue. Numerical simulations then demonstrate that the strategy better preserves the conductivity weighting of samples while increasing the Pearson correlation coefficient. Furthermore, in vitro and in vivo M-TAI experiments validate the effectiveness and robustness of this optimized excitation strategy in artifact suppression, allowing the simultaneous identification of both boundary and internal fine structures within biological tissues. All the results suggest that the optimized excitation strategy can be extended to diverse scenarios, inspiring further strategies that markedly suppress the inherent artifacts in M-TAI.
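A toy numerical illustration of the compounding intuition, assuming local energy deposition varies as cos² of the angle between the linear polarization and a tissue orientation (a simplified dipole-like model, not the authors' electromagnetic simulation):

```python
import numpy as np

angles = np.linspace(0, np.pi, 8, endpoint=False)   # cycled polarization states
orient = np.linspace(0, 2 * np.pi, 360)             # tissue orientations
single = np.cos(orient - angles[0]) ** 2            # one fixed state: strongly anisotropic
compound = np.mean([np.cos(orient - a) ** 2 for a in angles], axis=0)
print(single.std(), compound.std())                 # ~0.354 vs ~0: compounding homogenizes
```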
{"title":"Optimized Excitation in Microwave-induced Thermoacoustic Imaging for Artifact Suppression.","authors":"Qiang Liu, Weian Chao, Ruyi Wen, Yubin Gong, Lei Xi","doi":"10.1109/TMI.2024.3447125","DOIUrl":"https://doi.org/10.1109/TMI.2024.3447125","url":null,"abstract":"<p><p>Microwave-induced thermoacoustic imaging (M-TAI) allows the visualization of macroscopic and microscopic structures of bio-tissues. However, it suffers from severe inherent artifacts that might misguide the subsequent diagnostics and treatments of diseases. To overcome this limitation, we propose an optimized excitation strategy. In detail, the strategy integrates dynamically compound specific absorption rate (SAR) and co-planar configuration of polarization state, incident wave vector and imaging plane. Starting from the theoretical analysis, we interpret the underlying mechanism supporting the superiority of the optimized excitation strategy to achieve an effect equivalent to homogenizing the deposited electromagnetic energy in bio-tissues. The following numerical simulations demonstrate that the strategy enables better preservation of the conductivity weighting of samples while increasing Pearson correlation coefficient. Furthermore, the in vitro and in vivo M-TAI experiments validate the effectiveness and robustness of this optimized excitation strategy in artifact suppression, allowing the simultaneous identification of both boundary and inside fine structures within bio-tissues. All the results suggest that the optimized excitation strategy can be expanded to diverse scenarios, inspiring more suitable strategies that remarkably suppress the inherent artifacts in M-TAI.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142019955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FR-MIL: Distribution Re-calibration based Multiple Instance Learning with Transformer for Whole Slide Image Classification
Pub Date: 2024-08-20 | DOI: 10.1109/TMI.2024.3446716
Philip Chikontwe, Meejeong Kim, Jaehoon Jeong, Hyun Jung Sung, Heounjeong Go, Soo Jeong Nam, Sang Hyun Park
In digital pathology, whole slide images (WSI) are crucial for cancer prognostication and treatment planning. WSI classification is generally addressed with multiple instance learning (MIL), which alleviates the challenge of processing billions of pixels and curating rich annotations. Though recent MIL approaches leverage variants of the attention mechanism to learn better representations, they scarcely study the properties of the data distribution itself, i.e., different staining and acquisition protocols that result in intra-patch and inter-slide variations. In this work, we first introduce a distribution re-calibration strategy that shifts the feature distribution of a WSI bag (of instances) using the statistics of the max-instance (critical) feature. Second, we enforce class (bag) separation via a metric loss, assuming that positive bags exhibit larger feature magnitudes than negatives. We also introduce a generative process leveraging Vector Quantization (VQ) for improved instance discrimination, i.e., VQ helps model bag latent factors for improved classification. To model spatial and context information, a position encoding module (PEM) is employed with transformer-based pooling by multi-head self-attention (PMSA). Evaluation on popular WSI benchmark datasets reveals that our approach improves over state-of-the-art MIL methods. Further, we validate the general applicability of our method on classic MIL benchmark tasks and on point cloud classification with limited points. Code: https://github.com/PhilipChicco/FRMIL.
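A sketch of the two ingredients under loudly stated assumptions: the re-centering rule (shift the bag distribution toward the max-instance feature) and the margin form are guesses at the spirit of the method, not FR-MIL's exact equations.

```python
import torch
import torch.nn.functional as F

def recalibrate_bag(feats):
    # feats: (N, D) instance features of one WSI bag.
    critical = feats.max(dim=0).values                 # max-instance (critical) feature
    return feats - feats.mean(dim=0) + critical        # hypothetical shift toward the anchor

def magnitude_margin_loss(bag_repr, labels, margin=1.0):
    # bag_repr: (B, D) pooled bag features; labels: (B,) with 1 = positive slide.
    mag = bag_repr.norm(dim=-1)
    return torch.where(labels.bool(),
                       F.relu(margin - mag),           # push positive bags above the margin
                       mag).mean()                     # shrink negative-bag magnitudes
```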
Bridging MRI Cross-Modality Synthesis and Multi-Contrast Super-Resolution by Fine-Grained Difference Learning
Pub Date: 2024-08-19 | DOI: 10.1109/TMI.2024.3445969
Yidan Feng, Sen Deng, Jun Lyu, Jing Cai, Mingqiang Wei, Jing Qin
In multi-modal magnetic resonance imaging (MRI), the tasks of imputing or reconstructing the target modality share a common obstacle: accurately modeling fine-grained inter-modal differences, which has been only sparingly addressed in the current literature. These differences stem from two sources: 1) spatial misalignment remaining after coarse registration, and 2) structural distinctions arising from modality-specific signal manifestations. This paper integrates the previously separate research trajectories of cross-modality synthesis (CMS) and multi-contrast super-resolution (MCSR) to address this pervasive challenge within a unified framework. Connected through generalized down-sampling ratios, this unification not only emphasizes their common goal of reducing structural differences, but also identifies the key task distinguishing MCSR from CMS: modeling the structural distinctions using the limited information from the misaligned target input. Specifically, we propose a composite network architecture with several key components: a label correction module to align the coordinates of multi-modal training pairs, a CMS module serving as the base model, an SR branch to handle target inputs, and a difference projection discriminator for structural-distinction-centered adversarial training. When training the SR branch as the generator, the adversarial learning is enhanced with distinction-aware incremental modulation to ensure better-controlled generation. Moreover, the SR branch integrates deformable convolutions to address cross-modal spatial misalignment at the feature level. Experiments conducted on three public datasets demonstrate that our approach effectively balances structural accuracy and realism, exhibiting overall superiority over current state-of-the-art approaches in comprehensive evaluations of both tasks. The code is available at https://github.com/papshare/FGDL.
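As a sketch of feature-level alignment with deformable convolutions, the block below predicts sampling offsets from the concatenated target and reference features and feeds them to torchvision's DeformConv2d; the offset-prediction design is an assumption, not the paper's exact SR-branch module.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class AlignBlock(nn.Module):
    def __init__(self, ch, k=3):
        super().__init__()
        # 2 * k * k offset channels: one (dy, dx) pair per kernel tap.
        self.offset = nn.Conv2d(2 * ch, 2 * k * k, 3, padding=1)
        self.dconv = DeformConv2d(ch, ch, k, padding=k // 2)

    def forward(self, target_feat, ref_feat):
        # Offsets let each kernel tap sample the spatially misaligned location.
        off = self.offset(torch.cat([target_feat, ref_feat], dim=1))
        return self.dconv(target_feat, off)

x = torch.randn(1, 32, 64, 64)     # target-modality features
r = torch.randn(1, 32, 64, 64)     # reference-modality features
aligned = AlignBlock(32)(x, r)     # (1, 32, 64, 64)
```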
SISMIK for brain MRI: Deep-learning-based motion estimation and model-based motion correction in k-space
Pub Date: 2024-08-19 | DOI: 10.1109/TMI.2024.3446450
Oscar Dabrowski, Jean-Luc Falcone, Antoine Klauser, Julien Songeon, Michel Kocher, Bastien Chopard, Francois Lazeyras, Sebastien Courvoisier
MRI, a widespread non-invasive medical imaging modality, is highly sensitive to patient motion. Despite many attempts over the years, motion correction remains a difficult problem, and there is no general method applicable to all situations. We propose a retrospective method for motion estimation and correction that tackles in-plane rigid-body motion, suited to the classical 2D Spin-Echo brain scans regularly used in clinical practice. Because k-space is acquired sequentially, motion artifacts are well localized. The method leverages deep neural networks to estimate motion parameters in k-space and uses a model-based approach to restore degraded images while avoiding "hallucinations". A notable advantage is its ability to estimate motion occurring at high spatial frequencies without requiring a motion-free reference. The proposed method operates over the whole k-space dynamic range and is only moderately affected by the lower SNR of higher harmonics. As a proof of concept, we provide models trained with supervised learning on 600k motion simulations derived from motion-free scans of 43 different subjects. Generalization performance was tested on simulations as well as in vivo. Qualitative and quantitative evaluations are presented for both motion parameter estimation and image reconstruction. Experimental results show that our approach achieves good generalization performance on simulated data and in vivo acquisitions. A Python implementation is available at https://gitlab.unige.ch/Oscar.Dabrowski/sismik_mri/.
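The localization of artifacts follows from basic Fourier properties: an in-plane translation multiplies k-space by a linear phase, and an in-plane rotation rotates the k-space sample positions, so only the lines acquired after the motion are affected. A toy sketch of the translation case (not the SISMIK pipeline):

```python
import numpy as np

def translate_in_kspace(kspace, dx, dy):
    # Fourier shift theorem: shifting the image by (dx, dy) pixels is a phase ramp.
    ny, nx = kspace.shape
    ky = np.fft.fftfreq(ny)[:, None]         # cycles/sample, matching np.fft.fft2 ordering
    kx = np.fft.fftfreq(nx)[None, :]
    return kspace * np.exp(-2j * np.pi * (kx * dx + ky * dy))

img = np.random.rand(64, 64)
k = np.fft.fft2(img)
k[32:] = translate_in_kspace(k, 3.0, -2.0)[32:]   # motion halfway through acquisition
corrupted = np.fft.ifft2(k).real                  # ghosting from the inconsistent half
```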
Investigating and Improving Latent Density Segmentation Models for Aleatoric Uncertainty Quantification in Medical Imaging
Pub Date: 2024-08-19 | DOI: 10.1109/TMI.2024.3445999
M M Amaan Valiuddin, Christiaan G A Viviers, Ruud J G Van Sloun, Peter H N De With, Fons van der Sommen
Data uncertainties, such as sensor noise, occlusions, or limitations in the acquisition method, can introduce irreducible ambiguities in images, which result in varying, yet plausible, semantic hypotheses. In machine learning, this ambiguity is commonly referred to as aleatoric uncertainty. In image segmentation, latent density models can be utilized to address this problem. The most popular approach is the Probabilistic U-Net (PU-Net), which uses latent Normal densities to optimize the conditional data log-likelihood Evidence Lower Bound. In this work, we demonstrate that the PU-Net latent space is severely sparse and heavily under-utilized. To address this, we introduce mutual information maximization and entropy-regularized Sinkhorn Divergence in the latent space to promote homogeneity across all latent dimensions, effectively improving gradient-descent updates and latent space informativeness. Applied to public datasets covering a variety of clinical segmentation problems, our proposed methodology achieves up to 11% performance gains over preceding latent-variable models for probabilistic segmentation, measured by Hungarian-matched Intersection over Union. The results indicate that encouraging a homogeneous latent space significantly improves latent density modeling for medical image segmentation.
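For reference, a self-contained sketch of an entropy-regularized optimal transport cost with uniform marginals, computed by Sinkhorn fixed-point iterations; the paper's exact regularization, debiasing (the divergence subtracts self-transport terms), and batching likely differ.

```python
import torch

def sinkhorn_cost(x, y, eps=0.1, iters=50):
    # x: (N, D), y: (M, D) latent samples; returns the entropic OT cost <P, C>.
    C = torch.cdist(x, y) ** 2                              # pairwise squared distances
    K = torch.exp(-C / eps)                                 # Gibbs kernel
    a = torch.full((x.size(0),), 1.0 / x.size(0), device=x.device)
    b = torch.full((y.size(0),), 1.0 / y.size(0), device=y.device)
    u, v = torch.ones_like(a), torch.ones_like(b)
    for _ in range(iters):                                  # alternating scaling updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                         # transport plan
    return (P * C).sum()
```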
{"title":"Investigating and Improving Latent Density Segmentation Models for Aleatoric Uncertainty Quantification in Medical Imaging.","authors":"M M Amaan Valiuddin, Christiaan G A Viviers, Ruud J G Van Sloun, Peter H N De With, Fons van der Sommen","doi":"10.1109/TMI.2024.3445999","DOIUrl":"https://doi.org/10.1109/TMI.2024.3445999","url":null,"abstract":"<p><p>Data uncertainties, such as sensor noise, occlusions or limitations in the acquisition method can introduce irreducible ambiguities in images, which result in varying, yet plausible, semantic hypotheses. In Machine Learning, this ambiguity is commonly referred to as aleatoric uncertainty. In image segmentation, latent density models can be utilized to address this problem. The most popular approach is the Probabilistic U-Net (PU-Net), which uses latent Normal densities to optimize the conditional data log-likelihood Evidence Lower Bound. In this work, we demonstrate that the PU-Net latent space is severely sparse and heavily under-utilized. To address this, we introduce mutual information maximization and entropy-regularized Sinkhorn Divergence in the latent space to promote homogeneity across all latent dimensions, effectively improving gradient-descent updates and latent space informativeness. Our results show that by applying this on public datasets of various clinical segmentation problems, our proposed methodology receives up to 11% performance gains compared against preceding latent variable models for probabilistic segmentation on the Hungarian-Matched Intersection over Union. The results indicate that encouraging a homogeneous latent space significantly improves latent density modeling for medical image segmentation.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142006160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR
Pub Date: 2024-08-15 | DOI: 10.1109/TMI.2024.3444279
Jialun Pei, Diandian Guo, Jingyang Zhang, Manxi Lin, Yueming Jin, Pheng-Ann Heng

Scene graph generation (SGG) of surgical procedures is crucial for enhancing holistic cognitive intelligence in the operating room (OR). Previous works, however, have relied primarily on multi-stage learning, where the generated semantic scene graphs depend on intermediate pose estimation and object detection processes. This pipeline may compromise the flexibility of learning multimodal representations and consequently constrain overall effectiveness. In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR, which complementarily leverages multi-view 2D scenes and 3D point clouds for SGG in an end-to-end manner. Concretely, our model embraces a View-Sync Transfusion scheme to encourage multi-view visual information interaction. Concurrently, a Geometry-Visual Cohesion operation is designed to integrate the synergistic 2D semantic features into the 3D point cloud features. Moreover, based on the augmented features, we propose a novel relation-sensitive transformer decoder that embeds dynamic entity-pair queries and relational trait priors, enabling the direct prediction of entity-pair relations for graph generation without intermediate steps. Extensive experiments validate the superior SGG performance and lower computational cost of S2Former-OR on the 4D-OR benchmark compared with current OR-SGG methods, e.g., a 3-percentage-point gain in precision and a 24.2M reduction in model parameters. We further compared our method with generic single-stage SGG methods using broader metrics for a comprehensive evaluation, consistently achieving better performance. Our source code is available at: https://github.com/PJLallen/S2Former-OR.
AutoSamp: Autoencoding k-space Sampling via Variational Information Maximization for 3D MRI
Pub Date: 2024-08-15 | DOI: 10.1109/TMI.2024.3443292
Cagan Alkan, Morteza Mardani, Congyu Liao, Zhitao Li, Shreyas S Vasanawala, John M Pauly
Accelerated MRI protocols routinely involve a predefined sampling pattern that undersamples k-space. Finding an optimal pattern can enhance reconstruction quality; however, this optimization is a challenging task. To address this challenge, we introduce AutoSamp, a novel deep learning framework based on variational information maximization that enables joint optimization of the sampling pattern and the reconstruction of MRI scans. We represent the encoder as a non-uniform Fast Fourier Transform that allows continuous optimization of k-space sample locations on a non-Cartesian plane, and the decoder as a deep reconstruction network. Experiments on public 3D acquired MRI datasets show that the proposed AutoSamp method improves reconstruction quality over the prevailing variable-density and variable-density Poisson disc sampling for both compressed sensing and deep learning reconstructions. Our data-driven sampling optimization achieves 4.4 dB, 2.0 dB, 0.75 dB, and 0.7 dB PSNR improvements over reconstruction with Poisson disc masks for acceleration factors of R = 5, 10, 15, and 25, respectively. Prospectively accelerated acquisitions with 3D FSE sequences using our optimized sampling patterns exhibit improved image quality and sharpness. Furthermore, we analyze the characteristics of the learned sampling patterns with respect to changes in acceleration factor, measurement noise, underlying anatomy, and coil sensitivities. We show that all these factors contribute to the optimization result by affecting the sampling density, k-space coverage, and point spread functions of the learned sampling patterns.
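A small-scale sketch of the differentiable-encoder idea: because the k-space coordinates are ordinary tensors, autograd can optimize them jointly with a decoder. The explicit non-uniform DFT below is for illustration only; a practical implementation would use an efficient NUFFT, and the coordinate convention here is an assumption.

```python
import math
import torch

def nudft(image, coords):
    # image: (H, W) real or complex; coords: (M, 2) k-space locations in cycles/FOV.
    H, W = image.shape
    ys = torch.arange(H, dtype=torch.float32) - H / 2
    xs = torch.arange(W, dtype=torch.float32) - W / 2
    yy, xx = torch.meshgrid(ys, xs, indexing="ij")
    phase = coords[:, 0:1, None] * yy / H + coords[:, 1:2, None] * xx / W  # (M, H, W)
    return (torch.exp(-2j * math.pi * phase) * image).sum(dim=(-2, -1))    # (M,) samples

coords = (torch.randn(128, 2) * 8).requires_grad_()    # learnable sample locations
samples = nudft(torch.rand(32, 32), coords)
samples.abs().sum().backward()                         # gradients flow back to coords
```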