SISMIK for brain MRI: Deep-learning-based motion estimation and model-based motion correction in k-space.
Pub Date: 2024-08-19 | DOI: 10.1109/TMI.2024.3446450
Oscar Dabrowski, Jean-Luc Falcone, Antoine Klauser, Julien Songeon, Michel Kocher, Bastien Chopard, Francois Lazeyras, Sebastien Courvoisier
MRI, a widespread non-invasive medical imaging modality, is highly sensitive to patient motion. Despite many attempts over the years, motion correction remains a difficult problem and there is no general method applicable to all situations. We propose a retrospective method for motion estimation and correction that tackles in-plane rigid-body motion and is suited to the classical 2D Spin-Echo brain scans regularly used in clinical practice. Because k-space is acquired sequentially, motion artifacts are well localized. The method leverages deep neural networks to estimate motion parameters in k-space and uses a model-based approach to restore degraded images while avoiding "hallucinations". A notable advantage is its ability to estimate motion occurring at high spatial frequencies without the need for a motion-free reference. The proposed method operates on the whole k-space dynamic range and is moderately affected by the lower SNR of higher harmonics. As a proof of concept, we provide models trained with supervised learning on 600k motion simulations based on motion-free scans of 43 different subjects. Generalization performance was tested with simulations as well as in-vivo acquisitions. Qualitative and quantitative evaluations are presented for motion parameter estimation and image reconstruction. Experimental results show that our approach achieves good generalization performance on simulated data and in-vivo acquisitions. We provide a Python implementation at https://gitlab.unige.ch/Oscar.Dabrowski/sismik_mri/.
{"title":"SISMIK for brain MRI: Deep-learning-based motion estimation and model-based motion correction in k-space.","authors":"Oscar Dabrowski, Jean-Luc Falcone, Antoine Klauser, Julien Songeon, Michel Kocher, Bastien Chopard, Francois Lazeyras, Sebastien Courvoisier","doi":"10.1109/TMI.2024.3446450","DOIUrl":"https://doi.org/10.1109/TMI.2024.3446450","url":null,"abstract":"<p><p>MRI, a widespread non-invasive medical imaging modality, is highly sensitive to patient motion. Despite many attempts over the years, motion correction remains a difficult problem and there is no general method applicable to all situations. We propose a retrospective method for motion estimation and correction to tackle the problem of in-plane rigid-body motion, apt for classical 2D Spin-Echo scans of the brain, which are regularly used in clinical practice. Due to the sequential acquisition of k-space, motion artifacts are well localized. The method leverages the power of deep neural networks to estimate motion parameters in k-space and uses a model-based approach to restore degraded images to avoid \"hallucinations\". Notable advantages are its ability to estimate motion occurring in high spatial frequencies without the need of a motion-free reference. The proposed method operates on the whole k-space dynamic range and is moderately affected by the lower SNR of higher harmonics. As a proof of concept, we provide models trained using supervised learning on 600k motion simulations based on motion-free scans of 43 different subjects. Generalization performance was tested with simulations as well as in-vivo. Qualitative and quantitative evaluations are presented for motion parameter estimations and image reconstruction. Experimental results show that our approach is able to obtain good generalization performance on simulated data and in-vivo acquisitions. We provide a Python implementation at https://gitlab.unige.ch/Oscar.Dabrowski/sismik_mri/.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142006161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Investigating and Improving Latent Density Segmentation Models for Aleatoric Uncertainty Quantification in Medical Imaging.
Pub Date: 2024-08-19 | DOI: 10.1109/TMI.2024.3445999
M M Amaan Valiuddin, Christiaan G A Viviers, Ruud J G Van Sloun, Peter H N De With, Fons van der Sommen
Data uncertainties, such as sensor noise, occlusions or limitations in the acquisition method, can introduce irreducible ambiguities in images, which result in varying, yet plausible, semantic hypotheses. In Machine Learning, this ambiguity is commonly referred to as aleatoric uncertainty. In image segmentation, latent density models can be utilized to address this problem. The most popular approach is the Probabilistic U-Net (PU-Net), which uses latent Normal densities to optimize the conditional data log-likelihood Evidence Lower Bound. In this work, we demonstrate that the PU-Net latent space is severely sparse and heavily under-utilized. To address this, we introduce mutual information maximization and entropy-regularized Sinkhorn Divergence in the latent space to promote homogeneity across all latent dimensions, effectively improving gradient-descent updates and latent space informativeness. Applied to public datasets covering various clinical segmentation problems, the proposed methodology achieves up to 11% performance gains in Hungarian-Matched Intersection over Union compared with preceding latent variable models for probabilistic segmentation. The results indicate that encouraging a homogeneous latent space significantly improves latent density modeling for medical image segmentation.
{"title":"Investigating and Improving Latent Density Segmentation Models for Aleatoric Uncertainty Quantification in Medical Imaging.","authors":"M M Amaan Valiuddin, Christiaan G A Viviers, Ruud J G Van Sloun, Peter H N De With, Fons van der Sommen","doi":"10.1109/TMI.2024.3445999","DOIUrl":"https://doi.org/10.1109/TMI.2024.3445999","url":null,"abstract":"<p><p>Data uncertainties, such as sensor noise, occlusions or limitations in the acquisition method can introduce irreducible ambiguities in images, which result in varying, yet plausible, semantic hypotheses. In Machine Learning, this ambiguity is commonly referred to as aleatoric uncertainty. In image segmentation, latent density models can be utilized to address this problem. The most popular approach is the Probabilistic U-Net (PU-Net), which uses latent Normal densities to optimize the conditional data log-likelihood Evidence Lower Bound. In this work, we demonstrate that the PU-Net latent space is severely sparse and heavily under-utilized. To address this, we introduce mutual information maximization and entropy-regularized Sinkhorn Divergence in the latent space to promote homogeneity across all latent dimensions, effectively improving gradient-descent updates and latent space informativeness. Our results show that by applying this on public datasets of various clinical segmentation problems, our proposed methodology receives up to 11% performance gains compared against preceding latent variable models for probabilistic segmentation on the Hungarian-Matched Intersection over Union. The results indicate that encouraging a homogeneous latent space significantly improves latent density modeling for medical image segmentation.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142006160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S²Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR.
Pub Date: 2024-08-15 | DOI: 10.1109/TMI.2024.3444279
Jialun Pei, Diandian Guo, Jingyang Zhang, Manxi Lin, Yueming Jin, Pheng-Ann Heng
Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistically cognitive intelligence in the operating room (OR). However, previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection. This pipeline may compromise the flexibility of learning multimodal representations, consequently constraining the overall effectiveness. In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S²Former-OR, which aims to complementarily leverage multi-view 2D scenes and 3D point clouds for SGG in an end-to-end manner. Concretely, our model embraces a View-Sync Transfusion scheme to encourage multi-view visual information interaction. Concurrently, a Geometry-Visual Cohesion operation is designed to integrate the synergic 2D semantic features into 3D point cloud features. Moreover, based on the augmented feature, we propose a novel relation-sensitive transformer decoder that embeds dynamic entity-pair queries and relational trait priors, which enables the direct prediction of entity-pair relations for graph generation without intermediate steps. Extensive experiments have validated the superior SGG performance and lower computational cost of S²Former-OR on the 4D-OR benchmark compared with current OR-SGG methods, e.g., a 3-percentage-point increase in Precision and a 24.2M reduction in model parameters. We further compared our method with generic single-stage SGG methods using broader metrics for a comprehensive evaluation, with consistently better performance achieved. Our source code is available at: https://github.com/PJLallen/S2Former-OR.
AutoSamp: Autoencoding k-space Sampling via Variational Information Maximization for 3D MRI.
Pub Date: 2024-08-15 | DOI: 10.1109/TMI.2024.3443292
Cagan Alkan, Morteza Mardani, Congyu Liao, Zhitao Li, Shreyas S Vasanawala, John M Pauly
Accelerated MRI protocols routinely involve a predefined sampling pattern that undersamples the k-space. Finding an optimal pattern can enhance reconstruction quality; however, this optimization is a challenging task. To address this challenge, we introduce a novel deep learning framework, AutoSamp, based on variational information maximization, which enables joint optimization of the sampling pattern and reconstruction of MRI scans. We represent the encoder as a non-uniform Fast Fourier Transform that allows continuous optimization of k-space sample locations on a non-Cartesian plane, and the decoder as a deep reconstruction network. Experiments on public 3D-acquired MRI datasets show improved reconstruction quality of the proposed AutoSamp method over the prevailing variable density and variable density Poisson disc sampling for both compressed sensing and deep learning reconstructions. We demonstrate that our data-driven sampling optimization method achieves 4.4 dB, 2.0 dB, 0.75 dB and 0.7 dB PSNR improvements over reconstruction with Poisson disc masks for acceleration factors of R = 5, 10, 15 and 25, respectively. Prospectively accelerated acquisitions with 3D FSE sequences using our optimized sampling patterns exhibit improved image quality and sharpness. Furthermore, we analyze the characteristics of the learned sampling patterns with respect to changes in acceleration factor, measurement noise, underlying anatomy, and coil sensitivities. We show that all these factors contribute to the optimization result by affecting the sampling density, k-space coverage and point spread functions of the learned sampling patterns.
{"title":"AutoSamp: Autoencoding k-space Sampling via Variational Information Maximization for 3D MRI.","authors":"Cagan Alkan, Morteza Mardani, Congyu Liao, Zhitao Li, Shreyas S Vasanawala, John M Pauly","doi":"10.1109/TMI.2024.3443292","DOIUrl":"10.1109/TMI.2024.3443292","url":null,"abstract":"<p><p>Accelerated MRI protocols routinely involve a predefined sampling pattern that undersamples the k-space. Finding an optimal pattern can enhance the reconstruction quality, however this optimization is a challenging task. To address this challenge, we introduce a novel deep learning framework, AutoSamp, based on variational information maximization that enables joint optimization of sampling pattern and reconstruction of MRI scans. We represent the encoder as a non-uniform Fast Fourier Transform that allows continuous optimization of k-space sample locations on a non-Cartesian plane, and the decoder as a deep reconstruction network. Experiments on public 3D acquired MRI datasets show improved reconstruction quality of the proposed AutoSamp method over the prevailing variable density and variable density Poisson disc sampling for both compressed sensing and deep learning reconstructions. We demonstrate that our data-driven sampling optimization method achieves 4.4dB, 2.0dB, 0.75dB, 0.7dB PSNR improvements over reconstruction with Poisson Disc masks for acceleration factors of R = 5, 10, 15, 25, respectively. Prospectively accelerated acquisitions with 3D FSE sequences using our optimized sampling patterns exhibit improved image quality and sharpness. Furthermore, we analyze the characteristics of the learned sampling patterns with respect to changes in acceleration factor, measurement noise, underlying anatomy, and coil sensitivities. We show that all these factors contribute to the optimization result by affecting the sampling density, k-space coverage and point spread functions of the learned sampling patterns.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141989744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bi-Constraints Diffusion: A Conditional Diffusion Model with Degradation Guidance for Metal Artifact Reduction.
Pub Date: 2024-08-15 | DOI: 10.1109/TMI.2024.3442950
Mengting Luo, Nan Zhou, Tao Wang, Linchao He, Wang Wang, Hu Chen, Peixi Liao, Yi Zhang
In recent years, score-based diffusion models have emerged as effective tools for estimating score functions from empirical data distributions, particularly in integrating implicit priors with inverse problems like CT reconstruction. However, score-based diffusion models are rarely explored in challenging tasks such as metal artifact reduction (MAR). In this paper, we introduce the Bi-Constraints Diffusion Model for Metal Artifact Reduction (BCDMAR), an innovative approach that enhances iterative reconstruction with a conditional diffusion model for MAR. This method employs a metal artifact degradation operator in place of the traditional metal-excluded projection operator in the data-fidelity term, thereby preserving structure details around metal regions. However, score-based diffusion models tend to be susceptible to grayscale shifts and unreliable structures, making it challenging to reach an optimal solution. To address this, we utilize a pre-corrected image as a prior constraint, guiding the generation of the score-based diffusion model. By iteratively applying the score-based diffusion model and the data-fidelity step in each sampling iteration, BCDMAR effectively maintains reliable tissue representation around metal regions and produces highly consistent structures in non-metal regions. Through extensive experiments focused on metal artifact reduction tasks, BCDMAR demonstrates superior performance over other state-of-the-art unsupervised and supervised methods, both quantitatively and in terms of visual results.
{"title":"Bi-Constraints Diffusion: A Conditional Diffusion Model with Degradation Guidance for Metal Artifact Reduction.","authors":"Mengting Luo, Nan Zhou, Tao Wang, Linchao He, Wang Wang, Hu Chen, Peixi Liao, Yi Zhang","doi":"10.1109/TMI.2024.3442950","DOIUrl":"10.1109/TMI.2024.3442950","url":null,"abstract":"<p><p>In recent years, score-based diffusion models have emerged as effective tools for estimating score functions from empirical data distributions, particularly in integrating implicit priors with inverse problems like CT reconstruction. However, score-based diffusion models are rarely explored in challenging tasks such as metal artifact reduction (MAR). In this paper, we introduce the BiConstraints Diffusion Model for Metal Artifact Reduction (BCDMAR), an innovative approach that enhances iterative reconstruction with a conditional diffusion model for MAR. This method employs a metal artifact degradation operator in place of the traditional metal-excluded projection operator in the data-fidelity term, thereby preserving structure details around metal regions. However, scorebased diffusion models tend to be susceptible to grayscale shifts and unreliable structures, making it challenging to reach an optimal solution. To address this, we utilize a precorrected image as a prior constraint, guiding the generation of the score-based diffusion model. By iteratively applying the score-based diffusion model and the data-fidelity step in each sampling iteration, BCDMAR effectively maintains reliable tissue representation around metal regions and produces highly consistent structures in non-metal regions. Through extensive experiments focused on metal artifact reduction tasks, BCDMAR demonstrates superior performance over other state-of-the-art unsupervised and supervised methods, both quantitatively and in terms of visual results.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141989745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Domain-interactive Contrastive Learning and Prototype-guided Self-training for Cross-domain Polyp Segmentation.
Pub Date: 2024-08-14 | DOI: 10.1109/TMI.2024.3443262
Ziru Lu, Yizhe Zhang, Yi Zhou, Ye Wu, Tao Zhou
Accurate polyp segmentation from colonoscopy images plays a critical role in the diagnosis and treatment of colorectal cancer. While deep learning-based polyp segmentation models have made significant progress, they often suffer from performance degradation when applied to unseen target domain datasets collected from different imaging devices. To address this challenge, unsupervised domain adaptation (UDA) methods have gained attention by leveraging labeled source data and unlabeled target data to reduce the domain gap. However, existing UDA methods primarily focus on capturing class-wise representations, neglecting domain-wise representations. Additionally, uncertainty in pseudo-labels could hinder the segmentation performance. To tackle these issues, we propose a novel Domain-interactive Contrastive Learning and Prototype-guided Self-training (DCL-PS) framework for cross-domain polyp segmentation. Specifically, domain-interactive contrastive learning (DCL) with a domain-mixed prototype updating strategy is proposed to discriminate class-wise feature representations across domains. Then, to enhance the feature extraction ability of the encoder, we present a contrastive learning-based cross-consistency training (CL-CCT) strategy, which is imposed on the prototypes obtained from both the main decoder's outputs and perturbed auxiliary outputs. Furthermore, we propose a prototype-guided self-training (PS) strategy, which dynamically assigns a weight to each pixel during self-training, filtering out unreliable pixels and improving the quality of pseudo-labels. Experimental results demonstrate the superiority of DCL-PS in improving polyp segmentation performance in the target domain. The code will be released at https://github.com/taozh2017/DCLPS.
{"title":"Domain-interactive Contrastive Learning and Prototype-guided Self-training for Cross-domain Polyp Segmentation.","authors":"Ziru Lu, Yizhe Zhang, Yi Zhou, Ye Wu, Tao Zhou","doi":"10.1109/TMI.2024.3443262","DOIUrl":"https://doi.org/10.1109/TMI.2024.3443262","url":null,"abstract":"<p><p>Accurate polyp segmentation plays a critical role from colonoscopy images in the diagnosis and treatment of colorectal cancer. While deep learning-based polyp segmentation models have made significant progress, they often suffer from performance degradation when applied to unseen target domain datasets collected from different imaging devices. To address this challenge, unsupervised domain adaptation (UDA) methods have gained attention by leveraging labeled source data and unlabeled target data to reduce the domain gap. However, existing UDA methods primarily focus on capturing class-wise representations, neglecting domain-wise representations. Additionally, uncertainty in pseudo labels could hinder the segmentation performance. To tackle these issues, we propose a novel Domain-interactive Contrastive Learning and Prototype-guided Self-training (DCL-PS) framework for cross-domain polyp segmentation. Specifically, domaininteractive contrastive learning (DCL) with a domain-mixed prototype updating strategy is proposed to discriminate class-wise feature representations across domains. Then, to enhance the feature extraction ability of the encoder, we present a contrastive learning-based cross-consistency training (CL-CCT) strategy, which is imposed on both the prototypes obtained by the outputs of the main decoder and perturbed auxiliary outputs. Furthermore, we propose a prototype-guided self-training (PS) strategy, which dynamically assigns a weight for each pixel during selftraining, filtering out unreliable pixels and improving the quality of pseudo-labels. Experimental results demonstrate the superiority of DCL-PS in improving polyp segmentation performance in the target domain. The code will be released at https://github.com/taozh2017/DCLPS.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141984206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prompt-driven Latent Domain Generalization for Medical Image Classification.
Pub Date: 2024-08-13 | DOI: 10.1109/TMI.2024.3443119
Siyuan Yan, Zhen Yu, Chi Liu, Lie Ju, Dwarikanath Mahapatra, Brigid Betz-Stablein, Victoria Mar, Monika Janda, Peter Soyer, Zongyuan Ge
Deep learning models for medical image analysis easily suffer from distribution shifts caused by dataset artifact bias, camera variations, differences in the imaging station, etc., leading to unreliable diagnoses in real-world clinical settings. Domain generalization (DG) methods, which aim to train models on multiple domains to perform well on unseen domains, offer a promising direction to solve the problem. However, existing DG methods assume the domain labels of each image are available and accurate, which is typically feasible for only a limited number of medical datasets. To address these challenges, we propose a unified DG framework for medical image classification that does not rely on domain labels, called Prompt-driven Latent Domain Generalization (PLDG). PLDG consists of unsupervised domain discovery and prompt learning. The framework first discovers pseudo domain labels by clustering the bias-associated style features, then leverages collaborative domain prompts to guide a Vision Transformer to learn knowledge from the discovered diverse domains. To facilitate cross-domain knowledge learning between different prompts, we introduce a domain prompt generator that enables knowledge sharing between domain prompts and a shared prompt. A domain mixup strategy is additionally employed to obtain more flexible decision margins and mitigate the risk of incorrect domain assignments. Extensive experiments on three medical image classification tasks and one debiasing task demonstrate that our method can achieve comparable or even superior performance to conventional DG algorithms without relying on domain labels. Our code is publicly available at https://github.com/SiyuanYan1/PLDG/tree/main.
{"title":"Prompt-driven Latent Domain Generalization for Medical Image Classification.","authors":"Siyuan Yan, Zhen Yu, Chi Liu, Lie Ju, Dwarikanath Mahapatra, Brigid Betz-Stablein, Victoria Mar, Monika Janda, Peter Soyer, Zongyuan Ge","doi":"10.1109/TMI.2024.3443119","DOIUrl":"https://doi.org/10.1109/TMI.2024.3443119","url":null,"abstract":"<p><p>Deep learning models for medical image analysis easily suffer from distribution shifts caused by dataset artifact bias, camera variations, differences in the imaging station, etc., leading to unreliable diagnoses in real-world clinical settings. Domain generalization (DG) methods, which aim to train models on multiple domains to perform well on unseen domains, offer a promising direction to solve the problem. However, existing DG methods assume domain labels of each image are available and accurate, which is typically feasible for only a limited number of medical datasets. To address these challenges, we propose a unified DG framework for medical image classification without relying on domain labels, called Prompt-driven Latent Domain Generalization (PLDG). PLDG consists of unsupervised domain discovery and prompt learning. This framework first discovers pseudo domain labels by clustering the bias-associated style features, then leverages collaborative domain prompts to guide a Vision Transformer to learn knowledge from discovered diverse domains. To facilitate cross-domain knowledge learning between different prompts, we introduce a domain prompt generator that enables knowledge sharing between domain prompts and a shared prompt. A domain mixup strategy is additionally employed for more flexible decision margins and mitigates the risk of incorrect domain assignments. Extensive experiments on three medical image classification tasks and one debiasing task demonstrate that our method can achieve comparable or even superior performance than conventional DG algorithms without relying on domain labels. Our code is publicly available at https://github.com/SiyuanYan1/PLDG/tree/main.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141977493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A New Benchmark: Clinical Uncertainty and Severity Aware Labeled Chest X-Ray Images with Multi-Relationship Graph Learning.
Pub Date: 2024-08-09 | DOI: 10.1109/TMI.2024.3441494
Mengliang Zhang, Xinyue Hu, Lin Gu, Liangchen Liu, Kazuma Kobayashi, Tatsuya Harada, Yan Yan, Ronald M Summers, Yingying Zhu
Chest radiography, commonly known as CXR, is frequently utilized in clinical settings to detect cardiopulmonary conditions. However, even seasoned radiologists might offer different evaluations regarding the seriousness and uncertainty associated with observed abnormalities. Previous research has attempted to utilize clinical notes to extract abnormality labels for training deep-learning models in CXR image diagnosis. However, these methods often neglected the varying degrees of severity and uncertainty linked to different labels. In our study, we initially assembled a comprehensive new dataset of CXR images based on clinical textual data, which incorporated radiologists' assessments of uncertainty and severity. Using this dataset, we introduced a multi-relationship graph learning framework that leverages spatial and semantic relationships while addressing expert uncertainty through a dedicated loss function. Our research showcases a notable enhancement in CXR image diagnosis and the interpretability of the diagnostic model, surpassing existing state-of-the-art methodologies. The disease severity and uncertainty dataset we extracted is available at: https://physionet.org/content/cad-chest/1.0/.
{"title":"A New Benchmark: Clinical Uncertainty and Severity Aware Labeled Chest X-Ray Images with Multi-Relationship Graph Learning.","authors":"Mengliang Zhang, Xinyue Hu, Lin Gu, Liangchen Liu, Kazuma Kobayashi, Tatsuya Harada, Yan Yan, Ronald M Summers, Yingying Zhu","doi":"10.1109/TMI.2024.3441494","DOIUrl":"https://doi.org/10.1109/TMI.2024.3441494","url":null,"abstract":"<p><p>Chest radiography, commonly known as CXR, is frequently utilized in clinical settings to detect cardiopulmonary conditions. However, even seasoned radiologists might offer different evaluations regarding the seriousness and uncertainty associated with observed abnormalities. Previous research has attempted to utilize clinical notes to extract abnormal labels for training deep-learning models in CXR image diagnosis. However, these methods often neglected the varying degrees of severity and uncertainty linked to different labels. In our study, we initially assembled a comprehensive new dataset of CXR images based on clinical textual data, which incorporated radiologists' assessments of uncertainty and severity. Using this dataset, we introduced a multi-relationship graph learning framework that leverages spatial and semantic relationships while addressing expert uncertainty through a dedicated loss function. Our research showcases a notable enhancement in CXR image diagnosis and the interpretability of the diagnostic model, surpassing existing state-of-the-art methodologies. The dataset address of disease severity and uncertainty we extracted is: https://physionet.org/content/cad-chest/1.0/.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141910184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RemixFormer++: A Multi-modal Transformer Model for Precision Skin Tumor Differential Diagnosis with Memory-efficient Attention.
Pub Date: 2024-08-09 | DOI: 10.1109/TMI.2024.3441012
Jing Xu, Kai Huang, Lianzhen Zhong, Yuan Gao, Kai Sun, Wei Liu, Yanjie Zhou, Wenchao Guo, Yuan Guo, Yuanqiang Zou, Yuping Duan, Le Lu, Yu Wang, Xiang Chen, Shuang Zhao
Diagnosing malignant skin tumors accurately at an early stage can be challenging due to the ambiguous and even confusing visual characteristics displayed by various categories of skin tumors. To improve diagnostic precision, all available clinical data from multiple sources, particularly clinical images, dermoscopy images, and medical history, could be considered. Aligning with clinical practice, we propose a novel Transformer model, named RemixFormer++, which consists of a clinical image branch, a dermoscopy image branch, and a metadata branch. Given the unique characteristics inherent in clinical and dermoscopy images, specialized attention strategies are adopted for each type. Clinical images are processed through a top-down architecture, capturing both localized lesion details and global contextual information. Conversely, dermoscopy images undergo bottom-up processing with two-level hierarchical encoders, designed to pinpoint fine-grained structural and textural features. A dedicated metadata branch seamlessly integrates non-visual information by encoding relevant patient data. Fusing features from the three branches substantially boosts disease classification accuracy. RemixFormer++ demonstrates exceptional performance on four single-modality datasets (PAD-UFES-20, ISIC 2017/2018/2019). Compared with the previous best method on the public multi-modal Derm7pt dataset, we achieved an absolute 5.3% increase in averaged F1 and a 1.2% increase in accuracy for the classification of five skin tumors. Furthermore, using a large-scale in-house dataset of 10,351 patients with the twelve most common skin tumors, our method obtained an overall classification accuracy of 92.6%. These promising results, on par with or better than the performance of 191 dermatologists in a comprehensive reader study, evidently imply the potential clinical usability of our method.
{"title":"RemixFormer++: A Multi-modal Transformer Model for Precision Skin Tumor Differential Diagnosis with Memory-efficient Attention.","authors":"Jing Xu, Kai Huang, Lianzhen Zhong, Yuan Gao, Kai Sun, Wei Liu, Yanjie Zhou, Wenchao Guo, Yuan Guo, Yuanqiang Zou, Yuping Duan, Le Lu, Yu Wang, Xiang Chen, Shuang Zhao","doi":"10.1109/TMI.2024.3441012","DOIUrl":"10.1109/TMI.2024.3441012","url":null,"abstract":"<p><p>Diagnosing malignant skin tumors accurately at an early stage can be challenging due to ambiguous and even confusing visual characteristics displayed by various categories of skin tumors. To improve diagnosis precision, all available clinical data from multiple sources, particularly clinical images, dermoscopy images, and medical history, could be considered. Aligning with clinical practice, we propose a novel Transformer model, named Remix-Former++ that consists of a clinical image branch, a dermoscopy image branch, and a metadata branch. Given the unique characteristics inherent in clinical and dermoscopy images, specialized attention strategies are adopted for each type. Clinical images are processed through a top-down architecture, capturing both localized lesion details and global contextual information. Conversely, dermoscopy images undergo a bottom-up processing with two-level hierarchical encoders, designed to pinpoint fine-grained structural and textural features. A dedicated metadata branch seamlessly integrates non-visual information by encoding relevant patient data. Fusing features from three branches substantially boosts disease classification accuracy. RemixFormer++ demonstrates exceptional performance on four single-modality datasets (PAD-UFES-20, ISIC 2017/2018/2019). Compared with the previous best method using a public multi-modal Derm7pt dataset, we achieved an absolute 5.3% increase in averaged F1 and 1.2% in accuracy for the classification of five skin tumors. Furthermore, using a large-scale in-house dataset of 10,351 patients with the twelve most common skin tumors, our method obtained an overall classification accuracy of 92.6%. These promising results, on par or better with the performance of 191 dermatologists through a comprehensive reader study, evidently imply the potential clinical usability of our method.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141910185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PRECISION: A Physics-Constrained and Noise-Controlled Diffusion Model for Photon Counting Computed Tomography.
Pub Date: 2024-08-08 | DOI: 10.1109/TMI.2024.3440651
Ruifeng Chen, Zhongliang Zhang, Guotao Quan, Yanfeng Du, Yang Chen, Yinsheng Li
Recently, the use of photon counting detectors in computed tomography (PCCT) has attracted extensive attention. It is highly desirable to improve the quality of material basis images and the quantitative accuracy of elemental composition, particularly when PCCT data are acquired at lower radiation dose levels. In this work, we develop a physics-constrained and noise-controlled diffusion model, PRECISION for short, to address the degraded quality of material basis images and the inaccurate quantification of elemental composition caused mainly by the imperfect noise models and/or hand-crafted regularization of material basis images (such as local smoothness and/or sparsity) leveraged in existing direct material basis image reconstruction approaches. In stark contrast, PRECISION learns distribution-level regularization to describe the features of ideal material basis images by training a noise-controlled spatial-spectral diffusion model. The optimal material basis images of each individual subject are sampled from this learned distribution under the constraint of the physical model of a given PCCT system and the measured data obtained from the subject. PRECISION exhibits the potential to improve the quality of material basis images and the quantitative accuracy of elemental composition for PCCT.
{"title":"PRECISION: A Physics-Constrained and Noise-Controlled Diffusion Model for Photon Counting Computed Tomography.","authors":"Ruifeng Chen, Zhongliang Zhang, Guotao Quan, Yanfeng Du, Yang Chen, Yinsheng Li","doi":"10.1109/TMI.2024.3440651","DOIUrl":"https://doi.org/10.1109/TMI.2024.3440651","url":null,"abstract":"<p><p>Recently, the use of photon counting detectors in computed tomography (PCCT) has attracted extensive attention. It is highly desired to improve the quality of material basis image and the quantitative accuracy of elemental composition, particularly when PCCT data is acquired at lower radiation dose levels. In this work, we develop a physics-constrained and noise-controlled diffusion model, PRECISION in short, to address the degraded quality of material basis images and inaccurate quantification of elemental composition mainly caused by imperfect noise model and/or hand-crafted regularization of material basis images, such as local smoothness and/or sparsity, leveraged in the existing direct material basis image reconstruction approaches. In stark contrast, PRECISION learns distribution-level regularization to describe the feature of ideal material basis images via training a noise-controlled spatial-spectral diffusion model. The optimal material basis images of each individual subject are sampled from this learned distribution under the constraint of the physical model of a given PCCT and the measured data obtained from the subject. PRECISION exhibits the potential to improve the quality of material basis images and the quantitative accuracy of elemental composition for PCCT.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141908635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}