Pub Date : 2024-08-15 DOI: 10.1109/TMI.2024.3442950
Mengting Luo, Nan Zhou, Tao Wang, Linchao He, Wang Wang, Hu Chen, Peixi Liao, Yi Zhang
In recent years, score-based diffusion models have emerged as effective tools for estimating score functions from empirical data distributions, particularly in integrating implicit priors with inverse problems like CT reconstruction. However, score-based diffusion models are rarely explored in challenging tasks such as metal artifact reduction (MAR). In this paper, we introduce the Bi-Constraints Diffusion Model for Metal Artifact Reduction (BCDMAR), an innovative approach that enhances iterative reconstruction with a conditional diffusion model for MAR. This method employs a metal artifact degradation operator in place of the traditional metal-excluded projection operator in the data-fidelity term, thereby preserving structural details around metal regions. However, score-based diffusion models tend to be susceptible to grayscale shifts and unreliable structures, making it challenging to reach an optimal solution. To address this, we utilize a pre-corrected image as a prior constraint, guiding the generation of the score-based diffusion model. By alternating the score-based diffusion model and the data-fidelity step in each sampling iteration, BCDMAR effectively maintains reliable tissue representation around metal regions and produces highly consistent structures in non-metal regions. Through extensive experiments focused on metal artifact reduction tasks, BCDMAR demonstrates superior performance over other state-of-the-art unsupervised and supervised methods, both quantitatively and in terms of visual results.
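The alternation between a prior-guided update and a data-fidelity step described in this abstract can be illustrated with a toy sketch. This is not the authors' implementation: there is no trained score network, the signal is a short 1-D list, the degradation operator is assumed to be a simple diagonal weighting, and the names `prior_step`, `data_fidelity_step`, and `reconstruct` are hypothetical. The prior step is only a stand-in that pulls the estimate toward a pre-corrected signal.

```python
def prior_step(x, prior, strength):
    """Stand-in for the prior-guided generative update (assumption):
    pull the estimate toward the pre-corrected signal."""
    return [xi + strength * (pi - xi) for xi, pi in zip(x, prior)]


def data_fidelity_step(x, y, weights, step):
    """One gradient step on 0.5 * ||A x - y||^2, where A = diag(weights)
    is an assumed toy degradation operator."""
    return [xi - step * w * (w * xi - yi) for xi, yi, w in zip(x, y, weights)]


def reconstruct(y, weights, prior, n_iters=200, strength=0.1, step=0.5):
    """Alternate the two steps in every iteration, starting from the prior."""
    x = list(prior)
    for _ in range(n_iters):
        x = prior_step(x, prior, strength)
        x = data_fidelity_step(x, y, weights, step)
    return x


if __name__ == "__main__":
    truth = [1.0, 2.0, 3.0, 4.0]
    weights = [1.0, 1.0, 0.0, 1.0]   # weight 0: the measurement says nothing there
    y = [w * t for w, t in zip(weights, truth)]
    prior = [1.2, 1.8, 3.1, 4.3]     # imperfect pre-corrected signal
    print(reconstruct(y, weights, prior))
```

In this toy setting, positions with a nonzero weight are drawn toward the measured value, while the position with weight 0 stays at the prior value, which mirrors the idea of two constraints sharing the work.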
Title : Bi-Constraints Diffusion: A Conditional Diffusion Model with Degradation Guidance for Metal Artifact Reduction. Journal : IEEE Transactions on Medical Imaging
Pub Date : 2024-08-14 DOI: 10.1109/TMI.2024.3443262
Ziru Lu, Yizhe Zhang, Yi Zhou, Ye Wu, Tao Zhou
Accurate polyp segmentation from colonoscopy images plays a critical role in the diagnosis and treatment of colorectal cancer. While deep learning-based polyp segmentation models have made significant progress, they often suffer from performance degradation when applied to unseen target domain datasets collected from different imaging devices. To address this challenge, unsupervised domain adaptation (UDA) methods have gained attention by leveraging labeled source data and unlabeled target data to reduce the domain gap. However, existing UDA methods primarily focus on capturing class-wise representations, neglecting domain-wise representations. Additionally, uncertainty in pseudo-labels could hinder the segmentation performance. To tackle these issues, we propose a novel Domain-interactive Contrastive Learning and Prototype-guided Self-training (DCL-PS) framework for cross-domain polyp segmentation. Specifically, domain-interactive contrastive learning (DCL) with a domain-mixed prototype updating strategy is proposed to discriminate class-wise feature representations across domains. Then, to enhance the feature extraction ability of the encoder, we present a contrastive learning-based cross-consistency training (CL-CCT) strategy, which is imposed on both the prototypes obtained by the outputs of the main decoder and perturbed auxiliary outputs. Furthermore, we propose a prototype-guided self-training (PS) strategy, which dynamically assigns a weight to each pixel during self-training, filtering out unreliable pixels and improving the quality of pseudo-labels. Experimental results demonstrate the superiority of DCL-PS in improving polyp segmentation performance in the target domain. The code will be released at https://github.com/taozh2017/DCLPS.
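One way to picture prototype-guided pixel weighting is sketched below. The abstract does not give the weighting rule, so everything here is an assumption: the weight is taken as the softmax probability (over cosine similarities to class prototypes) of the pixel's own pseudo-label, and pixels below a threshold get weight 0. The function names, the temperature, and the threshold are all hypothetical, and feature vectors are assumed to be nonzero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pixel_weights(features, pseudo_labels, prototypes, temperature=0.1, threshold=0.5):
    """Assumed rule: weight = softmax probability of the pseudo-label class,
    computed from similarities to the class prototypes; weights below
    `threshold` are set to 0 so the pixel is ignored in the loss."""
    weights = []
    for feat, label in zip(features, pseudo_labels):
        sims = [cosine(feat, proto) / temperature for proto in prototypes]
        peak = max(sims)                      # subtract max for numerical stability
        exps = [math.exp(s - peak) for s in sims]
        prob = exps[label] / sum(exps)
        weights.append(prob if prob >= threshold else 0.0)
    return weights


if __name__ == "__main__":
    prototypes = [[1.0, 0.0], [0.0, 1.0]]     # class 0: background, class 1: polyp
    features = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.5]]
    pseudo_labels = [0, 1, 1]                 # third label disagrees with its feature
    print(pixel_weights(features, pseudo_labels, prototypes))
```

In the demo, the third pixel carries the polyp pseudo-label although its feature lies closer to the background prototype, so its weight is zeroed.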
Title : Domain-interactive Contrastive Learning and Prototype-guided Self-training for Cross-domain Polyp Segmentation. Journal : IEEE Transactions on Medical Imaging
Pub Date : 2024-08-13 DOI: 10.1109/TMI.2024.3443119
Siyuan Yan, Zhen Yu, Chi Liu, Lie Ju, Dwarikanath Mahapatra, Brigid Betz-Stablein, Victoria Mar, Monika Janda, Peter Soyer, Zongyuan Ge
Deep learning models for medical image analysis easily suffer from distribution shifts caused by dataset artifact bias, camera variations, differences in the imaging station, etc., leading to unreliable diagnoses in real-world clinical settings. Domain generalization (DG) methods, which aim to train models on multiple domains to perform well on unseen domains, offer a promising direction to solve the problem. However, existing DG methods assume domain labels of each image are available and accurate, which is typically feasible for only a limited number of medical datasets. To address these challenges, we propose a unified DG framework for medical image classification without relying on domain labels, called Prompt-driven Latent Domain Generalization (PLDG). PLDG consists of unsupervised domain discovery and prompt learning. This framework first discovers pseudo domain labels by clustering the bias-associated style features, then leverages collaborative domain prompts to guide a Vision Transformer to learn knowledge from discovered diverse domains. To facilitate cross-domain knowledge learning between different prompts, we introduce a domain prompt generator that enables knowledge sharing between domain prompts and a shared prompt. A domain mixup strategy is additionally employed to allow more flexible decision margins and to mitigate the risk of incorrect domain assignments. Extensive experiments on three medical image classification tasks and one debiasing task demonstrate that our method can achieve performance comparable or even superior to that of conventional DG algorithms without relying on domain labels. Our code is publicly available at https://github.com/SiyuanYan1/PLDG/tree/main.
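The domain-discovery step (clustering style features to obtain pseudo domain labels) can be sketched in a few lines. This is an illustration under stated assumptions, not the PLDG code: the style feature is assumed to be the (mean, standard deviation) of raw pixel intensities rather than statistics of network feature maps, the clustering is a plain k-means with a deterministic first-k initialization, and `style_feature`, `sq_dist`, and `kmeans_labels` are hypothetical names.

```python
import statistics


def style_feature(image):
    """Assumed style descriptor: (mean, std) of a flat list of intensities."""
    return (statistics.fmean(image), statistics.pstdev(image))


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_labels(points, k, n_iters=20):
    """Plain k-means; centroids start at the first k points (deterministic)."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


if __name__ == "__main__":
    images = [
        [0.10, 0.20, 0.10, 0.20],   # dark, low contrast
        [0.70, 0.90, 0.70, 0.90],   # bright, higher contrast
        [0.15, 0.25, 0.15, 0.25],
        [0.75, 0.95, 0.75, 0.95],
    ]
    feats = [style_feature(img) for img in images]
    print(kmeans_labels(feats, k=2))   # pseudo domain label per image
```

The resulting cluster indices play the role of pseudo domain labels; a real pipeline would choose the number of clusters and the feature extractor with far more care.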
Title : Prompt-driven Latent Domain Generalization for Medical Image Classification. Journal : IEEE Transactions on Medical Imaging
Pub Date : 2024-08-09 DOI: 10.1109/TMI.2024.3441494
Mengliang Zhang, Xinyue Hu, Lin Gu, Liangchen Liu, Kazuma Kobayashi, Tatsuya Harada, Yan Yan, Ronald M Summers, Yingying Zhu
Chest radiography, commonly known as CXR, is frequently utilized in clinical settings to detect cardiopulmonary conditions. However, even seasoned radiologists might offer different evaluations regarding the seriousness and uncertainty associated with observed abnormalities. Previous research has attempted to utilize clinical notes to extract abnormality labels for training deep-learning models in CXR image diagnosis. However, these methods often neglected the varying degrees of severity and uncertainty linked to different labels. In our study, we first assembled a comprehensive new dataset of CXR images based on clinical textual data, which incorporated radiologists' assessments of uncertainty and severity. Using this dataset, we introduced a multi-relationship graph learning framework that leverages spatial and semantic relationships while addressing expert uncertainty through a dedicated loss function. Our research showcases a notable enhancement in CXR image diagnosis and the interpretability of the diagnostic model, surpassing existing state-of-the-art methodologies. The disease severity and uncertainty dataset we extracted is available at: https://physionet.org/content/cad-chest/1.0/.
Title : A New Benchmark: Clinical Uncertainty and Severity Aware Labeled Chest X-Ray Images with Multi-Relationship Graph Learning. Journal : IEEE Transactions on Medical Imaging
Pub Date : 2024-08-09 DOI: 10.1109/TMI.2024.3441012
Jing Xu, Kai Huang, Lianzhen Zhong, Yuan Gao, Kai Sun, Wei Liu, Yanjie Zhou, Wenchao Guo, Yuan Guo, Yuanqiang Zou, Yuping Duan, Le Lu, Yu Wang, Xiang Chen, Shuang Zhao
Diagnosing malignant skin tumors accurately at an early stage can be challenging due to ambiguous and even confusing visual characteristics displayed by various categories of skin tumors. To improve diagnosis precision, all available clinical data from multiple sources, particularly clinical images, dermoscopy images, and medical history, could be considered. Aligning with clinical practice, we propose a novel Transformer model, named RemixFormer++, that consists of a clinical image branch, a dermoscopy image branch, and a metadata branch. Given the unique characteristics inherent in clinical and dermoscopy images, specialized attention strategies are adopted for each type. Clinical images are processed through a top-down architecture, capturing both localized lesion details and global contextual information. Conversely, dermoscopy images undergo bottom-up processing with two-level hierarchical encoders, designed to pinpoint fine-grained structural and textural features. A dedicated metadata branch seamlessly integrates non-visual information by encoding relevant patient data. Fusing features from the three branches substantially boosts disease classification accuracy. RemixFormer++ demonstrates exceptional performance on four single-modality datasets (PAD-UFES-20, ISIC 2017/2018/2019). Compared with the previous best method on the public multi-modal Derm7pt dataset, we achieved an absolute 5.3% increase in averaged F1 and 1.2% in accuracy for the classification of five skin tumors. Furthermore, using a large-scale in-house dataset of 10,351 patients with the twelve most common skin tumors, our method obtained an overall classification accuracy of 92.6%. These promising results, on par with or better than the performance of 191 dermatologists in a comprehensive reader study, indicate the potential clinical usability of our method.
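For readers unfamiliar with the two metrics reported here, the sketch below computes an averaged F1 and accuracy for a multi-class classifier. It assumes that "averaged F1" means the macro average (the unweighted mean of per-class F1 scores); the abstract does not say which averaging is used. The function name and the toy labels are hypothetical and unrelated to the paper's data.

```python
def macro_f1_and_accuracy(y_true, y_pred, n_classes):
    """Return (macro-averaged F1, accuracy) for integer class labels.
    Macro averaging is an assumption about what 'averaged F1' means."""
    f1_scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return sum(f1_scores) / n_classes, accuracy


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    print(macro_f1_and_accuracy(y_true, y_pred, n_classes=3))
```

Because macro F1 weights every class equally, a gain in it can be larger than the corresponding gain in accuracy when the improvement comes mainly from rarer classes.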
Title : RemixFormer++: A Multi-modal Transformer Model for Precision Skin Tumor Differential Diagnosis with Memory-efficient Attention. Journal : IEEE Transactions on Medical Imaging
Pub Date : 2024-08-08 DOI: 10.1109/TMI.2024.3440651
Ruifeng Chen, Zhongliang Zhang, Guotao Quan, Yanfeng Du, Yang Chen, Yinsheng Li
Recently, the use of photon counting detectors in computed tomography (PCCT) has attracted extensive attention. It is highly desired to improve the quality of material basis image and the quantitative accuracy of elemental composition, particularly when PCCT data is acquired at lower radiation dose levels. In this work, we develop a p