CADS: A Self-supervised Learner via Cross-modal Alignment and Deep Self-distillation for CT Volume Segmentation
Pub Date: 2024-07-22 | DOI: 10.1109/TMI.2024.3431916
Yiwen Ye, Jianpeng Zhang, Ziyang Chen, Yong Xia
Self-supervised learning (SSL) has long been successful in advancing annotation-efficient learning. However, when applied to CT volume segmentation, most SSL methods suffer from two limitations: they rarely use the information acquired by different imaging modalities, and they provide supervision only to the bottleneck encoder layer. To address both limitations, we design a pretext task that aligns the information in each 3D CT volume with the corresponding generated 2D X-ray image, and we extend self-distillation to deep self-distillation. We thus propose a self-supervised learner based on Cross-modal Alignment and Deep Self-distillation (CADS) to improve the encoder's ability to characterize CT volumes. Cross-modal alignment is a more challenging pretext task that forces the encoder to learn better image representations. Deep self-distillation provides supervision not only to the bottleneck layer but also to shallow layers, thus boosting the abilities of both. Comparative experiments show that, during pre-training, our CADS has lower computational complexity and GPU memory cost than competing SSL methods. Based on the pre-trained encoder, we construct PVT-UNet for 3D CT volume segmentation. Our results on seven downstream tasks indicate that PVT-UNet outperforms state-of-the-art SSL methods such as MOCOv3 and DiRA, as well as prevalent medical image segmentation methods such as nnUNet and CoTr. Code and pre-trained weights will be available at https://github.com/yeerwen/CADS.
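The deep self-distillation idea lends itself to a compact sketch. Below is a minimal PyTorch illustration, assuming a DINO-style setup in which a frozen EMA teacher supervises the student at every encoder stage rather than only at the bottleneck; the toy 3D encoder, stage dimensions, and head sizes are placeholders, not the CADS architecture.

```python
# Minimal sketch of deep self-distillation (assumed DINO-style formulation).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyStagedEncoder(nn.Module):
    """Toy 4-stage 3D encoder; each stage halves resolution and exposes its features."""
    def __init__(self, dims=(16, 32, 64, 128)):
        super().__init__()
        chans = [1] + list(dims)
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Conv3d(chans[i], chans[i + 1], 3, stride=2, padding=1), nn.GELU())
            for i in range(len(dims))
        )

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x.mean(dim=(2, 3, 4)))  # global-average-pool each stage
        return feats

def deep_self_distillation_loss(student, teacher, s_heads, t_heads, view_a, view_b, temp=0.1):
    """Sum a distillation loss over all stages: student on view_a matches teacher on view_b."""
    with torch.no_grad():
        t_feats = teacher(view_b)  # EMA teacher output, no gradient
    s_feats = student(view_a)
    loss = 0.0
    for s, t, sh, th in zip(s_feats, t_feats, s_heads, t_heads):
        p_t = F.softmax(th(t) / temp, dim=-1)          # teacher target per stage
        log_p_s = F.log_softmax(sh(s) / temp, dim=-1)  # student prediction per stage
        loss = loss + -(p_t * log_p_s).sum(dim=-1).mean()
    return loss / len(s_feats)

student = TinyStagedEncoder()
teacher = copy.deepcopy(student).requires_grad_(False)  # updated by EMA in practice
s_heads = nn.ModuleList(nn.Linear(d, 64) for d in (16, 32, 64, 128))
t_heads = copy.deepcopy(s_heads).requires_grad_(False)
va, vb = torch.randn(2, 1, 32, 32, 32), torch.randn(2, 1, 32, 32, 32)
print(deep_self_distillation_loss(student, teacher, s_heads, t_heads, va, vb))
```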
{"title":"CADS: A Self-supervised Learner via Cross-modal Alignment and Deep Self-distillation for CT Volume Segmentation.","authors":"Yiwen Ye, Jianpeng Zhang, Ziyang Chen, Yong Xia","doi":"10.1109/TMI.2024.3431916","DOIUrl":"https://doi.org/10.1109/TMI.2024.3431916","url":null,"abstract":"<p><p>Self-supervised learning (SSL) has long had great success in advancing the field of annotation-efficient learning. However, when applied to CT volume segmentation, most SSL methods suffer from two limitations, including rarely using the information acquired by different imaging modalities and providing supervision only to the bottleneck encoder layer. To address both limitations, we design a pretext task to align the information in each 3D CT volume and the corresponding 2D generated X-ray image and extend self-distillation to deep self-distillation. Thus, we propose a self-supervised learner based on Cross-modal Alignment and Deep Self-distillation (CADS) to improve the encoder's ability to characterize CT volumes. The cross-modal alignment is a more challenging pretext task that forces the encoder to learn better image representation ability. Deep self-distillation provides supervision to not only the bottleneck layer but also shallow layers, thus boosting the abilities of both. Comparative experiments show that, during pre-training, our CADS has lower computational complexity and GPU memory cost than competing SSL methods. Based on the pre-trained encoder, we construct PVT-UNet for 3D CT volume segmentation. Our results on seven downstream tasks indicate that PVT-UNet outperforms state-of-the-art SSL methods like MOCOv3 and DiRA, as well as prevalent medical image segmentation methods like nnUNet and CoTr. Code and pre-trained weight will be available at https://github.com/yeerwen/CADS.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141750102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generalizable Reconstruction for Accelerating MR Imaging via Federated Learning with Neural Architecture Search
Pub Date: 2024-07-22 | DOI: 10.1109/TMI.2024.3432388
Ruoyou Wu, Cheng Li, Juan Zou, Xinfeng Liu, Hairong Zheng, Shanshan Wang
Heterogeneous data captured by different scanning devices and imaging protocols can affect the generalization performance of deep learning-based magnetic resonance (MR) reconstruction models. While centralized training is effective in mitigating this problem, it raises concerns about privacy protection. Federated learning is a distributed training paradigm that can utilize multi-institutional data for collaborative training without sharing data. However, existing federated learning MR image reconstruction methods rely on models designed manually by experts, which are complex and computationally expensive, and they suffer from performance degradation when facing heterogeneous data distributions. In addition, these methods give inadequate consideration to fairness, namely ensuring that training does not bias the model towards any specific dataset's distribution. To this end, this paper proposes a generalizable federated neural architecture search framework for accelerating MR imaging (GAutoMRI). Specifically, automatic neural architecture search is investigated for effective and efficient neural network representation learning of MR images from different centers. Furthermore, we design a fairness adjustment approach that enables the model to learn features fairly from the inconsistent distributions of different devices and centers, and thus helps the model generalize well to unseen centers. Extensive experiments show that our proposed GAutoMRI achieves better performance and generalization ability than seven state-of-the-art federated learning methods. Moreover, the GAutoMRI model is significantly more lightweight, making it an efficient choice for MR image reconstruction tasks. The code will be made available at https://github.com/ternencewu123/GAutoMRI.
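For intuition, here is a minimal sketch of a fairness-adjusted aggregation step. The q-FFL-style rule (upweighting clients with higher validation loss so that no center dominates the global model) is an assumption for illustration, not GAutoMRI's exact adjustment.

```python
# Minimal sketch of fairness-weighted federated averaging (assumed q-FFL-style rule).
import torch

def fair_fedavg(client_states, client_losses, q=1.0):
    """Aggregate client model states; weight ~ loss**q so poorly-served clients count more."""
    losses = torch.tensor(client_losses, dtype=torch.float32)
    weights = losses.pow(q)
    weights = weights / weights.sum()
    global_state = {}
    for key in client_states[0]:
        stacked = torch.stack([state[key].float() for state in client_states])
        w = weights.view(-1, *([1] * (stacked.dim() - 1)))  # broadcast over parameter dims
        global_state[key] = (w * stacked).sum(dim=0)
    return global_state

# Usage: three clients; the third (hardest) receives the largest aggregation weight.
states = [{"w": torch.randn(4, 4)} for _ in range(3)]
agg = fair_fedavg(states, client_losses=[0.2, 0.3, 0.9], q=2.0)
print(agg["w"].shape)
```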
{"title":"Generalizable Reconstruction for Accelerating MR Imaging via Federated Learning with Neural Architecture Search.","authors":"Ruoyou Wu, Cheng Li, Juan Zou, Xinfeng Liu, Hairong Zheng, Shanshan Wang","doi":"10.1109/TMI.2024.3432388","DOIUrl":"https://doi.org/10.1109/TMI.2024.3432388","url":null,"abstract":"<p><p>Heterogeneous data captured by different scanning devices and imaging protocols can affect the generalization performance of the deep learning magnetic resonance (MR) reconstruction model. While a centralized training model is effective in mitigating this problem, it raises concerns about privacy protection. Federated learning is a distributed training paradigm that can utilize multi-institutional data for collaborative training without sharing data. However, existing federated learning MR image reconstruction methods rely on models designed manually by experts, which are complex and computationally expensive, suffering from performance degradation when facing heterogeneous data distributions. In addition, these methods give inadequate consideration to fairness issues, namely ensuring that the model's training does not introduce bias towards any specific dataset's distribution. To this end, this paper proposes a generalizable federated neural architecture search framework for accelerating MR imaging (GAutoMRI). Specifically, automatic neural architecture search is investigated for effective and efficient neural network representation learning of MR images from different centers. Furthermore, we design a fairness adjustment approach that can enable the model to learn features fairly from inconsistent distributions of different devices and centers, and thus facilitate the model to generalize well to the unseen center. Extensive experiments show that our proposed GAutoMRI has better performances and generalization ability compared with seven state-of-the-art federated learning methods. Moreover, the GAutoMRI model is significantly more lightweight, making it an efficient choice for MR image reconstruction tasks. The code will be made available at https://github.com/ternencewu123/GAutoMRI.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141750113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unsupervised Domain Adaptation for EM Image Denoising with Invertible Networks
Pub Date: 2024-07-19 | DOI: 10.1109/TMI.2024.3431192
Shiyu Deng, Yinda Chen, Wei Huang, Ruobing Zhang, Zhiwei Xiong
Electron microscopy (EM) image denoising is critical for visualization and subsequent analysis. Despite the remarkable achievements of deep learning-based non-blind denoising methods, their performance drops significantly when domain shifts exist between the training and testing data. To address this issue, unpaired blind denoising methods have been proposed. However, these methods heavily rely on image-to-image translation and neglect the inherent characteristics of EM images, limiting their overall denoising performance. In this paper, we propose the first unsupervised domain adaptive EM image denoising method, which is grounded in the observation that EM images from similar samples share common content characteristics. Specifically, we first disentangle the content representations and the noise components from noisy images and establish a shared domain-agnostic content space via domain alignment to bridge the synthetic images (source domain) and the real images (target domain). To ensure precise domain alignment, we further incorporate domain regularization by enforcing that: the pseudo-noisy images, reconstructed using both content representations and noise components, accurately capture the characteristics of the noisy images from which the noise components originate, all while maintaining semantic consistency with the noisy images from which the content representations originate. To guarantee lossless representation decomposition and image reconstruction, we introduce disentanglement-reconstruction invertible networks. Finally, the reconstructed pseudo-noisy images, paired with their corresponding clean counterparts, serve as valuable training data for the denoising network. Extensive experiments on synthetic and real EM datasets demonstrate the superiority of our method in terms of image restoration quality and downstream neuron segmentation accuracy. Our code is publicly available at https://github.com/sydeng99/DADn.
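The disentangle-and-recombine step can be sketched compactly. The toy convolutional encoders and decoder below stand in for the paper's invertible networks; the point is only how synthetic (clean) content and real-domain noise codes are recombined into a pseudo-noisy/clean training pair.

```python
# Minimal sketch of content/noise disentanglement and pseudo-noisy pair construction.
# The tiny conv modules are placeholders, not the paper's invertible networks.
import torch
import torch.nn as nn

content_enc = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 8, 3, padding=1))
noise_enc = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 8, 3, padding=1))
decoder = nn.Sequential(nn.Conv2d(16, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 1, 3, padding=1))

def make_pseudo_noisy(synthetic_clean, real_noisy):
    """Pair synthetic content with real noise -> (pseudo_noisy, clean) training pair."""
    c = content_enc(synthetic_clean)       # domain-agnostic content code
    n = noise_enc(real_noisy)              # noise code from the target (real) domain
    pseudo_noisy = decoder(torch.cat([c, n], dim=1))
    return pseudo_noisy, synthetic_clean   # supervised pair for the denoising network

clean = torch.randn(2, 1, 64, 64)
noisy = torch.randn(2, 1, 64, 64)
x, y = make_pseudo_noisy(clean, noisy)
print(x.shape, y.shape)
```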
PST-Diff: Achieving High-consistency Stain Transfer by Diffusion Models with Pathological and Structural Constraints
Pub Date: 2024-07-18 | DOI: 10.1109/TMI.2024.3430825
Yufang He, Zeyu Liu, Mingxin Qi, Shengwei Ding, Peng Zhang, Fan Song, Chenbin Ma, Huijie Wu, Ruxin Cai, Youdan Feng, Haonan Zhang, Tianyi Zhang, Guanglei Zhang
Histopathological examinations heavily rely on hematoxylin and eosin (HE) and immunohistochemistry (IHC) staining. IHC staining offers more accurate diagnostic details, but it brings significant financial and time costs. Furthermore, either re-staining HE-stained slides or using adjacent slides for IHC may compromise the accuracy of pathological diagnosis due to information loss. To address these challenges, we develop PST-Diff, a diffusion-based method for generating virtual IHC images from HE images, which allows pathologists to simultaneously view multiple staining results from the same tissue slide. To maintain the pathological consistency of the stain transfer, we propose the asymmetric attention mechanism (AAM) and the latent transfer (LT) module in PST-Diff. Specifically, the AAM retains more local pathological information from the source-domain images through its asymmetric attention design, while preserving the model's flexibility in generating virtual stained images that conform closely to the target domain. The LT module then transfers implicit representations across domains, effectively alleviating the bias introduced by direct connection and further enhancing the pathological consistency of PST-Diff. Furthermore, to maintain the structural consistency of the stain transfer, we propose the conditional frequency guidance (CFG) module, which precisely controls image generation and preserves structural details according to the frequency recovery process. In conclusion, the pathological and structural consistency constraints give PST-Diff effectiveness and superior generalization in generating stable and pathologically faithful IHC images, achieving the best evaluation scores. Overall, PST-Diff offers prospective applications in clinical virtual staining and pathological image analysis.
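As a rough illustration of frequency-based structural guidance, the sketch below keeps the source HE image's low-frequency band, which carries tissue layout, and takes the high frequencies from the generated IHC image. The hard radial cutoff and post-hoc blending are assumptions for illustration; the paper's CFG module operates inside the diffusion process itself.

```python
# Minimal sketch of frequency-domain structural guidance (assumed hard low-pass blend).
import torch

def frequency_guidance(source, generated, cutoff=0.1):
    """Blend FFT low band of `source` with high band of `generated` (B,C,H,W tensors)."""
    B, C, H, W = source.shape
    fs = torch.fft.fftshift(torch.fft.fft2(source), dim=(-2, -1))
    fg = torch.fft.fftshift(torch.fft.fft2(generated), dim=(-2, -1))
    yy, xx = torch.meshgrid(
        torch.linspace(-0.5, 0.5, H), torch.linspace(-0.5, 0.5, W), indexing="ij"
    )
    low = ((yy ** 2 + xx ** 2).sqrt() <= cutoff).to(source.dtype)  # radial low-pass mask
    blended = fs * low + fg * (1.0 - low)  # structure from HE, stain detail from IHC
    return torch.fft.ifft2(torch.fft.ifftshift(blended, dim=(-2, -1))).real

he = torch.randn(1, 3, 128, 128)
ihc = torch.randn(1, 3, 128, 128)
print(frequency_guidance(he, ihc).shape)
```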
{"title":"PST-Diff: Achieving High-consistency Stain Transfer by Diffusion Models with Pathological and Structural Constraints.","authors":"Yufang He, Zeyu Liu, Mingxin Qi, Shengwei Ding, Peng Zhang, Fan Song, Chenbin Ma, Huijie Wu, Ruxin Cai, Youdan Feng, Haonan Zhang, Tianyi Zhang, Guanglei Zhang","doi":"10.1109/TMI.2024.3430825","DOIUrl":"https://doi.org/10.1109/TMI.2024.3430825","url":null,"abstract":"<p><p>Histopathological examinations heavily rely on hematoxylin and eosin (HE) and immunohistochemistry (IHC) staining. IHC staining can offer more accurate diagnostic details but it brings significant financial and time costs. Furthermore, either re-staining HE-stained slides or using adjacent slides for IHC may compromise the accuracy of pathological diagnosis due to information loss. To address these challenges, we develop PST-Diff, a method for generating virtual IHC images from HE images based on diffusion models, which allows pathologists to simultaneously view multiple staining results from the same tissue slide. To maintain the pathological consistency of the stain transfer, we propose the asymmetric attention mechanism (AAM) and latent transfer (LT) module in PST-Diff. Specifically, the AAM can retain more local pathological information of the source domain images through the design of asymmetric attention mechanisms, while ensuring the model's flexibility in generating virtual stained images that highly confirm to the target domain. Subsequently, the LT module transfers the implicit representations across different domains, effectively alleviating the bias introduced by direct connection and further enhancing the pathological consistency of PST-Diff. Furthermore, to maintain the structural consistency of the stain transfer, the conditional frequency guidance (CFG) module is proposed to precisely control image generation and preserve structural details according to the frequency recovery process. To conclude, the pathological and structural consistency constraints provide PST-Diff with effectiveness and superior generalization in generating stable and functionally pathological IHC images with the best evaluation score. In general, PST-Diff offers prospective application in clinical virtual staining and pathological image analysis.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141725396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Transferring Adult-like Phase Images for Robust Multi-view Isointense Infant Brain Segmentation
Pub Date: 2024-07-18 | DOI: 10.1109/TMI.2024.3430348
Huabing Liu, Jiawei Huang, Dengqiang Jia, Qian Wang, Jun Xu, Dinggang Shen
Accurate tissue segmentation of the infant brain in magnetic resonance (MR) images is crucial for charting early brain development and identifying biomarkers. Due to ongoing myelination and maturation, in the isointense phase (6-9 months of age) the gray and white matter of the infant brain exhibit similar intensity levels in MR images, posing significant challenges for tissue segmentation. In contrast, in the adult-like phase around 12 months of age, MR images show high tissue contrast and can be easily segmented. In this paper, we propose to effectively exploit adult-like phase images to achieve robust multi-view isointense infant brain segmentation. Specifically, in one direction we transfer adult-like phase images to the isointense view, where they have tissue contrast similar to isointense phase images, and use the transferred images to train an isointense-view segmentation network. In the other direction, we transfer isointense phase images to the adult-like view, where they have enhanced tissue contrast, to train a segmentation network in the adult-like view. The segmentation networks of the two views form a multi-path architecture that performs multi-view learning to further boost segmentation performance. Since anatomy-preserving style transfer is key to the downstream segmentation task, we develop a Disentangled Cycle-consistent Adversarial Network (DCAN) with strong regularization terms to accurately transfer realistic tissue contrast between isointense and adult-like phase images while maintaining their structural consistency. Experiments on both the NDAR and iSeg-2019 datasets demonstrate the significantly superior performance of our method over state-of-the-art methods.
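The cycle-consistency constraint at the heart of DCAN-style training can be written in a few lines. The 1x1-convolution "generators" below are placeholders for the actual disentangled generators; only the round-trip L1 loss is the point.

```python
# Minimal sketch of the cycle-consistency constraint between the two phase views.
import torch
import torch.nn as nn
import torch.nn.functional as F

g_iso2adult = nn.Conv2d(1, 1, 1)  # placeholder generator: isointense -> adult-like
g_adult2iso = nn.Conv2d(1, 1, 1)  # placeholder generator: adult-like -> isointense

def cycle_loss(iso, adult):
    """L1 reconstruction after a round trip through both generators."""
    iso_cycled = g_adult2iso(g_iso2adult(iso))      # iso -> adult-like -> iso
    adult_cycled = g_iso2adult(g_adult2iso(adult))  # adult-like -> iso -> adult-like
    return F.l1_loss(iso_cycled, iso) + F.l1_loss(adult_cycled, adult)

iso = torch.randn(2, 1, 64, 64)
adult = torch.randn(2, 1, 64, 64)
print(cycle_loss(iso, adult))
```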
{"title":"Transferring Adult-like Phase Images for Robust Multi-view Isointense Infant Brain Segmentation.","authors":"Huabing Liu, Jiawei Huang, Dengqiang Jia, Qian Wang, Jun Xu, Dinggang Shen","doi":"10.1109/TMI.2024.3430348","DOIUrl":"https://doi.org/10.1109/TMI.2024.3430348","url":null,"abstract":"<p><p>Accurate tissue segmentation of infant brain in magnetic resonance (MR) images is crucial for charting early brain development and identifying biomarkers. Due to ongoing myelination and maturation, in the isointense phase (6-9 months of age), the gray and white matters of infant brain exhibit similar intensity levels in MR images, posing significant challenges for tissue segmentation. Meanwhile, in the adult-like phase around 12 months of age, the MR images show high tissue contrast and can be easily segmented. In this paper, we propose to effectively exploit adult-like phase images to achieve robustmulti-view isointense infant brain segmentation. Specifically, in one way, we transfer adult-like phase images to the isointense view, which have similar tissue contrast as the isointense phase images, and use the transferred images to train an isointense-view segmentation network. On the other way, we transfer isointense phase images to the adult-like view, which have enhanced tissue contrast, for training a segmentation network in the adult-like view. The segmentation networks of different views form a multi-path architecture that performs multi-view learning to further boost the segmentation performance. Since anatomy-preserving style transfer is key to the downstream segmentation task, we develop a Disentangled Cycle-consistent Adversarial Network (DCAN) with strong regularization terms to accurately transfer realistic tissue contrast between isointense and adult-like phase images while still maintaining their structural consistency. Experiments on both NDAR and iSeg-2019 datasets demonstrate a significant superior performance of our method over the state-of-the-art methods.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141725397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-Label Generalized Zero Shot Chest Xray Classification By Combining Image-Text Information With Feature Disentanglement
Pub Date: 2024-07-17 | DOI: 10.1109/TMI.2024.3429471
Dwarikanath Mahapatra, Antonio Jimeno Yepes, Behzad Bozorgtabar, Sudipta Roy, Zongyuan Ge, Mauricio Reyes
In fully supervised learning-based medical image classification, the robustness of a trained model is influenced by its exposure to the range of candidate disease classes. Generalized Zero Shot Learning (GZSL) aims to correctly predict both seen and novel unseen classes. Current GZSL approaches have focused mostly on the single-label case. However, it is common for chest X-rays to be labelled with multiple disease classes. We propose a novel multi-modal multi-label GZSL approach that leverages feature disentanglement and multi-modal information to synthesize features of unseen classes. Disease labels are processed through a pre-trained BioBert model to obtain text embeddings, which are used to create a dictionary encoding similarity among different labels. We then use disentangled features and graph aggregation to learn a second dictionary of inter-label similarities. A subsequent clustering step helps to identify representative vectors for each class. The multi-modal multi-label dictionaries and the class representative vectors guide the feature synthesis step, the most important component of our pipeline, to generate realistic multi-label disease samples of seen and unseen classes. Our method is benchmarked against multiple competing methods, and it outperforms all of them in experiments conducted on the publicly available NIH and CheXpert chest X-ray datasets.
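The text-side dictionary can be sketched directly: embed each disease label with a pre-trained BioBERT and build a label-to-label cosine-similarity matrix. The checkpoint name (dmis-lab/biobert-v1.1) and mean pooling over tokens are assumptions about details the abstract leaves open.

```python
# Minimal sketch of the BioBERT label-similarity dictionary (assumed checkpoint/pooling).
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

labels = ["atelectasis", "cardiomegaly", "edema", "pneumonia"]  # illustrative label set
tok = AutoTokenizer.from_pretrained("dmis-lab/biobert-v1.1")
bert = AutoModel.from_pretrained("dmis-lab/biobert-v1.1").eval()

with torch.no_grad():
    batch = tok(labels, padding=True, return_tensors="pt")
    hidden = bert(**batch).last_hidden_state            # (num_labels, tokens, dim)
    mask = batch["attention_mask"].unsqueeze(-1)        # ignore padding tokens
    emb = (hidden * mask).sum(1) / mask.sum(1)          # mean-pooled label embeddings

# (num_labels, num_labels) dictionary of inter-label similarities
sim = F.cosine_similarity(emb.unsqueeze(1), emb.unsqueeze(0), dim=-1)
print(sim)
```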
Concept-based Lesion Aware Transformer for Interpretable Retinal Disease Diagnosis
Pub Date: 2024-07-16 | DOI: 10.1109/TMI.2024.3429148
Chi Wen, Mang Ye, He Li, Ting Chen, Xuan Xiao
Existing deep learning methods have achieved remarkable results in diagnosing retinal diseases, showcasing the potential of advanced AI in ophthalmology. However, the black-box nature of these methods obscures the decision-making process, compromising their trustworthiness and acceptability. Inspired by concept-based approaches and recognizing the intrinsic correlation between retinal lesions and diseases, we regard retinal lesions as concepts and propose an inherently interpretable framework designed to enhance both the performance and explainability of diagnostic models. Leveraging the transformer architecture, known for its proficiency in capturing long-range dependencies, our model can effectively identify lesion features. By integrating image-level annotations, it aligns lesion concepts with human cognition under the guidance of a retinal foundation model. Furthermore, to attain interpretability without losing lesion-specific information, our method employs a classifier built on a cross-attention mechanism for disease diagnosis and explanation, where explanations are grounded in the contributions of human-understandable lesion concepts and their visual localization. Notably, owing to the structure and inherent interpretability of our model, clinicians can implement concept-level interventions to correct diagnostic errors by simply adjusting erroneous lesion predictions. Experiments conducted on four fundus image datasets demonstrate that our method achieves favorable performance compared with state-of-the-art methods while providing faithful explanations and enabling concept-level interventions. Our code is publicly available at https://github.com/Sorades/CLAT.
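A minimal sketch of a concept-based cross-attention classifier is given below: learned lesion-concept queries attend over image patch features, the diagnosis is a linear function of per-concept scores, and a clinician-style intervention simply overwrites a concept score. The dimensions and the 8-concept/3-disease setup are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of a concept-based cross-attention classifier with interventions.
import torch
import torch.nn as nn

class ConceptClassifier(nn.Module):
    def __init__(self, dim=64, n_concepts=8, n_diseases=3):
        super().__init__()
        self.concepts = nn.Parameter(torch.randn(n_concepts, dim))  # lesion-concept queries
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.score = nn.Linear(dim, 1)                 # per-concept presence score
        self.head = nn.Linear(n_concepts, n_diseases)  # diseases from concept scores only

    def forward(self, patches, concept_override=None):
        q = self.concepts.expand(patches.size(0), -1, -1)
        evidence, attn_maps = self.attn(q, patches, patches)  # attn_maps localize concepts
        scores = self.score(evidence).squeeze(-1)             # (batch, n_concepts)
        if concept_override is not None:                      # concept-level intervention:
            scores = torch.where(concept_override.isnan(), scores, concept_override)
        return self.head(scores), scores, attn_maps

model = ConceptClassifier()
patches = torch.randn(2, 196, 64)            # e.g. 14x14 patch tokens from an image encoder
logits, scores, maps = model(patches)
fix = torch.full((2, 8), float("nan"))       # NaN = keep the model's own score
fix[:, 0] = 0.0                              # clinician zeroes out erroneous concept 0
logits_fixed, _, _ = model(patches, concept_override=fix)
print(logits.shape, maps.shape)
```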
{"title":"Concept-based Lesion Aware Transformer for Interpretable Retinal Disease Diagnosis.","authors":"Chi Wen, Mang Ye, He Li, Ting Chen, Xuan Xiao","doi":"10.1109/TMI.2024.3429148","DOIUrl":"https://doi.org/10.1109/TMI.2024.3429148","url":null,"abstract":"<p><p>Existing deep learning methods have achieved remarkable results in diagnosing retinal diseases, showcasing the potential of advanced AI in ophthalmology. However, the black-box nature of these methods obscures the decision-making process, compromising their trustworthiness and acceptability. Inspired by the concept-based approaches and recognizing the intrinsic correlation between retinal lesions and diseases, we regard retinal lesions as concepts and propose an inherently interpretable framework designed to enhance both the performance and explainability of diagnostic models. Leveraging the transformer architecture, known for its proficiency in capturing long-range dependencies, our model can effectively identify lesion features. By integrating with image-level annotations, it achieves the alignment of lesion concepts with human cognition under the guidance of a retinal foundation model. Furthermore, to attain interpretability without losing lesion-specific information, our method employs a classifier built on a cross-attention mechanism for disease diagnosis and explanation, where explanations are grounded in the contributions of human-understandable lesion concepts and their visual localization. Notably, due to the structure and inherent interpretability of our model, clinicians can implement concept-level interventions to correct the diagnostic errors by simply adjusting erroneous lesion predictions. Experiments conducted on four fundus image datasets demonstrate that our method achieves favorable performance against state-of-the-art methods while providing faithful explanations and enabling conceptlevel interventions. Our code is publicly available at https://github.com/Sorades/CLAT.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141629617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ultrasound Report Generation with Cross-Modality Feature Alignment via Unsupervised Guidance
Pub Date: 2024-07-16 | DOI: 10.1109/TMI.2024.3424978
Jun Li, Tongkun Su, Baoliang Zhao, Faqin Lv, Qiong Wang, Nassir Navab, Ying Hu, Zhongliang Jiang
Automatic report generation has arisen as a significant research area in computer-aided diagnosis, aiming to alleviate the burden on clinicians by generating reports automatically from medical images. In this work, we propose a novel framework for automatic ultrasound report generation that leverages a combination of unsupervised and supervised learning methods to aid the report generation process. Our framework uses unsupervised learning to extract potential knowledge from ultrasound text reports, which serves as prior information to guide the model in aligning visual and textual features, thereby addressing the challenge of feature discrepancy. Additionally, we design a global semantic comparison mechanism to enhance the generation of more comprehensive and accurate medical reports. To enable the implementation of ultrasound report generation, we constructed three large-scale ultrasound image-text datasets from different organs for training and validation. Extensive evaluations against other state-of-the-art approaches demonstrate superior performance across all three datasets. Code and dataset are available at this link.
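One plausible form of a global semantic comparison is a symmetric CLIP-style contrastive loss over paired image and report embeddings, sketched below; this form is an assumption, and the paper's exact mechanism may differ.

```python
# Minimal sketch of a global image-report alignment loss (assumed CLIP-style InfoNCE).
import torch
import torch.nn.functional as F

def global_semantic_loss(img_emb, txt_emb, temp=0.07):
    """Symmetric contrastive loss over L2-normalized global embeddings (B, D)."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temp         # (B, B) similarity matrix
    targets = torch.arange(img.size(0))   # matched image-report pairs sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

print(global_semantic_loss(torch.randn(8, 256), torch.randn(8, 256)))
```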
{"title":"Ultrasound Report Generation with Cross-Modality Feature Alignment via Unsupervised Guidance.","authors":"Jun Li, Tongkun Su, Baoliang Zhao, Faqin Lv, Qiong Wang, Nassir Navab, Ying Hu, Zhongliang Jiang","doi":"10.1109/TMI.2024.3424978","DOIUrl":"https://doi.org/10.1109/TMI.2024.3424978","url":null,"abstract":"<p><p>Automatic report generation has arisen as a significant research area in computer-aided diagnosis, aiming to alleviate the burden on clinicians by generating reports automatically based on medical images. In this work, we propose a novel framework for automatic ultrasound report generation, leveraging a combination of unsupervised and supervised learning methods to aid the report generation process. Our framework incorporates unsupervised learning methods to extract potential knowledge from ultrasound text reports, serving as the prior information to guide the model in aligning visual and textual features, thereby addressing the challenge of feature discrepancy. Additionally, we design a global semantic comparison mechanism to enhance the performance of generating more comprehensive and accurate medical reports. To enable the implementation of ultrasound report generation, we constructed three large-scale ultrasound image-text datasets from different organs for training and validation purposes. Extensive evaluations with other state-of-the-art approaches exhibit its superior performance across all three datasets. Code and dataset are valuable at this link.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141629619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An end-to-end geometry-based pipeline for automatic preoperative surgical planning of pelvic fracture reduction and fixation
Pub Date: 2024-07-16 | DOI: 10.1109/TMI.2024.3429403
Jiaxuan Liu, Haitao Li, Bolun Zeng, Huixiang Wang, Ron Kikinis, Leo Joskowicz, Xiaojun Chen
Computer-assisted preoperative planning of pelvic fracture reduction surgery has the potential to increase the accuracy of the surgery and to reduce complications. However, the diversity of pelvic fractures and the disturbance caused by small fracture fragments make reliable automatic preoperative planning a great challenge. In this paper, we present a comprehensive and automatic preoperative planning pipeline for pelvic fracture surgery, comprising pelvic fracture labeling, reduction planning, and customized screw implantation. First, automatic bone fracture labeling is performed based on the separation of the fracture sections. Then, fracture reduction planning is performed based on automatic extraction and pairing of the fracture surfaces. Finally, screw implantation is planned using the adjoint fracture surfaces. The proposed pipeline was tested on different types of pelvic fracture in 14 clinical cases. Our method achieved translational and rotational accuracies of 2.56 mm and 3.31° in reduction planning, and a clinical acceptance rate of 86.7% in fixation planning. The results demonstrate the feasibility of clinical application of our method. Our method has shown accuracy and reliability for complex multi-body bone fractures, and it may provide effective clinical preoperative guidance and improve the accuracy of pelvic fracture reduction surgery.
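One geometric core of reduction planning, recovering the rigid transform that maps a displaced fragment's fracture surface onto its paired surface, can be sketched with the standard Kabsch/SVD solution. Surface extraction and point pairing, which the pipeline automates, are assumed already done here.

```python
# Minimal sketch of rigid alignment for reduction planning (standard Kabsch/SVD).
import numpy as np

def kabsch(src, dst):
    """Rigid R, t minimizing ||R @ src_i + t - dst_i|| over (N, 3) matched points."""
    c_src, c_dst = src.mean(0), dst.mean(0)
    H = (src - c_src).T @ (dst - c_dst)          # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, c_dst - R @ c_src

rng = np.random.default_rng(0)
surface = rng.normal(size=(200, 3))              # points on the target fracture surface
theta = np.deg2rad(10.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta), np.cos(theta), 0],
                   [0, 0, 1]])
displaced = surface @ R_true.T + np.array([5.0, -2.0, 1.0])  # displaced fragment surface
R, t = kabsch(displaced, surface)                # transform that reduces the fragment
print(np.allclose(displaced @ R.T + t, surface, atol=1e-6))
```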
{"title":"An end-to-end geometry-based pipeline for automatic preoperative surgical planning of pelvic fracture reduction and fixation.","authors":"Jiaxuan Liu, Haitao Li, Bolun Zeng, Huixiang Wang, Ron Kikinis, Leo Joskowicz, Xiaojun Chen","doi":"10.1109/TMI.2024.3429403","DOIUrl":"https://doi.org/10.1109/TMI.2024.3429403","url":null,"abstract":"<p><p>Computer-assisted preoperative planning of pelvic fracture reduction surgery has the potential to increase the accuracy of the surgery and to reduce complications. However, the diversity of the pelvic fractures and the disturbance of small fracture fragments present a great challenge to perform reliable automatic preoperative planning. In this paper, we present a comprehensive and automatic preoperative planning pipeline for pelvic fracture surgery. It includes pelvic fracture labeling, reduction planning of the fracture, and customized screw implantation. First, automatic bone fracture labeling is performed based on the separation of the fracture sections. Then, fracture reduction planning is performed based on automatic extraction and pairing of the fracture surfaces. Finally, screw implantation is planned using the adjoint fracture surfaces. The proposed pipeline was tested on different types of pelvic fracture in 14 clinical cases. Our method achieved a translational and rotational accuracy of 2.56 mm and 3.31° in reduction planning. For fixation planning, a clinical acceptance rate of 86.7% was achieved. The results demonstrate the feasibility of the clinical application of our method. Our method has shown accuracy and reliability for complex multi-body bone fractures, which may provide effective clinical preoperative guidance and may improve the accuracy of pelvic fracture reduction surgery.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141629616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
COSTA: A Multi-center TOF-MRA Dataset and A Style Self-Consistency Network for Cerebrovascular Segmentation
Pub Date: 2024-07-16 | DOI: 10.1109/TMI.2024.3424976
Lei Mou, Qifeng Yan, Jinghui Lin, Yifan Zhao, Yonghuai Liu, Shaodong Ma, Jiong Zhang, Wenhao Lv, Tao Zhou, Alejandro F Frangi, Yitian Zhao
Time-of-flight magnetic resonance angiography (TOF-MRA) is the least invasive and ionizing-radiation-free approach for cerebrovascular imaging, but variations in imaging artifacts across clinical centers and imaging vendors result in inter-site and inter-vendor heterogeneity, making accurate and robust cerebrovascular segmentation challenging. Moreover, the limited availability and quality of annotated data make it difficult for segmentation methods to generalize well to unseen datasets. In this paper, we construct the largest and most diverse TOF-MRA dataset (COSTA) from 8 individual imaging centers, with all volumes manually annotated. We then propose a novel network for cerebrovascular segmentation, namely CESAR, which tackles feature-granularity and image-style heterogeneity issues. Specifically, a coarse-to-fine architecture refines cerebrovascular segmentation in an iterative manner. An automatic feature selection module selectively fuses global long-range dependencies and local contextual information of cerebrovascular structures. A style self-consistency loss is then introduced to explicitly align diverse styles of TOF-MRA images to a standardized one. Extensive experimental results on the COSTA dataset demonstrate the effectiveness of our CESAR network against state-of-the-art methods. We have made 6 subsets of COSTA, together with the source code, available online to promote relevant research in the community.
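A style self-consistency constraint can be sketched as follows: re-normalize a patch's per-channel statistics toward a reference "standard" style (AdaIN-like) and penalize disagreement between the network's predictions on the original and standardized inputs. This instance-statistics formulation is an assumption about the loss's form, not CESAR's exact definition.

```python
# Minimal sketch of a style self-consistency loss (assumed AdaIN-style standardization).
import torch
import torch.nn as nn
import torch.nn.functional as F

def standardize_style(x, ref, eps=1e-5):
    """Shift x's per-channel mean/std (over spatial dims) to match the reference style."""
    dims = tuple(range(2, x.dim()))
    mu_x, sd_x = x.mean(dims, keepdim=True), x.std(dims, keepdim=True) + eps
    mu_r, sd_r = ref.mean(dims, keepdim=True), ref.std(dims, keepdim=True) + eps
    return (x - mu_x) / sd_x * sd_r + mu_r

seg_net = nn.Conv3d(1, 2, 3, padding=1)  # placeholder segmentation network

def style_self_consistency_loss(x, ref):
    p_orig = seg_net(x).softmax(1)
    p_std = seg_net(standardize_style(x, ref)).softmax(1)
    return F.mse_loss(p_orig, p_std)  # same anatomy -> same segmentation across styles

x = torch.randn(1, 1, 16, 32, 32)     # a TOF-MRA patch in some site's style
ref = torch.randn(1, 1, 16, 32, 32)   # a patch in the standardized style
print(style_self_consistency_loss(x, ref))
```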