Pub Date: 2024-07-30, DOI: 10.1109/TMI.2024.3435855
Ming Chen, Yijun Bian, Nanguang Chen, Anqi Qiu
The linear mixed-effects model is commonly utilized to interpret longitudinal data, characterizing both the global longitudinal trajectory across all observations and longitudinal trajectories within individuals. However, characterizing these trajectories in high-dimensional longitudinal data presents a challenge. To address this, our study proposes a novel approach, Unsupervised Orthogonal Mixed-Effects Trajectory Modeling (UOMETM), that leverages unsupervised learning to generate latent representations of both global and individual trajectories. We design an autoencoder with a latent space where an orthogonal constraint is imposed to separate the space of global trajectories from individual trajectories. We also devise a cross-reconstruction loss to ensure consistency of global trajectories and enhance the orthogonality between representation spaces. To evaluate UOMETM, we conducted simulation experiments on images to verify that every component functions as intended. Furthermore, we evaluated its performance and robustness using longitudinal brain cortical thickness from two Alzheimer's disease (AD) datasets. Comparative analyses with state-of-the-art methods revealed UOMETM's superiority in identifying global and individual longitudinal patterns, achieving a lower reconstruction error, superior orthogonality, and higher accuracy in AD classification and conversion forecasting. Remarkably, we found that the space of global trajectories did not significantly contribute to AD classification compared to the space of individual trajectories, emphasizing their clear separation. Moreover, our model exhibited satisfactory generalization and robustness across different datasets. The study shows the outstanding performance and potential clinical use of UOMETM in the context of longitudinal data analysis.
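As an illustration of the orthogonality constraint described above, the sketch below (a hypothetical simplification, not the authors' implementation) penalizes the cross-correlation between global-trajectory and individual-trajectory latent codes; driving this penalty to zero separates the two representation spaces.

```python
import numpy as np

def orthogonality_loss(z_global, z_individual):
    """Penalize overlap between global- and individual-trajectory
    representations: squared Frobenius norm of their cross-correlation."""
    gram = z_global.T @ z_individual        # (d_g, d_i) cross-correlation
    return float(np.sum(gram ** 2))

# Two batches of latent codes whose dimensions are decorrelated across samples.
z_g = np.array([[1.0, 0.0], [2.0, 0.0]])    # global codes
z_i = np.array([[0.0, 3.0], [0.0, -1.5]])   # individual codes
print(orthogonality_loss(z_g, z_i))         # → 0.0 for orthogonal codes
```

A real training loop would add this term, suitably weighted, to the reconstruction losses of the autoencoder.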
Title: Orthogonal Mixed-Effects Modeling for High-Dimensional Longitudinal Data: An Unsupervised Learning Approach.
Journal: IEEE Transactions on Medical Imaging.
Segmentation of the coronary artery is an important task in the quantitative analysis of coronary computed tomography angiography (CCTA) images, and progress in this task has been driven by deep learning. However, the complex structure of the coronary artery, with its tiny and narrow branches, poses a great challenge. Combined with the low resolution and poor contrast typical of medical images, fragmentation of segmented vessels frequently occurs in predictions. Therefore, a geometry-based cascaded segmentation method is proposed for the coronary artery, with the following innovations: 1) Integrating geometric deformation networks, we design a cascaded network that segments the coronary artery and vectorizes the results. The generated coronary artery meshes are continuous and accurate even for twisted and sophisticated structures, without fragmentation. 2) Unlike mesh annotations generated from voxel-based labels by the traditional marching cubes method, a finer vectorized mesh of the coronary artery is reconstructed with regularized morphology. This novel mesh annotation benefits the geometry-based segmentation network, avoiding bifurcation adhesion and point-cloud dispersion in intricate branches. 3) A dataset named CCA-200 is collected, consisting of 200 CCTA images with coronary artery disease. The ground truths of the 200 cases are coronary internal-diameter annotations by professional radiologists. Extensive experiments verify our method on the collected CCA-200 dataset and the public ASOCA dataset, achieving a Dice of 0.778 on CCA-200 and 0.895 on ASOCA, showing superior results. In particular, our geometry-based model generates an accurate, intact and smooth coronary artery, devoid of any fragmentation of segmented vessels.
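The reported Dice scores can be reproduced for binary masks with a generic implementation such as the following (a standard metric sketch, not the authors' evaluation code):

```python
import numpy as np

def dice(pred, gt, eps=1e-8):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return float(2.0 * inter / (pred.sum() + gt.sum() + eps))

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
print(round(dice(pred, gt), 3))  # 2*2 / (3+3) → 0.667
```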
Title: Segmentation and Vascular Vectorization for Coronary Artery by Geometry-based Cascaded Neural Network.
Authors: Xiaoyu Yang, Lijian Xu, Simon Yu, Qing Xia, Hongsheng Li, Shaoting Zhang
Journal: IEEE Transactions on Medical Imaging. Pub Date: 2024-07-30, DOI: 10.1109/TMI.2024.3435714
Pub Date: 2024-07-29, DOI: 10.1109/TMI.2024.3435015
Philip Muller, Felix Meissen, Georgios Kaissis, Daniel Rueckert
Weakly supervised object detection (WSup-OD) increases the usefulness and interpretability of image classification algorithms without requiring additional supervision. The successes of multiple instance learning in this task for natural images, however, do not translate well to medical images due to the very different characteristics of their objects (i.e. pathologies). In this work, we propose Weakly Supervised ROI Proposal Networks (WSRPN), a new method for generating bounding box proposals on the fly using a specialized region of interest-attention (ROI-attention) module. WSRPN integrates well with classic backbone-head classification algorithms and is end-to-end trainable with only image-label supervision. We experimentally demonstrate that our new method outperforms existing methods in the challenging task of disease localization in chest X-ray images. Code: https://anonymous.4open.science/r/WSRPN-DCA1.
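The idea of soft ROI pooling can be sketched as attention-weighted feature pooling: per-pixel attention logits are softmaxed over all spatial positions and used to aggregate a box-free ROI descriptor. The function below is a minimal hypothetical illustration, not the WSRPN module itself:

```python
import numpy as np

def soft_roi_pool(features, scores):
    """Soft ROI pooling sketch: softmax the per-pixel attention logits,
    then take a weighted average of the feature map (no hard box crop)."""
    w = np.exp(scores - scores.max())
    w /= w.sum()                                        # softmax over H*W
    return (features * w[..., None]).sum(axis=(0, 1))   # (C,) descriptor

H, W, C = 4, 4, 8
rng = np.random.default_rng(0)
feats = rng.normal(size=(H, W, C))      # backbone feature map (hypothetical)
scores = rng.normal(size=(H, W))        # ROI-attention logits (hypothetical)
roi_vec = soft_roi_pool(feats, scores)
print(roi_vec.shape)                    # → (8,)
```

Because the pooling is a differentiable weighted average, gradients flow back to the attention logits, which is what makes training with only image-level labels possible.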
Title: Weakly Supervised Object Detection in Chest X-Rays with Differentiable ROI Proposal Networks and Soft ROI Pooling.
Journal: IEEE Transactions on Medical Imaging.
Automated breast tumor segmentation based on dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) has shown great promise in clinical practice, particularly for identifying the presence of breast disease. However, accurate segmentation of breast tumors is a challenging task, often necessitating the development of complex networks. To strike an optimal tradeoff between computational cost and segmentation performance, we propose a hybrid network that combines convolutional neural network (CNN) and transformer layers. Specifically, the hybrid network consists of an encoder-decoder architecture built by stacking convolution and deconvolution layers. Effective 3D transformer layers are then applied after the encoder sub-networks to capture global dependencies between the bottleneck features. To improve the efficiency of the hybrid network, two parallel encoder sub-networks are designed for the decoder and the transformer layers, respectively. To further enhance the discriminative capability of the hybrid network, a prototype learning guided prediction module is proposed, in which category-specific prototypical features are calculated through online clustering. All learned prototypical features are finally combined with the features from the decoder for tumor mask prediction. Experimental results on private and public DCE-MRI datasets demonstrate that the proposed hybrid network outperforms state-of-the-art (SOTA) methods while maintaining a balance between segmentation accuracy and computational cost. Moreover, we demonstrate that automatically generated tumor masks can be effectively applied to distinguish the HER2-positive subtype from the HER2-negative subtype, with accuracy similar to that of analysis based on manual tumor segmentation. The source code is available at https://github.com/ZhouL-lab/PLHN.
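The prototype learning idea can be sketched as follows; for simplicity, this hypothetical illustration computes prototypes as per-class feature means and classifies by nearest prototype, standing in for the online clustering used in the paper:

```python
import numpy as np

def compute_prototypes(features, labels, num_classes):
    """Per-class prototype = mean feature vector of that class (a simplified
    stand-in for online clustering of prototypical features)."""
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def nearest_prototype(features, prototypes):
    """Assign each feature vector to the class of its closest prototype."""
    d = np.linalg.norm(features[:, None, :] - prototypes[None], axis=-1)
    return d.argmin(axis=1)

feats = np.array([[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.1]])
labels = np.array([0, 0, 1, 1])
protos = compute_prototypes(feats, labels, 2)
print(nearest_prototype(feats, protos))   # → [0 0 1 1]
```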
Title: Prototype Learning Guided Hybrid Network for Breast Tumor Segmentation in DCE-MRI.
Authors: Lei Zhou, Yuzhong Zhang, Jiadong Zhang, Xuejun Qian, Chen Gong, Kun Sun, Zhongxiang Ding, Xing Wang, Zhenhui Li, Zaiyi Liu, Dinggang Shen
Journal: IEEE Transactions on Medical Imaging. Pub Date: 2024-07-29, DOI: 10.1109/TMI.2024.3435450
Pub Date: 2024-07-23, DOI: 10.1109/TMI.2024.3432531
Houliang Zhou, Lifang He, Brian Y Chen, Li Shen, Yu Zhang
The interconnection between brain regions in neurological disease encodes vital information for the advancement of biomarkers and diagnostics. Although graph convolutional networks are widely applied for discovering brain connection patterns that point to disease conditions, the potential of connection patterns that arise from multiple imaging modalities has yet to be fully realized. In this paper, we propose a multi-modal sparse interpretable GCN framework (SGCN) for the detection of Alzheimer's disease (AD) and its prodromal stage, known as mild cognitive impairment (MCI). In our experimentation, SGCN learned the sparse regional importance probability to find signature regions of interest (ROIs), and the connective importance probability to reveal disease-specific brain network connections. We evaluated SGCN on the Alzheimer's Disease Neuroimaging Initiative database with multi-modal brain images and demonstrated that the ROI features learned by SGCN were effective for enhancing AD status identification. The identified abnormalities were significantly correlated with AD-related clinical symptoms. We further interpreted the identified brain dysfunctions at the level of large-scale neural systems and sex-related connectivity abnormalities in AD/MCI. The salient ROIs and the prominent brain connectivity abnormalities interpreted by SGCN are considerably important for developing novel biomarkers. These findings contribute to a better understanding of the network-based disorder via multi-modal diagnosis and offer the potential for precision diagnostics. The source code is available at https://github.com/Houliang-Zhou/SGCN.
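A single graph-convolution step gated by a per-node (ROI) importance mask can be sketched as below; this is a minimal hypothetical illustration of how sparse regional importance could modulate message passing, not the SGCN implementation:

```python
import numpy as np

def gcn_layer(A, X, W, node_importance):
    """One graph-convolution step with a regional-importance mask:
    node features are scaled by importance before propagation."""
    A_hat = A + np.eye(A.shape[0])                  # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt        # symmetric normalization
    X_masked = X * node_importance[:, None]         # suppress unimportant ROIs
    return np.maximum(A_norm @ X_masked @ W, 0.0)   # ReLU activation

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)  # toy brain graph
X = np.ones((3, 2))                                     # node (ROI) features
W = np.ones((2, 2))                                     # layer weights
imp = np.array([1.0, 1.0, 0.0])                         # third ROI masked out
H = gcn_layer(A, X, W, imp)
print(H.shape)                                          # → (3, 2)
```

In SGCN the importance values are learned probabilities with a sparsity constraint; here they are fixed to keep the sketch self-contained.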
Title: Multi-Modal Diagnosis of Alzheimer's Disease using Interpretable Graph Convolutional Networks.
Journal: IEEE Transactions on Medical Imaging.
Pub Date: 2024-07-22, DOI: 10.1109/TMI.2024.3428836
Haomin Chen, David Dreizin, Catalina Gomez, Anna Zapaishchykova, Mathias Unberath
Pelvic ring disruptions result from blunt injury mechanisms and are potentially lethal, mainly due to associated injuries and massive pelvic hemorrhage. The severity of pelvic fractures in trauma victims is frequently assessed by grading the fracture according to the Tile AO/OTA classification in whole-body Computed Tomography (CT) scans. Given the high volume of whole-body CT scans generated in trauma centers, the large information content of a single whole-body CT scan, and the low speed of manual CT reading, an automatic approach to Tile classification would provide substantial value, e.g., to prioritize the reading sequence of the trauma radiologists or enable them to focus on other major injuries in multi-trauma patients. In such a high-stakes scenario, an automated method for Tile grading should ideally be transparent, such that the symbolic information provided by the method follows the same logic a radiologist or orthopedic surgeon would use to determine the fracture grade. This paper introduces an automated yet interpretable pelvic trauma decision support system to assist radiologists in fracture detection and Tile grading. To achieve interpretability despite processing high-dimensional whole-body CT images, we design a neurosymbolic algorithm that operates similarly to human interpretation of CT scans. The algorithm first detects relevant pelvic fractures on CTs with high specificity using Faster-RCNN. To generate robust fracture detections and associated detection (un)certainties, we perform test-time augmentation of the CT scans to apply fracture detection several times in a self-ensembling approach. The fracture detections are interpreted using a structural causal model based on clinical best practices to infer an initial Tile grade. We apply a Bayesian causal model to recover likely co-occurring fractures that may have been rejected initially due to the highly specific operating point of the detector, resulting in an updated list of detected fractures and a corresponding final Tile grade. Our method is transparent in that it provides fracture location and types, as well as information on important counterfactuals that would invalidate the system's recommendation. Our approach achieves an AUC of 0.89/0.74 for translational and rotational instability, which is comparable to radiologist performance. Despite being designed for human-machine teaming, our approach does not compromise on performance compared to previous black-box methods.
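The self-ensembling step can be sketched as follows: a detector is applied to several augmented copies of the image, and the mean and spread of the confidences give an ensemble prediction with a simple (un)certainty estimate. The augmentations and the toy detector here are hypothetical placeholders, not the paper's pipeline:

```python
import numpy as np

def tta_confidence(detector, image, n_aug=4, rng=None):
    """Self-ensembling via test-time augmentation: run the detector on
    several augmented copies; the mean is the ensemble confidence and
    the standard deviation a crude (un)certainty estimate."""
    rng = rng or np.random.default_rng(0)
    confs = []
    for _ in range(n_aug):
        aug = image + rng.normal(scale=0.01, size=image.shape)  # noise aug
        if rng.random() < 0.5:
            aug = aug[:, ::-1]                                  # horizontal flip
        confs.append(detector(aug))
    confs = np.array(confs)
    return confs.mean(), confs.std()

toy_detector = lambda img: float(img.mean() > 0.5)  # stand-in detector
image = np.full((8, 8), 0.6)
mean_conf, uncertainty = tta_confidence(toy_detector, image)
print(mean_conf, uncertainty)
```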
Title: Interpretable Severity Scoring of Pelvic Trauma Through Automated Fracture Detection and Bayesian Inference.
Journal: IEEE Transactions on Medical Imaging.
Pub Date: 2024-07-22, DOI: 10.1109/TMI.2024.3431916
Yiwen Ye, Jianpeng Zhang, Ziyang Chen, Yong Xia
Self-supervised learning (SSL) has long had great success in advancing the field of annotation-efficient learning. However, when applied to CT volume segmentation, most SSL methods suffer from two limitations: they rarely use the information acquired by different imaging modalities, and they provide supervision only to the bottleneck encoder layer. To address both limitations, we design a pretext task that aligns the information in each 3D CT volume with the corresponding generated 2D X-ray image, and we extend self-distillation to deep self-distillation. Thus, we propose a self-supervised learner based on Cross-modal Alignment and Deep Self-distillation (CADS) to improve the encoder's ability to characterize CT volumes. The cross-modal alignment is a more challenging pretext task that forces the encoder to learn better image representations. Deep self-distillation provides supervision not only to the bottleneck layer but also to shallow layers, thus boosting the abilities of both. Comparative experiments show that, during pre-training, our CADS has lower computational complexity and GPU memory cost than competing SSL methods. Based on the pre-trained encoder, we construct PVT-UNet for 3D CT volume segmentation. Our results on seven downstream tasks indicate that PVT-UNet outperforms state-of-the-art SSL methods such as MOCOv3 and DiRA, as well as prevalent medical image segmentation methods such as nnUNet and CoTr. Code and pre-trained weights will be available at https://github.com/yeerwen/CADS.
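Deep self-distillation can be sketched as a distillation loss applied at several student depths rather than only at the bottleneck. The cosine-based objective below is a common choice assumed here for illustration; it is not necessarily the paper's exact loss:

```python
import numpy as np

def cosine_loss(a, b, eps=1e-8):
    """1 - cosine similarity, a typical self-distillation objective."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def deep_self_distillation_loss(student_feats, teacher_target):
    """Deep self-distillation sketch: the teacher target supervises every
    student layer (shallow and bottleneck), not just the deepest one."""
    return sum(cosine_loss(f, teacher_target)
               for f in student_feats) / len(student_feats)

teacher = np.array([1.0, 0.0, 0.0])            # teacher projection (toy)
layers = [np.array([0.9, 0.1, 0.0]),           # shallow-layer projection
          np.array([1.0, 0.0, 0.0])]           # bottleneck projection
loss = deep_self_distillation_loss(layers, teacher)
print(round(loss, 4))
```

In practice each layer's features would pass through its own projection head before the loss is computed; that detail is omitted here.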
Title: CADS: A Self-supervised Learner via Cross-modal Alignment and Deep Self-distillation for CT Volume Segmentation.
Journal: IEEE Transactions on Medical Imaging.
Pub Date: 2024-07-22, DOI: 10.1109/TMI.2024.3432388
Ruoyou Wu, Cheng Li, Juan Zou, Xinfeng Liu, Hairong Zheng, Shanshan Wang
Heterogeneous data captured by different scanning devices and imaging protocols can affect the generalization performance of deep learning magnetic resonance (MR) reconstruction models. While centralized training is effective in mitigating this problem, it raises concerns about privacy protection. Federated learning is a distributed training paradigm that can utilize multi-institutional data for collaborative training without sharing data. However, existing federated learning MR image reconstruction methods rely on models designed manually by experts; these models are complex and computationally expensive, and they suffer performance degradation when facing heterogeneous data distributions. In addition, these methods give inadequate consideration to fairness, namely ensuring that the model's training does not introduce bias towards any specific dataset's distribution. To this end, this paper proposes a generalizable federated neural architecture search framework for accelerating MR imaging (GAutoMRI). Specifically, automatic neural architecture search is investigated for effective and efficient neural network representation learning of MR images from different centers. Furthermore, we design a fairness adjustment approach that enables the model to learn features fairly from the inconsistent distributions of different devices and centers, and thus helps the model generalize well to unseen centers. Extensive experiments show that our proposed GAutoMRI achieves better performance and generalization ability than seven state-of-the-art federated learning methods. Moreover, the GAutoMRI model is significantly more lightweight, making it an efficient choice for MR image reconstruction tasks. The code will be made available at https://github.com/ternencewu123/GAutoMRI.
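One simple way to realize a fairness adjustment in federated aggregation, loosely inspired by loss-reweighted schemes such as q-fair federated learning and not the paper's exact method, is to give under-served clients larger aggregation weights:

```python
import numpy as np

def fair_aggregate(client_weights, client_losses, q=1.0):
    """Fairness-aware aggregation sketch: clients with higher local loss
    receive larger aggregation coefficients (proportional to loss**q),
    pushing the shared model toward uniform performance across centers."""
    losses = np.asarray(client_losses, float)
    coeffs = losses ** q
    coeffs /= coeffs.sum()
    return sum(c * w for c, w in zip(coeffs, client_weights))

# Three clients' (flattened, toy) model parameters and their local losses.
params = [np.array([1.0, 1.0]), np.array([2.0, 2.0]), np.array([4.0, 4.0])]
losses = [0.1, 0.1, 0.8]                 # the third center is under-served
global_params = fair_aggregate(params, losses)
print(global_params)                     # → [3.5 3.5]
```

With q = 0 this reduces to a plain average; larger q tilts the update further toward the worst-performing centers.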
Generalizable Reconstruction for Accelerating MR Imaging via Federated Learning with Neural Architecture Search.
Electron microscopy (EM) image denoising is critical for visualization and subsequent analysis. Despite the remarkable achievements of deep learning-based non-blind denoising methods, their performance drops significantly when domain shifts exist between the training and testing data. To address this issue, unpaired blind denoising methods have been proposed. However, these methods rely heavily on image-to-image translation and neglect the inherent characteristics of EM images, limiting their overall denoising performance. In this paper, we propose the first unsupervised domain-adaptive EM image denoising method, which is grounded in the observation that EM images from similar samples share common content characteristics. Specifically, we first disentangle the content representations and the noise components from noisy images and establish a shared domain-agnostic content space via domain alignment to bridge the synthetic images (source domain) and the real images (target domain). To ensure precise domain alignment, we further incorporate domain regularization by enforcing that the pseudo-noisy images, reconstructed using both content representations and noise components, accurately capture the characteristics of the noisy images from which the noise components originate, while maintaining semantic consistency with the noisy images from which the content representations originate. To guarantee lossless representation decomposition and image reconstruction, we introduce disentanglement-reconstruction invertible networks. Finally, the reconstructed pseudo-noisy images, paired with their corresponding clean counterparts, serve as valuable training data for the denoising network. Extensive experiments on synthetic and real EM datasets demonstrate the superiority of our method in terms of image restoration quality and downstream neuron segmentation accuracy. Our code is publicly available at https://github.com/sydeng99/DADn.
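In the paper the content/noise disentanglement is learned by invertible networks; the toy sketch below only illustrates the pairing logic, under the simplifying assumption of a purely additive noise model and a placeholder content estimate (`disentangle`, `recombine`, and `real_content_est` are illustrative names, not the paper's API):

```python
import numpy as np

rng = np.random.default_rng(0)

def disentangle(noisy, content_estimate):
    """Split a noisy image into content and residual noise.
    Toy additive model; the paper uses learned invertible networks."""
    return content_estimate, noisy - content_estimate

def recombine(content, noise):
    """Reassemble an image; lossless by construction for this toy model."""
    return content + noise

synthetic_clean = rng.random((8, 8))    # source-domain content (clean image)
real_noisy = rng.random((8, 8))         # target-domain noisy image
real_content_est = 0.9 * real_noisy     # placeholder content estimate

# Swap components: source content + target noise -> pseudo-noisy image,
# paired with the known clean image as denoiser training data.
_, target_noise = disentangle(real_noisy, real_content_est)
pseudo_noisy = recombine(synthetic_clean, target_noise)
training_pair = (pseudo_noisy, synthetic_clean)
```

The round trip `recombine(*disentangle(x, c))` returns `x` exactly, mirroring the lossless decomposition-reconstruction property the invertible networks are meant to guarantee.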
Unsupervised Domain Adaptation for EM Image Denoising with Invertible Networks.
Shiyu Deng, Yinda Chen, Wei Huang, Ruobing Zhang, Zhiwei Xiong
Pub Date : 2024-07-19 DOI: 10.1109/TMI.2024.3431192
Histopathological examinations rely heavily on hematoxylin and eosin (HE) and immunohistochemistry (IHC) staining. IHC staining can offer more accurate diagnostic details but brings significant financial and time costs. Furthermore, either re-staining HE-stained slides or using adjacent slides for IHC may compromise the accuracy of pathological diagnosis due to information loss. To address these challenges, we develop PST-Diff, a diffusion-model-based method for generating virtual IHC images from HE images, which allows pathologists to view multiple staining results from the same tissue slide simultaneously. To maintain the pathological consistency of the stain transfer, we propose the asymmetric attention mechanism (AAM) and the latent transfer (LT) module in PST-Diff. Specifically, the AAM retains more local pathological information from the source-domain images while preserving the model's flexibility in generating virtual stained images that conform closely to the target domain. Subsequently, the LT module transfers the implicit representations across domains, effectively alleviating the bias introduced by direct connection and further enhancing the pathological consistency of PST-Diff. Furthermore, to maintain the structural consistency of the stain transfer, the conditional frequency guidance (CFG) module is proposed to precisely control image generation and preserve structural details according to the frequency recovery process. In conclusion, the pathological and structural consistency constraints give PST-Diff effectiveness and superior generalization in generating stable and pathologically functional IHC images with the best evaluation scores. In general, PST-Diff offers prospective application in clinical virtual staining and pathological image analysis.
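PST-Diff's CFG module operates inside the diffusion sampling loop, which the abstract does not detail. As a crude standalone illustration of frequency-domain structural guidance, the sketch below swaps the low-frequency band of a generated image for that of the source image, so coarse structure comes from the source while fine texture comes from the generator (`frequency_guide` and `keep_radius` are illustrative, not the paper's API):

```python
import numpy as np

def frequency_guide(generated, source, keep_radius=4):
    """Replace the low-frequency band of `generated` with that of `source`.
    A toy stand-in for frequency-based structure preservation; the actual
    CFG module guides the diffusion sampling process itself."""
    G = np.fft.fftshift(np.fft.fft2(generated))
    S = np.fft.fftshift(np.fft.fft2(source))
    h, w = generated.shape
    yy, xx = np.ogrid[:h, :w]
    # Circular low-frequency mask around the (shifted) DC component.
    low = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= keep_radius ** 2
    G[low] = S[low]                 # structure from source, texture from generator
    return np.real(np.fft.ifft2(np.fft.ifftshift(G)))

rng = np.random.default_rng(1)
he_image = rng.random((32, 32))     # stand-in for the source HE image
raw_gen = rng.random((32, 32))      # stand-in for a raw generated IHC image
guided = frequency_guide(raw_gen, he_image)
```

Enlarging `keep_radius` until the mask covers the whole spectrum recovers the source image exactly, which makes the behavior easy to sanity-check.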
PST-Diff: Achieving High-Consistency Stain Transfer by Diffusion Models With Pathological and Structural Constraints
Yufang He; Zeyu Liu; Mingxin Qi; Shengwei Ding; Peng Zhang; Fan Song; Chenbin Ma; Huijie Wu; Ruxin Cai; Youdan Feng; Haonan Zhang; Tianyi Zhang; Guanglei Zhang
Pub Date : 2024-07-18 DOI: 10.1109/TMI.2024.3430825 (IEEE Transactions on Medical Imaging, vol. 43, no. 10, pp. 3634-3647)