In computational pathology, whole-slide image (WSI) classification presents a formidable challenge due to gigapixel resolution and limited fine-grained annotations. Multiple-instance learning (MIL) offers a weakly supervised solution, yet refining instance-level information from bag-level labels remains challenging. Most conventional MIL methods use attention scores to estimate instance importance scores (IIS), which contribute to the prediction of the slide label, but these scores often lead to skewed attention distributions and inaccurate identification of crucial instances. To address these issues, we propose a new approach inspired by cooperative game theory: employing Shapley values to assess each instance's contribution, thereby improving IIS estimation. The computation of the Shapley value is then accelerated using attention while retaining the enhanced instance identification and prioritization. We further introduce a framework for the progressive assignment of pseudo bags based on the estimated IIS, encouraging more balanced attention distributions in MIL models. Extensive experiments on the CAMELYON-16, BRACS, TCGA-LUNG, and TCGA-BRCA datasets show our method's superiority over existing state-of-the-art approaches, offering enhanced interpretability and class-wise insights. We will release the code upon acceptance.
{"title":"Shapley Values-enabled Progressive Pseudo Bag Augmentation for Whole-Slide Image Classification.","authors":"Renao Yan, Qiehe Sun, Cheng Jin, Yiqing Liu, Yonghong He, Tian Guan, Hao Chen","doi":"10.1109/TMI.2024.3453386","DOIUrl":"10.1109/TMI.2024.3453386","url":null,"abstract":"<p><p>In computational pathology, whole-slide image (WSI) classification presents a formidable challenge due to its gigapixel resolution and limited fine-grained annotations. Multiple-instance learning (MIL) offers a weakly supervised solution, yet refining instance-level information from bag-level labels remains challenging. While most of the conventional MIL methods use attention scores to estimate instance importance scores (IIS) which contribute to the prediction of the slide labels, these often lead to skewed attention distributions and inaccuracies in identifying crucial instances. To address these issues, we propose a new approach inspired by cooperative game theory: employing Shapley values to assess each instance's contribution, thereby improving IIS estimation. The computation of the Shapley value is then accelerated using attention, meanwhile retaining the enhanced instance identification and prioritization. We further introduce a framework for the progressive assignment of pseudo bags based on estimated IIS, encouraging more balanced attention distributions in MIL models. Our extensive experiments on CAMELYON-16, BRACS, TCGA-LUNG, and TCGA-BRCA datasets show our method's superiority over existing state-of-the-art approaches, offering enhanced interpretability and class-wise insights. We will release the code upon acceptance.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-02 | DOI: 10.1109/TMI.2024.3453492
Vishnuvardhan Purma, Suhas Srinath, Seshan Srirangarajan, Aanchal Kakkar, A P Prathosh
Histopathological image segmentation is a laborious and time-intensive task, often requiring analysis by experienced pathologists for accurate examination. To reduce this burden, supervised machine-learning approaches have been adopted, using large-scale annotated datasets for histopathological image analysis. However, in many scenarios the availability of large-scale annotated data is a bottleneck when training such models. Self-supervised learning (SSL) is an alternative paradigm that provides some respite by constructing models using only unannotated data, which is often abundant. The basic idea of SSL is to train a network on one or more pseudo or pretext tasks on unannotated data and subsequently use it as the basis for a variety of downstream tasks. The success of SSL depends critically on the chosen pretext task. While there have been many efforts in designing pretext tasks for classification problems, there have been few attempts at SSL for histopathological image segmentation. Motivated by this, we propose an SSL approach for segmenting histopathological images via generative diffusion models. Our method is based on the observation that diffusion models effectively solve an image-to-image translation task akin to segmentation. Hence, we propose generative diffusion as the pretext task for histopathological image segmentation. We also employ multi-loss fine-tuning for the downstream task. We validate our method using several metrics on two publicly available datasets along with a newly proposed head and neck (HN) cancer dataset containing Hematoxylin and Eosin (H&E)-stained images with annotations.
{"title":"GenSelfDiff-HIS: Generative Self-Supervision Using Diffusion for Histopathological Image Segmentation.","authors":"Vishnuvardhan Purma, Suhas Srinath, Seshan Srirangarajan, Aanchal Kakkar, A P Prathosh","doi":"10.1109/TMI.2024.3453492","DOIUrl":"https://doi.org/10.1109/TMI.2024.3453492","url":null,"abstract":"<p><p>Histopathological image segmentation is a laborious and time-intensive task, often requiring analysis from experienced pathologists for accurate examinations. To reduce this burden, supervised machine-learning approaches have been adopted using large-scale annotated datasets for histopathological image analysis. However, in several scenarios, the availability of large-scale annotated data is a bottleneck while training such models. Self-supervised learning (SSL) is an alternative paradigm that provides some respite by constructing models utilizing only the unannotated data which is often abundant. The basic idea of SSL is to train a network to perform one or many pseudo or pretext tasks on unannotated data and use it subsequently as the basis for a variety of downstream tasks. It is seen that the success of SSL depends critically on the considered pretext task. While there have been many efforts in designing pretext tasks for classification problems, there have not been many attempts on SSL for histopathological image segmentation. Motivated by this, we propose an SSL approach for segmenting histopathological images via generative diffusion models. Our method is based on the observation that diffusion models effectively solve an image-to-image translation task akin to a segmentation task. Hence, we propose generative diffusion as the pretext task for histopathological image segmentation. We also utilize a multi-loss function-based fine-tuning for the downstream task. We validate our method using several metrics on two publicly available datasets along with a newly proposed head and neck (HN) cancer dataset containing Hematoxylin and Eosin (H&E) stained images along with annotations.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-02 | DOI: 10.1109/TMI.2024.3453419
Xuegang Song, Kaixiang Shu, Peng Yang, Cheng Zhao, Feng Zhou, Alejandro F Frangi, Xiaohua Xiao, Lei Dong, Tianfu Wang, Shuqiang Wang, Baiying Lei
Brain disorder diagnosis via resting-state functional magnetic resonance imaging (rs-fMRI) is usually limited by complex imaging features and small sample sizes. For brain disorder diagnosis, the graph convolutional network (GCN) has achieved remarkable success by capturing interactions between individuals and the population. However, there are three main limitations: 1) previous GCN approaches consider non-imaging information in edge construction but ignore the differing sensitivity of features to that non-imaging information; 2) previous GCN approaches focus solely on establishing interactions between subjects (i.e., individuals and the population), disregarding the essential relationships between features; and 3) multisite data increase the sample size available for classifier training, but inter-site heterogeneity limits performance to some extent. This paper proposes a knowledge-aware multisite adaptive graph Transformer to address these problems. First, we evaluate the sensitivity of features to each piece of non-imaging information and then construct feature-sensitive and feature-insensitive subgraphs. Second, after fusing these subgraphs, we integrate a Transformer module to capture the intrinsic relationships between features. Third, we design a domain-adaptive GCN with multiple loss terms to relieve data heterogeneity and produce the final classification results. Finally, the proposed framework is validated on two brain disorder diagnostic tasks. Experimental results show that it achieves state-of-the-art performance.
{"title":"Knowledge-aware Multisite Adaptive Graph Transformer for Brain Disorder Diagnosis.","authors":"Xuegang Song, Kaixiang Shu, Peng Yang, Cheng Zhao, Feng Zhou, Alejandro F Frangi, Xiaohua Xiao, Lei Dong, Tianfu Wang, Shuqiang Wang, Baiying Lei","doi":"10.1109/TMI.2024.3453419","DOIUrl":"https://doi.org/10.1109/TMI.2024.3453419","url":null,"abstract":"<p><p>Brain disorder diagnosis via resting-state functional magnetic resonance imaging (rs-fMRI) is usually limited due to the complex imaging features and sample size. For brain disorder diagnosis, the graph convolutional network (GCN) has achieved remarkable success by capturing interactions between individuals and the population. However, there are mainly three limitations: 1) The previous GCN approaches consider the non-imaging information in edge construction but ignore the sensitivity differences of features to non-imaging information. 2) The previous GCN approaches solely focus on establishing interactions between subjects (i.e., individuals and the population), disregarding the essential relationship between features. 3) Multisite data increase the sample size to help classifier training, but the inter-site heterogeneity limits the performance to some extent. This paper proposes a knowledge-aware multisite adaptive graph Transformer to address the above problems. First, we evaluate the sensitivity of features to each piece of non-imaging information, and then construct feature-sensitive and feature-insensitive subgraphs. Second, after fusing the above subgraphs, we integrate a Transformer module to capture the intrinsic relationship between features. Third, we design a domain adaptive GCN using multiple loss function terms to relieve data heterogeneity and to produce the final classification results. Last, the proposed framework is validated on two brain disorder diagnostic tasks. Experimental results show that the proposed framework can achieve state-of-the-art performance.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-02 | DOI: 10.1109/TMI.2024.3453377
Rui Feng, Jingwen Yang, Hao Huang, Zelin Chen, Ruiyan Feng, N U Farrukh Hameed, Xudong Zhang, Jie Hu, Liang Chen, Shuo Lu
Refractory temporal lobe epilepsy (TLE) is one of the most frequently observed subtypes of epilepsy, and epilepsy endangers more than 50 million people worldwide. Although electroencephalography (EEG) has long been recognized as a classic tool to screen and diagnose epilepsy, for many years it has relied heavily on identifying epileptic discharges and localizing the epileptogenic zone, which limits the understanding of refractory epilepsy given the network nature of this disease. This work hypothesizes that microstate dynamics derived from resting-state scalp EEG can offer an additional network-level depiction of the disease and provide a potential complementary evaluation tool for TLE, even without detectable epileptic discharges on EEG. We propose a novel machine-learning framework for EEG microstate spatiotemporal dynamics (EEG-MiSTD) analysis to comprehensively model millisecond-scale whole-brain network dynamics. With only 100 seconds of resting-state EEG, even without epileptic discharges, this approach successfully distinguishes TLE patients from healthy controls and relates to the lateralization of the epileptic focus. Moreover, microstate temporal and spatial features are found to be widely related to clinical parameters, further demonstrating that TLE is a network disease. A preliminary exploration suggests that the spatial topography is sensitive to subsequent surgical outcomes. From this new perspective, our results suggest that spatiotemporal microstate dynamics is a potential biomarker of the disease. The developed EEG-MiSTD framework may serve as a general, user-friendly tool to examine dynamic brain network disruption in other types of epilepsy.
{"title":"Spatiotemporal Microstate Dynamics of Spike-free Scalp EEG Offer a Potential Biomarker for Refractory Temporal Lobe Epilepsy.","authors":"Rui Feng, Jingwen Yang, Hao Huang, Zelin Chen, Ruiyan Feng, N U Farrukh Hameed, Xudong Zhang, Jie Hu, Liang Chen, Shuo Lu","doi":"10.1109/TMI.2024.3453377","DOIUrl":"https://doi.org/10.1109/TMI.2024.3453377","url":null,"abstract":"<p><p>Refractory temporal lobe epilepsy (TLE) is one of the most frequently observed subtypes of epilepsy and endangers more than 50 million people world-wide. Although electroencephalogram (EEG) had been widely recognized as a classic tool to screen and diagnose epilepsy, for many years it heavily relied on identifying epileptic discharges and epileptogenic zone localization, which however, limits the understanding of refractory epilepsy due to the network nature of this disease. This work hypothesizes that the microstate dynamics based on resting-state scalp EEG can offer an additional network depiction of the disease and provide potential complementary evaluation tool for the TLE even without detectable epileptic discharges on EEG. We propose a novel framework for EEG microstate spatial-temporal dynamics (EEG-MiSTD) analysis based on machine learning to comprehensively model millisecond-changing whole-brain network dynamics. With only 100 seconds of resting-state EEG even without epileptic discharges, this approach successfully distinguishes TLE patients from healthy controls and is related to the lateralization of epileptic focus. Besides, microstate temporal and spatial features are found to be widely related to clinical parameters, which further demonstrate that TLE is a network disease. A preliminary exploration suggests that the spatial topography is sensitive to the following surgical outcomes. From such a new perspective, our results suggest that spatiotemporal microstate dynamics is potentially a biomarker of the disease. The developed EEG-MiSTD framework can probably be considered as a general tool to examine dynamical brain network disruption in a user-friendly way for other types of epilepsy.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The use of machine learning (ML) for cancer staging through medical image analysis has gained substantial interest across medical disciplines. When combined with the federated learning (FL) framework, ML techniques can further overcome privacy concerns related to patient data exposure. Given the frequent presence of diverse data modalities within patient records, leveraging FL in a multi-modal learning framework holds considerable promise for cancer staging. However, existing works on multi-modal FL often presume that all data-collecting institutions have access to all data modalities. This oversimplified assumption neglects institutions that have access to only a portion of the data modalities within the system. In this work, we introduce a novel FL architecture designed to accommodate not only the heterogeneity of data samples but also the inherent heterogeneity/non-uniformity of data modalities across institutions. We shed light on the challenges associated with the varying convergence speeds observed across different data modalities within our FL system. Subsequently, we propose a solution to these challenges by devising a distributed gradient blending and proximity-aware client weighting strategy tailored for multi-modal FL. To show the superiority of our method, we conduct experiments on The Cancer Genome Atlas program (TCGA) data lake, considering different cancer types and three data modalities: mRNA sequences, histopathological image data, and clinical information. Our results further unveil the impact and severity of class-based versus type-based heterogeneity across institutions on model performance, which broadens the perspective on data heterogeneity in the multi-modal FL literature.
{"title":"Multi-Modal Federated Learning for Cancer Staging over Non-IID Datasets with Unbalanced Modalities.","authors":"Kasra Borazjani, Naji Khosravan, Leslie Ying, Seyyedali Hosseinalipour","doi":"10.1109/TMI.2024.3450855","DOIUrl":"https://doi.org/10.1109/TMI.2024.3450855","url":null,"abstract":"<p><p>The use of machine learning (ML) for cancer staging through medical image analysis has gained substantial interest across medical disciplines. When accompanied by the innovative federated learning (FL) framework, ML techniques can further overcome privacy concerns related to patient data exposure. Given the frequent presence of diverse data modalities within patient records, leveraging FL in a multi-modal learning framework holds considerable promise for cancer staging. However, existing works on multi-modal FL often presume that all data-collecting institutions have access to all data modalities. This oversimplified approach neglects institutions that have access to only a portion of data modalities within the system. In this work, we introduce a novel FL architecture designed to accommodate not only the heterogeneity of data samples, but also the inherent heterogeneity/non-uniformity of data modalities across institutions. We shed light on the challenges associated with varying convergence speeds observed across different data modalities within our FL system. Subsequently, we propose a solution to tackle these challenges by devising a distributed gradient blending and proximity-aware client weighting strategy tailored for multi-modal FL. To show the superiority of our method, we conduct experiments using The Cancer Genome Atlas program (TCGA) datalake considering different cancer types and three modalities of data: mRNA sequences, histopathological image data, and clinical information. Our results further unveil the impact and severity of class-based vs type-based heterogeneity across institutions on the model performance, which widens the perspective to the notion of data heterogeneity in multi-modal FL literature.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142086428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-28 | DOI: 10.1109/TMI.2024.3449817
Sen Wang, Yirong Yang, Grant M Stevens, Zhye Yin, Adam S Wang
Photon counting CT (PCCT) acquires spectral measurements and enables the generation of material decomposition (MD) images that provide distinct advantages in various clinical situations. However, noise amplification is observed in MD images, and denoising is typically applied. Clean or high-quality references are rare in clinical scans, often making supervised learning (Noise2Clean) impractical. Noise2Noise is a self-supervised counterpart that uses noisy images and corresponding noisy references with zero-mean, independent noise. PCCT counts transmitted photons separately, and the raw measurements are assumed to follow a Poisson distribution in each energy bin, which makes it possible to create noise-independent pairs. Our approach uses binomial selection to split the counts into two low-dose scans with independent noise. Through noise propagation analysis, we prove that the reconstructed spectral images inherit this noise independence from the counts domain, and we validate it in numerical simulations and experimental phantom scans. The method offers the flexibility to split measurements into desired dose levels while ensuring that the reconstructed images share identical underlying features, thereby strengthening the model's robustness to input dose levels and its ability to preserve fine details. In both numerical simulations and experimental phantom scans, we demonstrate that Noise2Noise with binomial selection outperforms other common self-supervised learning methods that rely on different assumptions.
{"title":"Emulating Low-Dose PCCT Image Pairs with Independent Noise for Self-Supervised Spectral Image Denoising.","authors":"Sen Wang, Yirong Yang, Grant M Stevens, Zhye Yin, Adam S Wang","doi":"10.1109/TMI.2024.3449817","DOIUrl":"https://doi.org/10.1109/TMI.2024.3449817","url":null,"abstract":"<p><p>Photon counting CT (PCCT) acquires spectral measurements and enables generation of material decomposition (MD) images that provide distinct advantages in various clinical situations. However, noise amplification is observed in MD images, and denoising is typically applied. Clean or high-quality references are rare in clinical scans, often making supervised learning (Noise2Clean) impractical. Noise2Noise is a self-supervised counterpart, using noisy images and corresponding noisy references with zero-mean, independent noise. PCCT counts transmitted photons separately, and raw measurements are assumed to follow a Poisson distribution in each energy bin, providing the possibility to create noise-independent pairs. The approach is to use binomial selection to split the counts into two low-dose scans with independent noise. We prove that the reconstructed spectral images inherit the noise independence from counts domain through noise propagation analysis and also validated it in numerical simulation and experimental phantom scans. The method offers the flexibility to split measurements into desired dose levels while ensuring the reconstructed images share identical underlying features, thereby strengthening the model's robustness for input dose levels and capability of preserving fine details. In both numerical simulation and experimental phantom scans, we demonstrated that Noise2Noise with binomial selection outperforms other common self-supervised learning methods based on different presumptive conditions.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142086427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the field of medical Vision-Language Pretraining (VLP), significant effort has been devoted to deriving text and image features from both clinical reports and associated medical images. However, most existing methods overlook the opportunity to leverage the inherent hierarchical structure of clinical reports, which are generally split into 'findings' for descriptive content and 'impressions' for conclusive observations. Instead of utilizing this rich, structured format, current medical VLP approaches often simplify the report into either a unified entity or fragmented tokens. In this work, we propose a novel clinical-prior-guided VLP framework named IMITATE that learns the structural information of medical reports via hierarchical vision-language alignment. The framework derives multi-level visual features from chest X-ray (CXR) images and separately aligns these features with the descriptive and conclusive text encoded in the hierarchical medical report. Furthermore, a new clinically informed contrastive loss is introduced for cross-modal learning, which accounts for clinical prior knowledge when formulating sample correlations in contrastive learning. The proposed model, IMITATE, outperforms baseline VLP methods across six different datasets spanning five medical imaging downstream tasks. Comprehensive experimental results highlight the advantages of integrating the hierarchical structure of medical reports for vision-language alignment.
{"title":"IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training.","authors":"Che Liu, Sibo Cheng, Miaojing Shi, Anand Shah, Wenjia Bai, Rossella Arcucci","doi":"10.1109/TMI.2024.3449690","DOIUrl":"https://doi.org/10.1109/TMI.2024.3449690","url":null,"abstract":"<p><p>In the field of medical Vision-Language Pretraining (VLP), significant efforts have been devoted to deriving text and image features from both clinical reports and associated medical images. However, most existing methods may have overlooked the opportunity in leveraging the inherent hierarchical structure of clinical reports, which are generally split into 'findings' for descriptive content and 'impressions' for conclusive observation. Instead of utilizing this rich, structured format, current medical VLP approaches often simplify the report into either a unified entity or fragmented tokens. In this work, we propose a novel clinical prior guided VLP framework named IMITATE to learn the structure information from medical reports with hierarchical vision-language alignment. The framework derives multi-level visual features from the chest X-ray (CXR) images and separately aligns these features with the descriptive and the conclusive text encoded in the hierarchical medical report. Furthermore, a new clinical-informed contrastive loss is introduced for cross-modal learning, which accounts for clinical prior knowledge in formulating sample correlations in contrastive learning. The proposed model, IMITATE, outperforms baseline VLP methods across six different datasets, spanning five medical imaging downstream tasks. Comprehensive experimental results highlight the advantages of integrating the hierarchical structure of medical reports for vision-language alignment.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142074850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-26 | DOI: 10.1109/TMI.2024.3449647
Sunggu Kyung, Jongjun Won, Seongyong Pak, Sunwoo Kim, Sangyoon Lee, Kanggil Park, Gil-Sun Hong, Namkug Kim
Reducing the radiation dose in computed tomography (CT) is vital to decreasing secondary cancer risk. However, the use of low-dose CT (LDCT) images is accompanied by increased noise that can negatively impact diagnoses. Although numerous deep learning algorithms have been developed for LDCT denoising, several challenges persist, including visual incongruence experienced by radiologists, unsatisfactory performance across various metrics, and insufficient exploration of the networks' robustness in other CT domains. To address these issues, this study proposes three novel contributions. First, we propose a generative adversarial network (GAN) with a robust discriminator trained through multi-task learning that simultaneously performs three vision tasks: restoration, image-level decision, and pixel-level decision. The more tasks the discriminator performs, the better the denoising performance of the generator; that is, multi-task learning enables the discriminator to provide more meaningful feedback to the generator. Second, two regulatory mechanisms, restoration consistency (RC) and non-difference suppression (NDS), are introduced to improve the discriminator's representation capabilities. These mechanisms eliminate irrelevant regions and compare the discriminator's results on the input and the restoration, thus facilitating effective GAN training. Lastly, we incorporate residual fast Fourier transform with convolution (Res-FFT-Conv) blocks into the generator to utilize both frequency and spatial representations. This approach provides mixed receptive fields by using spatial (or local), spectral (or global), and residual connections. Our model was evaluated using various pixel- and feature-space metrics on two denoising tasks. Additionally, we conducted visual scoring with radiologists. The results indicate superior performance in both quantitative and qualitative measures compared with state-of-the-art denoising techniques.
{"title":"Generative Adversarial Network with Robust Discriminator Through Multi-Task Learning for Low-Dose CT Denoising.","authors":"Sunggu Kyung, Jongjun Won, Seongyong Pak, Sunwoo Kim, Sangyoon Lee, Kanggil Park, Gil-Sun Hong, Namkug Kim","doi":"10.1109/TMI.2024.3449647","DOIUrl":"https://doi.org/10.1109/TMI.2024.3449647","url":null,"abstract":"<p><p>Reducing the dose of radiation in computed tomography (CT) is vital to decreasing secondary cancer risk. However, the use of low-dose CT (LDCT) images is accompanied by increased noise that can negatively impact diagnoses. Although numerous deep learning algorithms have been developed for LDCT denoising, several challenges persist, including the visual incongruence experienced by radiologists, unsatisfactory performances across various metrics, and insufficient exploration of the networks' robustness in other CT domains. To address such issues, this study proposes three novel accretions. First, we propose a generative adversarial network (GAN) with a robust discriminator through multi-task learning that simultaneously performs three vision tasks: restoration, image-level, and pixel-level decisions. The more multi-tasks that are performed, the better the denoising performance of the generator, which means multi-task learning enables the discriminator to provide more meaningful feedback to the generator. Second, two regulatory mechanisms, restoration consistency (RC) and non-difference suppression (NDS), are introduced to improve the discriminator's representation capabilities. These mechanisms eliminate irrelevant regions and compare the discriminator's results from the input and restoration, thus facilitating effective GAN training. Lastly, we incorporate residual fast Fourier transforms with convolution (Res-FFT-Conv) blocks into the generator to utilize both frequency and spatial representations. This approach provides mixed receptive fields by using spatial (or local), spectral (or global), and residual connections. Our model was evaluated using various pixel- and feature-space metrics in two denoising tasks. Additionally, we conducted visual scoring with radiologists. The results indicate superior performance in both quantitative and qualitative measures compared to state-of-the-art denoising techniques.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142074849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CT-based bronchial tree analysis is a key step in the diagnosis of lung and airway diseases. However, the topology of bronchial trees varies across individuals, which presents a challenge for automatic bronchus classification. To address this issue, we propose the Bronchus Classification Network (BCNet), a structure-guided framework that exploits segment-level topological information using point clouds to learn voxel-level features. BCNet has two branches: a Point-Voxel Graph Neural Network (PV-GNN) for segment classification and a Convolutional Neural Network (CNN) for voxel labeling. The two branches are trained simultaneously so that topology-aware features are learned by their shared backbone, while only the CNN branch needs to be run for inference. Therefore, BCNet maintains the same inference efficiency as its CNN baseline. Experimental results show that BCNet exceeds state-of-the-art methods by over 8.0% in F1-score for bronchus classification. Furthermore, we contribute BronAtlas, an open-access benchmark for bronchus imaging analysis with high-quality voxel-wise annotations of both anatomical and abnormal bronchial segments. The benchmark is available at link1.
{"title":"BCNet: Bronchus Classification via Structure Guided Representation Learning.","authors":"Wenhao Huang, Haifan Gong, Huan Zhang, Yu Wang, Xiang Wan, Guanbin Li, Haofeng Li, Hong Shen","doi":"10.1109/TMI.2024.3448468","DOIUrl":"https://doi.org/10.1109/TMI.2024.3448468","url":null,"abstract":"<p><p>CT-based bronchial tree analysis is a key step for the diagnosis of lung and airway diseases. However, the topology of bronchial trees varies across individuals, which presents a challenge to the automatic bronchus classification. To solve this issue, we propose the Bronchus Classification Network (BCNet), a structure-guided framework that exploits the segment-level topological information using point clouds to learn the voxel-level features. BCNet has two branches, a Point-Voxel Graph Neural Network (PV-GNN) for segment classification, and a Convolutional Neural Network (CNN) for voxel labeling. The two branches are simultaneously trained to learn topology-aware features for their shared backbone while it is feasible to run only the CNN branch for the inference. Therefore, BCNet maintains the same inference efficiency as its CNN baseline. Experimental results show that BCNet significantly exceeds the state-of-the-art methods by over 8.0% both on F1-score for classifying bronchus. Furthermore, we contribute BronAtlas: an open-access benchmark of bronchus imaging analysis with high-quality voxel-wise annotations of both anatomical and abnormal bronchial segments. The benchmark is available at link<sup>1</sup>.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142044270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-22 | DOI: 10.1109/TMI.2024.3447672
Kunming Tang, Zhiguo Jiang, Kun Wu, Jun Shi, Fengying Xie, Wei Wang, Haibo Wu, Yushan Zheng
Multiple instance learning (MIL)-based whole slide image (WSI) classification is often carried out on representations of patches extracted from the WSI with a pre-trained patch encoder. Classification performance relies on both patch-level representation learning and MIL classifier training. Most MIL methods use a frozen model pre-trained on ImageNet, or a model trained with self-supervised learning on a histopathology image dataset, to extract patch representations and then fix these representations when training the MIL classifier for efficiency. However, the invariance of such representations cannot meet the diversity requirements for training a robust MIL classifier, which has significantly limited WSI classification performance. In this paper, we propose a Self-Supervised Representation Distribution Learning framework (SSRDL) for patch-level representation learning with an online representation sampling strategy (ORS) for both patch feature extraction and WSI-level data augmentation. The proposed method was evaluated on three datasets under three MIL frameworks. The experimental results demonstrate that the proposed method achieves the best performance in histopathology image representation learning and data augmentation and outperforms state-of-the-art methods under different WSI classification frameworks. The code is available at https://github.com/lazytkm/SSRDL.
{"title":"Self-Supervised Representation Distribution Learning for Reliable Data Augmentation in Histopathology WSI Classification.","authors":"Kunming Tang, Zhiguo Jiang, Kun Wu, Jun Shi, Fengying Xie, Wei Wang, Haibo Wu, Yushan Zheng","doi":"10.1109/TMI.2024.3447672","DOIUrl":"https://doi.org/10.1109/TMI.2024.3447672","url":null,"abstract":"<p><p>Multiple instance learning (MIL) based whole slide image (WSI) classification is often carried out on the representations of patches extracted from WSI with a pre-trained patch encoder. The performance of classification relies on both patch-level representation learning and MIL classifier training. Most MIL methods utilize a frozen model pre-trained on ImageNet or a model trained with self-supervised learning on histopathology image dataset to extract patch image representations and then fix these representations in the training of the MIL classifiers for efficiency consideration. However, the invariance of representations cannot meet the diversity requirement for training a robust MIL classifier, which has significantly limited the performance of the WSI classification. In this paper, we propose a Self-Supervised Representation Distribution Learning framework (SSRDL) for patch-level representation learning with an online representation sampling strategy (ORS) for both patch feature extraction and WSI-level data augmentation. The proposed method was evaluated on three datasets under three MIL frameworks. The experimental results have demonstrated that the proposed method achieves the best performance in histopathology image representation learning and data augmentation and outperforms state-of-the-art methods under different WSI classification frameworks. The code is available at https://github.com/lazytkm/SSRDL.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142037962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}