In computational pathology, whole-slide image (WSI) classification presents a formidable challenge due to gigapixel resolution and limited fine-grained annotations. Multiple-instance learning (MIL) offers a weakly supervised solution, yet refining instance-level information from bag-level labels remains challenging. Most conventional MIL methods use attention scores to estimate instance importance scores (IIS), which contribute to the prediction of the slide label, but these scores often lead to skewed attention distributions and inaccurate identification of crucial instances. To address these issues, we propose a new approach inspired by cooperative game theory: employing Shapley values to assess each instance's contribution, thereby improving IIS estimation. The computation of the Shapley value is then accelerated using attention while retaining the enhanced instance identification and prioritization. We further introduce a framework for the progressive assignment of pseudo bags based on the estimated IIS, encouraging more balanced attention distributions in MIL models. Extensive experiments on the CAMELYON-16, BRACS, TCGA-LUNG, and TCGA-BRCA datasets show our method's superiority over existing state-of-the-art approaches, offering enhanced interpretability and class-wise insights. We will release the code upon acceptance.
{"title":"Shapley Values-enabled Progressive Pseudo Bag Augmentation for Whole-Slide Image Classification.","authors":"Renao Yan, Qiehe Sun, Cheng Jin, Yiqing Liu, Yonghong He, Tian Guan, Hao Chen","doi":"10.1109/TMI.2024.3453386","DOIUrl":"10.1109/TMI.2024.3453386","url":null,"abstract":"<p><p>In computational pathology, whole-slide image (WSI) classification presents a formidable challenge due to its gigapixel resolution and limited fine-grained annotations. Multiple-instance learning (MIL) offers a weakly supervised solution, yet refining instance-level information from bag-level labels remains challenging. While most of the conventional MIL methods use attention scores to estimate instance importance scores (IIS) which contribute to the prediction of the slide labels, these often lead to skewed attention distributions and inaccuracies in identifying crucial instances. To address these issues, we propose a new approach inspired by cooperative game theory: employing Shapley values to assess each instance's contribution, thereby improving IIS estimation. The computation of the Shapley value is then accelerated using attention, meanwhile retaining the enhanced instance identification and prioritization. We further introduce a framework for the progressive assignment of pseudo bags based on estimated IIS, encouraging more balanced attention distributions in MIL models. Our extensive experiments on CAMELYON-16, BRACS, TCGA-LUNG, and TCGA-BRCA datasets show our method's superiority over existing state-of-the-art approaches, offering enhanced interpretability and class-wise insights. We will release the code upon acceptance.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-02 | DOI: 10.1109/TMI.2024.3453492
Vishnuvardhan Purma, Suhas Srinath, Seshan Srirangarajan, Aanchal Kakkar, A P Prathosh
Histopathological image segmentation is a laborious and time-intensive task, often requiring analysis by experienced pathologists for accurate examination. To reduce this burden, supervised machine-learning approaches have been adopted, using large-scale annotated datasets for histopathological image analysis. However, in many scenarios the availability of large-scale annotated data is a bottleneck when training such models. Self-supervised learning (SSL) is an alternative paradigm that provides some respite by constructing models using only unannotated data, which is often abundant. The basic idea of SSL is to train a network on one or more pseudo or pretext tasks on unannotated data and subsequently use it as the basis for a variety of downstream tasks. The success of SSL depends critically on the chosen pretext task. While there have been many efforts in designing pretext tasks for classification problems, there have been few attempts at SSL for histopathological image segmentation. Motivated by this, we propose an SSL approach for segmenting histopathological images via generative diffusion models. Our method is based on the observation that diffusion models effectively solve an image-to-image translation task akin to segmentation. Hence, we propose generative diffusion as the pretext task for histopathological image segmentation. We also employ multi-loss fine-tuning for the downstream task. We validate our method using several metrics on two publicly available datasets along with a newly proposed head and neck (HN) cancer dataset containing Hematoxylin and Eosin (H&E)-stained images with annotations.
{"title":"GenSelfDiff-HIS: Generative Self-Supervision Using Diffusion for Histopathological Image Segmentation.","authors":"Vishnuvardhan Purma, Suhas Srinath, Seshan Srirangarajan, Aanchal Kakkar, A P Prathosh","doi":"10.1109/TMI.2024.3453492","DOIUrl":"https://doi.org/10.1109/TMI.2024.3453492","url":null,"abstract":"<p><p>Histopathological image segmentation is a laborious and time-intensive task, often requiring analysis from experienced pathologists for accurate examinations. To reduce this burden, supervised machine-learning approaches have been adopted using large-scale annotated datasets for histopathological image analysis. However, in several scenarios, the availability of large-scale annotated data is a bottleneck while training such models. Self-supervised learning (SSL) is an alternative paradigm that provides some respite by constructing models utilizing only the unannotated data which is often abundant. The basic idea of SSL is to train a network to perform one or many pseudo or pretext tasks on unannotated data and use it subsequently as the basis for a variety of downstream tasks. It is seen that the success of SSL depends critically on the considered pretext task. While there have been many efforts in designing pretext tasks for classification problems, there have not been many attempts on SSL for histopathological image segmentation. Motivated by this, we propose an SSL approach for segmenting histopathological images via generative diffusion models. Our method is based on the observation that diffusion models effectively solve an image-to-image translation task akin to a segmentation task. Hence, we propose generative diffusion as the pretext task for histopathological image segmentation. We also utilize a multi-loss function-based fine-tuning for the downstream task. We validate our method using several metrics on two publicly available datasets along with a newly proposed head and neck (HN) cancer dataset containing Hematoxylin and Eosin (H&E) stained images along with annotations.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-02 | DOI: 10.1109/TMI.2024.3453419
Xuegang Song, Kaixiang Shu, Peng Yang, Cheng Zhao, Feng Zhou, Alejandro F Frangi, Xiaohua Xiao, Lei Dong, Tianfu Wang, Shuqiang Wang, Baiying Lei
Brain disorder diagnosis via resting-state functional magnetic resonance imaging (rs-fMRI) is usually limited by complex imaging features and small sample sizes. For brain disorder diagnosis, the graph convolutional network (GCN) has achieved remarkable success by capturing interactions between individuals and the population. However, there are three main limitations: 1) previous GCN approaches consider non-imaging information in edge construction but ignore the differing sensitivity of features to that non-imaging information; 2) previous GCN approaches focus solely on establishing interactions between subjects (i.e., individuals and the population), disregarding the essential relationships between features; and 3) multisite data increase the sample size available for classifier training, but inter-site heterogeneity limits performance to some extent. This paper proposes a knowledge-aware multisite adaptive graph Transformer to address these problems. First, we evaluate the sensitivity of features to each piece of non-imaging information and then construct feature-sensitive and feature-insensitive subgraphs. Second, after fusing these subgraphs, we integrate a Transformer module to capture the intrinsic relationships between features. Third, we design a domain-adaptive GCN with multiple loss terms to relieve data heterogeneity and produce the final classification results. Finally, the proposed framework is validated on two brain disorder diagnostic tasks. Experimental results show that it achieves state-of-the-art performance.
{"title":"Knowledge-aware Multisite Adaptive Graph Transformer for Brain Disorder Diagnosis.","authors":"Xuegang Song, Kaixiang Shu, Peng Yang, Cheng Zhao, Feng Zhou, Alejandro F Frangi, Xiaohua Xiao, Lei Dong, Tianfu Wang, Shuqiang Wang, Baiying Lei","doi":"10.1109/TMI.2024.3453419","DOIUrl":"https://doi.org/10.1109/TMI.2024.3453419","url":null,"abstract":"<p><p>Brain disorder diagnosis via resting-state functional magnetic resonance imaging (rs-fMRI) is usually limited due to the complex imaging features and sample size. For brain disorder diagnosis, the graph convolutional network (GCN) has achieved remarkable success by capturing interactions between individuals and the population. However, there are mainly three limitations: 1) The previous GCN approaches consider the non-imaging information in edge construction but ignore the sensitivity differences of features to non-imaging information. 2) The previous GCN approaches solely focus on establishing interactions between subjects (i.e., individuals and the population), disregarding the essential relationship between features. 3) Multisite data increase the sample size to help classifier training, but the inter-site heterogeneity limits the performance to some extent. This paper proposes a knowledge-aware multisite adaptive graph Transformer to address the above problems. First, we evaluate the sensitivity of features to each piece of non-imaging information, and then construct feature-sensitive and feature-insensitive subgraphs. Second, after fusing the above subgraphs, we integrate a Transformer module to capture the intrinsic relationship between features. Third, we design a domain adaptive GCN using multiple loss function terms to relieve data heterogeneity and to produce the final classification results. Last, the proposed framework is validated on two brain disorder diagnostic tasks. Experimental results show that the proposed framework can achieve state-of-the-art performance.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-02 | DOI: 10.1109/TMI.2024.3453377
Rui Feng, Jingwen Yang, Hao Huang, Zelin Chen, Ruiyan Feng, N U Farrukh Hameed, Xudong Zhang, Jie Hu, Liang Chen, Shuo Lu
Refractory temporal lobe epilepsy (TLE) is one of the most frequently observed subtypes of epilepsy, and epilepsy endangers more than 50 million people worldwide. Although electroencephalography (EEG) has long been recognized as a classic tool to screen and diagnose epilepsy, for many years it has relied heavily on identifying epileptic discharges and localizing the epileptogenic zone, which limits the understanding of refractory epilepsy given the network nature of this disease. This work hypothesizes that microstate dynamics derived from resting-state scalp EEG can offer an additional network-level depiction of the disease and provide a potential complementary evaluation tool for TLE, even without detectable epileptic discharges on EEG. We propose a novel machine-learning framework for EEG microstate spatiotemporal dynamics (EEG-MiSTD) analysis to comprehensively model millisecond-scale whole-brain network dynamics. With only 100 seconds of resting-state EEG, even without epileptic discharges, this approach successfully distinguishes TLE patients from healthy controls and relates to the lateralization of the epileptic focus. Moreover, microstate temporal and spatial features are found to be widely related to clinical parameters, further demonstrating that TLE is a network disease. A preliminary exploration suggests that the spatial topography is sensitive to subsequent surgical outcomes. From this new perspective, our results suggest that spatiotemporal microstate dynamics is a potential biomarker of the disease. The developed EEG-MiSTD framework may serve as a general, user-friendly tool to examine dynamic brain network disruption in other types of epilepsy.
{"title":"Spatiotemporal Microstate Dynamics of Spike-free Scalp EEG Offer a Potential Biomarker for Refractory Temporal Lobe Epilepsy.","authors":"Rui Feng, Jingwen Yang, Hao Huang, Zelin Chen, Ruiyan Feng, N U Farrukh Hameed, Xudong Zhang, Jie Hu, Liang Chen, Shuo Lu","doi":"10.1109/TMI.2024.3453377","DOIUrl":"https://doi.org/10.1109/TMI.2024.3453377","url":null,"abstract":"<p><p>Refractory temporal lobe epilepsy (TLE) is one of the most frequently observed subtypes of epilepsy and endangers more than 50 million people world-wide. Although electroencephalogram (EEG) had been widely recognized as a classic tool to screen and diagnose epilepsy, for many years it heavily relied on identifying epileptic discharges and epileptogenic zone localization, which however, limits the understanding of refractory epilepsy due to the network nature of this disease. This work hypothesizes that the microstate dynamics based on resting-state scalp EEG can offer an additional network depiction of the disease and provide potential complementary evaluation tool for the TLE even without detectable epileptic discharges on EEG. We propose a novel framework for EEG microstate spatial-temporal dynamics (EEG-MiSTD) analysis based on machine learning to comprehensively model millisecond-changing whole-brain network dynamics. With only 100 seconds of resting-state EEG even without epileptic discharges, this approach successfully distinguishes TLE patients from healthy controls and is related to the lateralization of epileptic focus. Besides, microstate temporal and spatial features are found to be widely related to clinical parameters, which further demonstrate that TLE is a network disease. A preliminary exploration suggests that the spatial topography is sensitive to the following surgical outcomes. From such a new perspective, our results suggest that spatiotemporal microstate dynamics is potentially a biomarker of the disease. The developed EEG-MiSTD framework can probably be considered as a general tool to examine dynamical brain network disruption in a user-friendly way for other types of epilepsy.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The use of machine learning (ML) for cancer staging through medical image analysis has gained substantial interest across medical disciplines. When combined with the federated learning (FL) framework, ML techniques can further overcome privacy concerns related to patient data exposure. Given the frequent presence of diverse data modalities within patient records, leveraging FL in a multi-modal learning framework holds considerable promise for cancer staging. However, existing works on multi-modal FL often presume that all data-collecting institutions have access to all data modalities. This oversimplified assumption neglects institutions that have access to only a portion of the data modalities within the system. In this work, we introduce a novel FL architecture designed to accommodate not only the heterogeneity of data samples but also the inherent heterogeneity/non-uniformity of data modalities across institutions. We shed light on the challenges associated with the varying convergence speeds observed across different data modalities within our FL system. Subsequently, we propose a solution to these challenges by devising a distributed gradient blending and proximity-aware client weighting strategy tailored for multi-modal FL. To show the superiority of our method, we conduct experiments on The Cancer Genome Atlas program (TCGA) data lake, considering different cancer types and three data modalities: mRNA sequences, histopathological image data, and clinical information. Our results further unveil the impact and severity of class-based versus type-based heterogeneity across institutions on model performance, which broadens the perspective on data heterogeneity in the multi-modal FL literature.
{"title":"Multi-Modal Federated Learning for Cancer Staging over Non-IID Datasets with Unbalanced Modalities.","authors":"Kasra Borazjani, Naji Khosravan, Leslie Ying, Seyyedali Hosseinalipour","doi":"10.1109/TMI.2024.3450855","DOIUrl":"https://doi.org/10.1109/TMI.2024.3450855","url":null,"abstract":"<p><p>The use of machine learning (ML) for cancer staging through medical image analysis has gained substantial interest across medical disciplines. When accompanied by the innovative federated learning (FL) framework, ML techniques can further overcome privacy concerns related to patient data exposure. Given the frequent presence of diverse data modalities within patient records, leveraging FL in a multi-modal learning framework holds considerable promise for cancer staging. However, existing works on multi-modal FL often presume that all data-collecting institutions have access to all data modalities. This oversimplified approach neglects institutions that have access to only a portion of data modalities within the system. In this work, we introduce a novel FL architecture designed to accommodate not only the heterogeneity of data samples, but also the inherent heterogeneity/non-uniformity of data modalities across institutions. We shed light on the challenges associated with varying convergence speeds observed across different data modalities within our FL system. Subsequently, we propose a solution to tackle these challenges by devising a distributed gradient blending and proximity-aware client weighting strategy tailored for multi-modal FL. To show the superiority of our method, we conduct experiments using The Cancer Genome Atlas program (TCGA) datalake considering different cancer types and three modalities of data: mRNA sequences, histopathological image data, and clinical information. Our results further unveil the impact and severity of class-based vs type-based heterogeneity across institutions on the model performance, which widens the perspective to the notion of data heterogeneity in multi-modal FL literature.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142086428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-28 | DOI: 10.1109/TMI.2024.3449817
Sen Wang, Yirong Yang, Grant M Stevens, Zhye Yin, Adam S Wang
Photon counting CT (PCCT) acquires spectral measurements and enables the generation of material decomposition (MD) images that provide distinct advantages in various clinical situations. However, noise amplification is observed in MD images, and denoising is typically applied. Clean or high-quality references are rare in clinical scans, often making supervised learning (Noise2Clean) impractical. Noise2Noise is a self-supervised counterpart that uses noisy images and corresponding noisy references with zero-mean, independent noise. PCCT counts transmitted photons separately, and the raw measurements are assumed to follow a Poisson distribution in each energy bin, which makes it possible to create noise-independent pairs. Our approach uses binomial selection to split the counts into two low-dose scans with independent noise. Through noise propagation analysis, we prove that the reconstructed spectral images inherit this noise independence from the counts domain, and we validate it in numerical simulations and experimental phantom scans. The method offers the flexibility to split measurements into desired dose levels while ensuring that the reconstructed images share identical underlying features, thereby strengthening the model's robustness to input dose levels and its ability to preserve fine details. In both numerical simulations and experimental phantom scans, we demonstrate that Noise2Noise with binomial selection outperforms other common self-supervised learning methods that rely on different assumptions.
{"title":"Emulating Low-Dose PCCT Image Pairs with Independent Noise for Self-Supervised Spectral Image Denoising.","authors":"Sen Wang, Yirong Yang, Grant M Stevens, Zhye Yin, Adam S Wang","doi":"10.1109/TMI.2024.3449817","DOIUrl":"https://doi.org/10.1109/TMI.2024.3449817","url":null,"abstract":"<p><p>Photon counting CT (PCCT) acquires spectral measurements and enables generation of material decomposition (MD) images that provide distinct advantages in various clinical situations. However, noise amplification is observed in MD images, and denoising is typically applied. Clean or high-quality references are rare in clinical scans, often making supervised learning (Noise2Clean) impractical. Noise2Noise is a self-supervised counterpart, using noisy images and corresponding noisy references with zero-mean, independent noise. PCCT counts transmitted photons separately, and raw measurements are assumed to follow a Poisson distribution in each energy bin, providing the possibility to create noise-independent pairs. The approach is to use binomial selection to split the counts into two low-dose scans with independent noise. We prove that the reconstructed spectral images inherit the noise independence from counts domain through noise propagation analysis and also validated it in numerical simulation and experimental phantom scans. The method offers the flexibility to split measurements into desired dose levels while ensuring the reconstructed images share identical underlying features, thereby strengthening the model's robustness for input dose levels and capability of preserving fine details. In both numerical simulation and experimental phantom scans, we demonstrated that Noise2Noise with binomial selection outperforms other common self-supervised learning methods based on different presumptive conditions.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142086427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the field of medical Vision-Language Pretraining (VLP), significant effort has been devoted to deriving text and image features from both clinical reports and associated medical images. However, most existing methods overlook the opportunity to leverage the inherent hierarchical structure of clinical reports, which are generally split into 'findings' for descriptive content and 'impressions' for conclusive observations. Instead of utilizing this rich, structured format, current medical VLP approaches often simplify the report into either a unified entity or fragmented tokens. In this work, we propose a novel clinical-prior-guided VLP framework named IMITATE that learns the structural information of medical reports via hierarchical vision-language alignment. The framework derives multi-level visual features from chest X-ray (CXR) images and separately aligns these features with the descriptive and conclusive text encoded in the hierarchical medical report. Furthermore, a new clinically informed contrastive loss is introduced for cross-modal learning, which accounts for clinical prior knowledge when formulating sample correlations in contrastive learning. The proposed model, IMITATE, outperforms baseline VLP methods across six different datasets spanning five medical imaging downstream tasks. Comprehensive experimental results highlight the advantages of integrating the hierarchical structure of medical reports for vision-language alignment.
{"title":"IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training.","authors":"Che Liu, Sibo Cheng, Miaojing Shi, Anand Shah, Wenjia Bai, Rossella Arcucci","doi":"10.1109/TMI.2024.3449690","DOIUrl":"https://doi.org/10.1109/TMI.2024.3449690","url":null,"abstract":"<p><p>In the field of medical Vision-Language Pretraining (VLP), significant efforts have been devoted to deriving text and image features from both clinical reports and associated medical images. However, most existing methods may have overlooked the opportunity in leveraging the inherent hierarchical structure of clinical reports, which are generally split into 'findings' for descriptive content and 'impressions' for conclusive observation. Instead of utilizing this rich, structured format, current medical VLP approaches often simplify the report into either a unified entity or fragmented tokens. In this work, we propose a novel clinical prior guided VLP framework named IMITATE to learn the structure information from medical reports with hierarchical vision-language alignment. The framework derives multi-level visual features from the chest X-ray (CXR) images and separately aligns these features with the descriptive and the conclusive text encoded in the hierarchical medical report. Furthermore, a new clinical-informed contrastive loss is introduced for cross-modal learning, which accounts for clinical prior knowledge in formulating sample correlations in contrastive learning. The proposed model, IMITATE, outperforms baseline VLP methods across six different datasets, spanning five medical imaging downstream tasks. Comprehensive experimental results highlight the advantages of integrating the hierarchical structure of medical reports for vision-language alignment.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142074850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-26 | DOI: 10.1109/TMI.2024.3449647
Sunggu Kyung, Jongjun Won, Seongyong Pak, Sunwoo Kim, Sangyoon Lee, Kanggil Park, Gil-Sun Hong, Namkug Kim
Reducing the radiation dose in computed tomography (CT) is vital to decreasing secondary cancer risk. However, the use of low-dose CT (LDCT) images is accompanied by increased noise that can negatively impact diagnoses. Although numerous deep learning algorithms have been developed for LDCT denoising, several challenges persist, including visual incongruence experienced by radiologists, unsatisfactory performance across various metrics, and insufficient exploration of the networks' robustness in other CT domains. To address these issues, this study proposes three novel contributions. First, we propose a generative adversarial network (GAN) with a robust discriminator trained through multi-task learning that simultaneously performs three vision tasks: restoration, image-level decision, and pixel-level decision. The more tasks the discriminator performs, the better the denoising performance of the generator; that is, multi-task learning enables the discriminator to provide more meaningful feedback to the generator. Second, two regulatory mechanisms, restoration consistency (RC) and non-difference suppression (NDS), are introduced to improve the discriminator's representation capabilities. These mechanisms eliminate irrelevant regions and compare the discriminator's results on the input and the restoration, thus facilitating effective GAN training. Lastly, we incorporate residual fast Fourier transform with convolution (Res-FFT-Conv) blocks into the generator to utilize both frequency and spatial representations. This approach provides mixed receptive fields by using spatial (or local), spectral (or global), and residual connections. Our model was evaluated using various pixel- and feature-space metrics on two denoising tasks. Additionally, we conducted visual scoring with radiologists. The results indicate superior performance in both quantitative and qualitative measures compared with state-of-the-art denoising techniques.
{"title":"Generative Adversarial Network with Robust Discriminator Through Multi-Task Learning for Low-Dose CT Denoising.","authors":"Sunggu Kyung, Jongjun Won, Seongyong Pak, Sunwoo Kim, Sangyoon Lee, Kanggil Park, Gil-Sun Hong, Namkug Kim","doi":"10.1109/TMI.2024.3449647","DOIUrl":"https://doi.org/10.1109/TMI.2024.3449647","url":null,"abstract":"<p><p>Reducing the dose of radiation in computed tomography (CT) is vital to decreasing secondary cancer risk. However, the use of low-dose CT (LDCT) images is accompanied by increased noise that can negatively impact diagnoses. Although numerous deep learning algorithms have been developed for LDCT denoising, several challenges persist, including the visual incongruence experienced by radiologists, unsatisfactory performances across various metrics, and insufficient exploration of the networks' robustness in other CT domains. To address such issues, this study proposes three novel accretions. First, we propose a generative adversarial network (GAN) with a robust discriminator through multi-task learning that simultaneously performs three vision tasks: restoration, image-level, and pixel-level decisions. The more multi-tasks that are performed, the better the denoising performance of the generator, which means multi-task learning enables the discriminator to provide more meaningful feedback to the generator. Second, two regulatory mechanisms, restoration consistency (RC) and non-difference suppression (NDS), are introduced to improve the discriminator's representation capabilities. These mechanisms eliminate irrelevant regions and compare the discriminator's results from the input and restoration, thus facilitating effective GAN training. Lastly, we incorporate residual fast Fourier transforms with convolution (Res-FFT-Conv) blocks into the generator to utilize both frequency and spatial representations. This approach provides mixed receptive fields by using spatial (or local), spectral (or global), and residual connections. Our model was evaluated using various pixel- and feature-space metrics in two denoising tasks. Additionally, we conducted visual scoring with radiologists. The results indicate superior performance in both quantitative and qualitative measures compared to state-of-the-art denoising techniques.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142074849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CT-based bronchial tree analysis is a key step in the diagnosis of lung and airway diseases. However, the topology of bronchial trees varies across individuals, which presents a challenge for automatic bronchus classification. To address this issue, we propose the Bronchus Classification Network (BCNet), a structure-guided framework that exploits segment-level topological information using point clouds to learn voxel-level features. BCNet has two branches: a Point-Voxel Graph Neural Network (PV-GNN) for segment classification and a Convolutional Neural Network (CNN) for voxel labeling. The two branches are trained simultaneously so that topology-aware features are learned by their shared backbone, while only the CNN branch needs to be run for inference. Therefore, BCNet maintains the same inference efficiency as its CNN baseline. Experimental results show that BCNet exceeds state-of-the-art methods by over 8.0% in F1-score for bronchus classification. Furthermore, we contribute BronAtlas, an open-access benchmark for bronchus imaging analysis with high-quality voxel-wise annotations of both anatomical and abnormal bronchial segments. The benchmark is available at link1.
{"title":"BCNet: Bronchus Classification via Structure Guided Representation Learning.","authors":"Wenhao Huang, Haifan Gong, Huan Zhang, Yu Wang, Xiang Wan, Guanbin Li, Haofeng Li, Hong Shen","doi":"10.1109/TMI.2024.3448468","DOIUrl":"https://doi.org/10.1109/TMI.2024.3448468","url":null,"abstract":"<p><p>CT-based bronchial tree analysis is a key step for the diagnosis of lung and airway diseases. However, the topology of bronchial trees varies across individuals, which presents a challenge to the automatic bronchus classification. To solve this issue, we propose the Bronchus Classification Network (BCNet), a structure-guided framework that exploits the segment-level topological information using point clouds to learn the voxel-level features. BCNet has two branches, a Point-Voxel Graph Neural Network (PV-GNN) for segment classification, and a Convolutional Neural Network (CNN) for voxel labeling. The two branches are simultaneously trained to learn topology-aware features for their shared backbone while it is feasible to run only the CNN branch for the inference. Therefore, BCNet maintains the same inference efficiency as its CNN baseline. Experimental results show that BCNet significantly exceeds the state-of-the-art methods by over 8.0% both on F1-score for classifying bronchus. Furthermore, we contribute BronAtlas: an open-access benchmark of bronchus imaging analysis with high-quality voxel-wise annotations of both anatomical and abnormal bronchial segments. The benchmark is available at link<sup>1</sup>.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142044270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-22 | DOI: 10.1109/TMI.2024.3447672
Kunming Tang, Zhiguo Jiang, Kun Wu, Jun Shi, Fengying Xie, Wei Wang, Haibo Wu, Yushan Zheng
Multiple instance learning (MIL)-based whole slide image (WSI) classification is often carried out on representations of patches extracted from the WSI with a pre-trained patch encoder. Classification performance relies on both patch-level representation learning and MIL classifier training. Most MIL methods use a frozen model pre-trained on ImageNet, or a model trained with self-supervised learning on a histopathology image dataset, to extract patch representations and then fix these representations when training the MIL classifier for efficiency. However, the invariance of such representations cannot meet the diversity requirements for training a robust MIL classifier, which has significantly limited WSI classification performance. In this paper, we propose a Self-Supervised Representation Distribution Learning framework (SSRDL) for patch-level representation learning with an online representation sampling strategy (ORS) for both patch feature extraction and WSI-level data augmentation. The proposed method was evaluated on three datasets under three MIL frameworks. The experimental results demonstrate that the proposed method achieves the best performance in histopathology image representation learning and data augmentation and outperforms state-of-the-art methods under different WSI classification frameworks. The code is available at https://github.com/lazytkm/SSRDL.
{"title":"Self-Supervised Representation Distribution Learning for Reliable Data Augmentation in Histopathology WSI Classification.","authors":"Kunming Tang, Zhiguo Jiang, Kun Wu, Jun Shi, Fengying Xie, Wei Wang, Haibo Wu, Yushan Zheng","doi":"10.1109/TMI.2024.3447672","DOIUrl":"https://doi.org/10.1109/TMI.2024.3447672","url":null,"abstract":"<p><p>Multiple instance learning (MIL) based whole slide image (WSI) classification is often carried out on the representations of patches extracted from WSI with a pre-trained patch encoder. The performance of classification relies on both patch-level representation learning and MIL classifier training. Most MIL methods utilize a frozen model pre-trained on ImageNet or a model trained with self-supervised learning on histopathology image dataset to extract patch image representations and then fix these representations in the training of the MIL classifiers for efficiency consideration. However, the invariance of representations cannot meet the diversity requirement for training a robust MIL classifier, which has significantly limited the performance of the WSI classification. In this paper, we propose a Self-Supervised Representation Distribution Learning framework (SSRDL) for patch-level representation learning with an online representation sampling strategy (ORS) for both patch feature extraction and WSI-level data augmentation. The proposed method was evaluated on three datasets under three MIL frameworks. The experimental results have demonstrated that the proposed method achieves the best performance in histopathology image representation learning and data augmentation and outperforms state-of-the-art methods under different WSI classification frameworks. The code is available at https://github.com/lazytkm/SSRDL.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142037962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}