Pub Date: 2025-09-12 | DOI: 10.1109/TMI.2025.3609319
Victoria Wu;Andrea Fung;Bahar Khodabakhshian;Baraa Abdelsamad;Hooman Vaseli;Neda Ahmadi;Jamie A. D. Goco;Michael Y. Tsang;Christina Luong;Purang Abolmaesumi;Teresa S. M. Tsang
Aortic stenosis (AS), a prevalent and serious heart valve disorder, requires early detection but remains difficult to diagnose in routine practice. Although echocardiography with Doppler imaging is the clinical standard, these assessments are typically limited to trained specialists. Point-of-care ultrasound (POCUS) offers an accessible alternative for AS screening but is restricted to basic 2D B-mode imaging, often lacking the analysis Doppler provides. Our project introduces MultiASNet, a multimodal machine learning framework designed to enhance AS screening with POCUS by combining 2D B-mode videos with structured data from echocardiography reports, including Doppler parameters. Using contrastive learning, MultiASNet aligns video features with report features in tabular form from the same patient to improve interpretive quality. To address misalignment where a single report corresponds to multiple video views, some irrelevant to AS diagnosis, we use cross-attention in a transformer-based video and tabular network to assign less importance to irrelevant report data. The model integrates structured data only during training, enabling independent use with B-mode videos during inference for broader accessibility. MultiASNet also incorporates sample selection to counteract label noise from observer variability, yielding improved accuracy on two datasets. We achieved balanced accuracy scores of 93.0% on a private dataset and 83.9% on the public TMED-2 dataset for AS detection. For severity classification, balanced accuracy scores were 80.4% and 59.4% on the private and public datasets, respectively. This model facilitates reliable AS screening in non-specialist settings, bridging the gap left by Doppler data while reducing noise-related errors. Our code is publicly available at github.com/DeepRCL/MultiASNet
{"title":"MultiASNet: Multimodal Label Noise Robust Framework for the Classification of Aortic Stenosis in Echocardiography","authors":"Victoria Wu;Andrea Fung;Bahar Khodabakhshian;Baraa Abdelsamad;Hooman Vaseli;Neda Ahmadi;Jamie A. D. Goco;Michael Y. Tsang;Christina Luong;Purang Abolmaesumi;Teresa S. M. Tsang","doi":"10.1109/TMI.2025.3609319","DOIUrl":"10.1109/TMI.2025.3609319","url":null,"abstract":"Aortic stenosis (AS), a prevalent and serious heart valve disorder, requires early detection but remains difficult to diagnose in routine practice. Although echocardiography with Doppler imaging is the clinical standard, these assessments are typically limited to trained specialists. Point-of-care ultrasound (POCUS) offers an accessible alternative for AS screening but is restricted to basic 2D B-mode imaging, often lacking the analysis Doppler provides. Our project introduces MultiASNet, a multimodal machine learning framework designed to enhance AS screening with POCUS by combining 2D B-mode videos with structured data from echocardiography reports, including Doppler parameters. Using contrastive learning, MultiASNet aligns video features with report features in tabular form from the same patient to improve interpretive quality. To address misalignment where a single report corresponds to multiple video views, some irrelevant to AS diagnosis, we use cross-attention in a transformer-based video and tabular network to assign less importance to irrelevant report data. The model integrates structured data only during training, enabling independent use with B-mode videos during inference for broader accessibility. MultiASNet also incorporates sample selection to counteract label noise from observer variability, yielding improved accuracy on two datasets. We achieved balanced accuracy scores of 93.0% on a private dataset and 83.9% on the public TMED-2 dataset for AS detection. For severity classification, balanced accuracy scores were 80.4% and 59.4% on the private and public datasets, respectively. This model facilitates reliable AS screening in non-specialist settings, bridging the gap left by Doppler data while reducing noise-related errors. Our code is publicly available at github.com/DeepRCL/MultiASNet","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 2","pages":"799-810"},"PeriodicalIF":0.0,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145043575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-11 | DOI: 10.1109/TMI.2025.3609245
Shahira Abousamra;Danielle Fassler;Rajarsi Gupta;Tahsin Kurc;Luisa F. Escobar-Hoyos;Dimitris Samaras;Kenneth R. Shroyer;Joel Saltz;Chao Chen
Brightfield Multiplex Immunohistochemistry (mIHC) provides simultaneous labeling of multiple protein biomarkers in the same tissue section. It enables exploration of spatial relationships between the inflammatory microenvironment and tumor cells, and of how tumor cell morphology relates to cancer biomarker expression. Color deconvolution is required to analyze and quantify the different cell phenotype populations indicated by the biomarkers. However, this becomes a challenging task as the number of multiplexed stains increases. In this work, we present self-supervised and semi-supervised approaches to mIHC color deconvolution. Our proposed methods are based on deep convolutional autoencoders and learn using reconstruction losses inspired by physics. We show how weak annotations and the abundant available unlabeled data can be integrated to train a model that reliably unmixes the multiplexed stains and generates stain segmentation maps. We demonstrate the effectiveness of our proposed methods through experiments on an mIHC dataset of 7-plexed IHC images.
{"title":"Label-Efficient Deep Color Deconvolution of Brightfield Multiplex IHC Images","authors":"Shahira Abousamra;Danielle Fassler;Rajarsi Gupta;Tahsin Kurc;Luisa F. Escobar-Hoyos;Dimitris Samaras;Kenneth R. Shroyer;Joel Saltz;Chao Chen","doi":"10.1109/TMI.2025.3609245","DOIUrl":"10.1109/TMI.2025.3609245","url":null,"abstract":"Brightfield Multiplex Immunohistochemistry (mIHC) provides simultaneous labeling of multiple protein biomarkers in the same tissue section. It enables the exploration of spatial relationships between the inflammatory microenvironment and tumor cells, and to uncover how tumor cell morphology relates to cancer biomarker expression. Color deconvolution is required to analyze and quantify the different cell phenotype populations present as indicated by the biomarkers. However, this becomes a challenging task as the number of multiplexed stains increase. In this work, we present self-supervised and semi-supervised approaches to mIHC color deconvolution. Our proposed methods are based on deep convolutional autoencoders and learn using innovative reconstruction losses inspired by physics. We show how we can integrate weak annotations and the abundant unlabeled data available to train a model to reliably unmix the multiplexed stains and generate stain segmentation maps. We demonstrate the effectiveness of our proposed methods through experiments on mIHC dataset of 7-plexed IHC images.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 2","pages":"853-864"},"PeriodicalIF":0.0,"publicationDate":"2025-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145035225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-10 | DOI: 10.1109/TMI.2025.3607752
W. Jeffrey Zabel;Héctor Contreras-Sánchez;Warren Foltz;Costel Flueraru;Edward Taylor;Alex Vitkin
Intravoxel Incoherent Motion (IVIM) MRI is a contrast-agent-free microvascular imaging method finding increasing use in biomedicine. However, there is uncertainty in the ability of IVIM-MRI to quantify tissue microvasculature given MRI’s limited spatial resolution (mm scale). Nine NRG mice were subcutaneously inoculated with human pancreatic cancer BxPC-3 cells transfected with DsRed, and MR-compatible plastic window chambers were surgically installed in the dorsal skinfold. Mice were imaged with speckle variance optical coherence tomography (OCT) and colour Doppler OCT, providing high-resolution 3D measurements of the vascular volume density (VVD) and the average Doppler phase shift ($\overline{\Delta\phi}$), respectively. IVIM imaging was performed on a 7T preclinical MRI scanner to generate maps of the perfusion fraction $f$, the extravascular diffusion coefficient $D_{\textit{slow}}$, and the intravascular diffusion coefficient $D_{\textit{fast}}$. The IVIM parameter maps were coregistered with the optical datasets to enable direct spatial correlation. A significant positive correlation was noted between OCT’s VVD and MR’s $f$ (Pearson correlation coefficient $r = 0.34$, $p < 0.0001$). Surprisingly, no significant correlation was found between $\overline{\Delta\phi}$ and $D_{\textit{fast}}$. This may be due to larger errors in the determined $D_{\textit{fast}}$ values compared to $f$, as confirmed by Monte Carlo simulations. Several other inter- and intra-modality correlations were also quantified. Direct same-animal correlation of clinically applicable IVIM imaging with preclinical OCT microvascular imaging supports the biomedical relevance of IVIM-MRI metrics, for example through $f$’s relationship to the VVD.
{"title":"Quantifying Tumor Microvasculature With Optical Coherence Angiography and Intravoxel Incoherent Motion Diffusion MRI","authors":"W. Jeffrey Zabel;Héctor Contreras-Sánchez;Warren Foltz;Costel Flueraru;Edward Taylor;Alex Vitkin","doi":"10.1109/TMI.2025.3607752","DOIUrl":"10.1109/TMI.2025.3607752","url":null,"abstract":"Intravoxel Incoherent Motion (IVIM) MRI is a contrast-agent-free microvascular imaging method finding increasing use in biomedicine. However, there is uncertainty in the ability of IVIM-MRI to quantify tissue microvasculature given MRI’s limited spatial resolution (mm scale). Nine NRG mice were subcutaneously inoculated with human pancreatic cancer BxPC-3 cells transfected with DsRed, and MR-compatible plastic window chambers were surgically installed in the dorsal skinfold. Mice were imaged with speckle variance optical coherence tomography (OCT) and colour Doppler OCT, providing high resolution 3D measurements of the vascular volume density (VVD) and average Doppler phase shift (<inline-formula> <tex-math>$overline {Delta phi }text {)}$ </tex-math></inline-formula> respectively. IVIM imaging was performed on a 7T preclinical MRI scanner, to generate maps of the perfusion fraction f, the extravascular diffusion coefficient <inline-formula> <tex-math>${D}_{textit {slow}}$ </tex-math></inline-formula>, and the intravascular diffusion coefficient <inline-formula> <tex-math>${D}_{textit {fast}}$ </tex-math></inline-formula>. The IVIM parameter maps were coregistered with the optical datasets to enable direct spatial correlation. A significant positive correlation was noted between OCT’s VVD and MR’s f (Pearson correlation coefficient <inline-formula> <tex-math>${r}={0}.{34},{p}lt {0}.{0001}text {)}$ </tex-math></inline-formula>. Surprisingly, no significant correlation was found between <inline-formula> <tex-math>$overline {Delta phi }$ </tex-math></inline-formula> and <inline-formula> <tex-math>${D}_{textit {fast}}$ </tex-math></inline-formula>. This may be due to larger errors in the determined <inline-formula> <tex-math>${D}_{textit {fast}}$ </tex-math></inline-formula> values compared to f, as confirmed by Monte Carlo simulations. Several other inter- and intra-modality correlations were also quantified. Direct same-animal correlation of clinically applicable IVIM imaging with preclinical OCT microvascular imaging support the biomedical relevance of IVIM-MRI metrics, for example through f’s relationship to the VVD.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 2","pages":"789-798"},"PeriodicalIF":0.0,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145031941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-09 | DOI: 10.1109/TMI.2025.3607700
Xiaoru Gao;Housheng Xie;Donghua Hang;Guoyan Zheng
Computed Tomography (CT) to Cone-Beam Computed Tomography (CBCT) image registration is crucial for image-guided radiotherapy and surgical procedures. However, achieving accurate CT-CBCT registration remains challenging due to various factors such as inconsistent intensities, low contrast resolution and imaging artifacts. In this study, we propose a Context-Aware Semantics-driven Hierarchical Network (referred to as CASHNet), which hierarchically integrates context-aware semantics-encoded features into a coarse-to-fine registration scheme, to explicitly enhance semantic structural perception during progressive alignment. Moreover, it leverages diffeomorphisms to integrate rigid and non-rigid registration within a single end-to-end trainable network, enabling anatomically plausible deformations and preserving topological consistency. CASHNet comprises a Siamese Mamba-based multi-scale feature encoder and a coarse-to-fine registration decoder, which integrates a Rigid Registration (RR) module with multiple Semantics-guided Velocity Estimation and Feature Alignment (SVEFA) modules operating at different resolutions. Each SVEFA module comprises three carefully designed components: i) a cross-resolution feature aggregation (CFA) component that synthesizes enhanced global contextual representations, ii) a semantics perception and encoding (SPE) component that captures and encodes local semantic information, and iii) an incremental velocity estimation and feature alignment (IVEFA) component that leverages contextual and semantic features to update velocity fields and to align features. These modules work synergistically to boost the overall registration performance. Extensive experiments on three typical yet challenging CT-CBCT datasets of both soft and hard tissues demonstrate the superiority of our proposed method over other state-of-the-art methods. The code will be publicly available at https://github.com/xiaorugao999/CASHNet
{"title":"CASHNet: Context-Aware Semantics-Driven Hierarchical Network for Hybrid Diffeomorphic CT-CBCT Image Registration","authors":"Xiaoru Gao;Housheng Xie;Donghua Hang;Guoyan Zheng","doi":"10.1109/TMI.2025.3607700","DOIUrl":"10.1109/TMI.2025.3607700","url":null,"abstract":"Computed Tomography (CT) to Cone-Beam Computed Tomography (CBCT) image registration is crucial for image-guided radiotherapy and surgical procedures. However, achieving accurate CT-CBCT registration remains challenging due to various factors such as inconsistent intensities, low contrast resolution and imaging artifacts. In this study, we propose a Context-Aware Semantics-driven Hierarchical Network (referred to as CASHNet), which hierarchically integrates context-aware semantics-encoded features into a coarse-to-fine registration scheme, to explicitly enhance semantic structural perception during progressive alignment. Moreover, it leverages diffeomorphisms to integrate rigid and non-rigid registration within a single end-to-end trainable network, enabling anatomically plausible deformations and preserving topological consistency. CASHNet comprises a Siamese Mamba-based multi-scale feature encoder and a coarse-to-fine registration decoder, which integrates a Rigid Registration (RR) module with multiple Semantics-guided Velocity Estimation and Feature Alignment (SVEFA) modules operating at different resolutions. Each SVEFA module comprises three carefully designed components: i) a cross-resolution feature aggregation (CFA) component that synthesizes enhanced global contextual representations, ii) a semantics perception and encoding (SPE) component that captures and encodes local semantic information, and iii) an incremental velocity estimation and feature alignment (IVEFA) component that leverages contextual and semantic features to update velocity fields and to align features. These modules work synergistically to boost the overall registration performance. Extensive experiments on three typical yet challenging CT-CBCT datasets of both soft and hard tissues demonstrate the superiority of our proposed method over other state-of-the-art methods. The code will be publicly available at <uri>https://github.com/xiaorugao999/CASHNet</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 2","pages":"825-842"},"PeriodicalIF":0.0,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145025299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-09 | DOI: 10.1109/TMI.2025.3607877
Xinchuan Liu;Luhao Sun;Chao Li;Bowen Han;Wenzong Jiang;Tianhao Yuan;Weifeng Liu;Zhaoyun Liu;Zhiyong Yu;Baodi Liu
Mammography is a primary method for early breast cancer screening, and developing deep learning-based computer-aided diagnosis systems is of great significance. However, current deep learning models typically treat each image as an independent entity for diagnosis, rather than integrating images from multiple views to diagnose the patient. These methods do not fully consider or address the complex interactions between different views, resulting in poor diagnostic performance and interpretability. To address this issue, this paper proposes a novel end-to-end framework for breast cancer diagnosis: the lesion asymmetry screening assisted global awareness multi-view network (LAS-GAM). Unlike common image-level diagnostic models, LAS-GAM operates at the patient level, simulating the workflow of radiologists analyzing mammographic images. The framework processes the four views of a patient and revolves around two key modules: a global module and a lesion screening module. The global module simulates the comprehensive assessment performed by radiologists, integrating complementary information from the craniocaudal (CC) and mediolateral oblique (MLO) views of both breasts to generate global features that represent the patient’s overall condition. The lesion screening module mimics the process of locating lesions by comparing symmetric regions in contralateral views, identifying potential lesion areas and extracting lesion-specific features with a lightweight model. By combining the global features and lesion-specific features, LAS-GAM simulates the diagnostic process and makes patient-level predictions. Moreover, it is trained using only patient-level labels, significantly reducing data annotation costs. Experiments on the Digital Database for Screening Mammography (DDSM) and an in-house dataset validate LAS-GAM, achieving AUCs of 0.817 and 0.894, respectively.
"Lesion Asymmetry Screening Assisted Global Awareness Multi-View Network for Mammogram Classification," IEEE Transactions on Medical Imaging, vol. 45, no. 2, pp. 777-788.
Pub Date: 2025-09-08 | DOI: 10.1109/TMI.2025.3607113
Zihao Yuan;Jiaqing Chen;Han Qiu;Houxiang Wang;Yangxin Huang;Fuchun Lin
Analyzing the spontaneous activity of the human brain using dynamic approaches can reveal functional organizations. The co-activation pattern (CAP) analysis of signals from different brain regions is used to characterize brain neural networks that may serve specialized functions. However, CAP analysis is based on spatial information but ignores temporally reproducible transition patterns, and it lacks robustness to data with a low signal-to-noise ratio (SNR). To address these issues, this study proposes a new CAP framework based on the hidden semi-Markov model (HSMM), called HSMM-CAP analysis, which can be used to investigate spatiotemporal CAPs (stCAPs) of the brain. HSMM-CAP uses empirical spatial distributions of stCAPs as emission models and assumes that the state sequence of stCAPs follows a semi-Markov process. Based on the assumptions of sparsity, heterogeneity, and the semi-Markov property of stCAPs, the HSMM-CAP-K-means method is constructed to infer the state sequence and transition parameters of stCAPs. In addition, HSMM-CAP provides the inverse relationship between the number of states and sparsity. Simulation studies verify the performance of HSMM-CAP at different levels of SNR. The spatiotemporal dynamics of stCAPs are also revealed by the proposed method on real-world resting-state fMRI data. Our method provides a new data-driven computational framework for revealing brain spatiotemporal dynamics from resting-state fMRI data.
{"title":"Co-Activation Pattern Analysis Based on Hidden Semi-Markov Model for Brain Spatiotemporal Dynamics","authors":"Zihao Yuan;Jiaqing Chen;Han Qiu;Houxiang Wang;Yangxin Huang;Fuchun Lin","doi":"10.1109/TMI.2025.3607113","DOIUrl":"10.1109/TMI.2025.3607113","url":null,"abstract":"Analyzing the spontaneous activity of the human brain using dynamic approaches can reveal functional organizations. The co-activation pattern (CAP) analysis of signals from different brain regions is used to characterize brain neural networks that may serve specialized functions. However, CAP is based on spatial information but ignores temporal reproducible transition patterns, and lacks robustness to low signal-to-noise rate (SNR) data. To address these issues, this study proposes a new CAP framework based on hidden semi-Markov model (HSMM) called HSMM-CAP analysis, which can be performed to investigate spatiotemporal CAPs (stCAPs) of the brain. HSMM-CAP uses empirical spatial distributions of stCAPs as emission models, and assumes that the state sequence of stCAPs follows a semi-Markov process. Based on the assumptions of sparsity, heterogeneity, and semi-Markov property of stCAPs, the HSMM-CAP-K-means method is constructed to infer the state sequence and transition parameters of stCAPs. In addition, HSMM-CAP provides the inverse relationship between the number of states and sparsity. Simulation studies verify the performance of HSMM-CAP at different levels of SNR. The spatiotemporal dynamics of stCAPs are also revealed by the proposed method on real-world resting-state fMRI data. Our method provides a new data-driven computational framework for revealing the brain spatiotemporal dynamics of resting-state fMRI data.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 2","pages":"843-852"},"PeriodicalIF":0.0,"publicationDate":"2025-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145017614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-03 | DOI: 10.1109/TMI.2025.3605617
Weiren Zhao;Lanfeng Zhong;Xin Liao;Wenjun Liao;Sichuan Zhang;Shaoting Zhang;Guotai Wang
Semi-Supervised Learning (SSL) is important for reducing the annotation cost of medical image segmentation models. State-of-the-art SSL methods such as Mean Teacher, FixMatch and Cross Pseudo Supervision (CPS) are mainly based on consistency regularization or pseudo-label supervision between a reference prediction and a supervised prediction. Despite their effectiveness, these methods overlook the potential noise in the labeled data and mainly focus on strategies to generate the reference prediction, while ignoring the heterogeneous values of different unlabeled pixels. We argue that effectively mining the rich information contained in the two predictions in the loss function, rather than the specific strategy used to obtain a reference prediction, is more essential for SSL, and we propose a universal framework MetaSSL based on a spatially heterogeneous loss that assigns different weights to pixels by simultaneously leveraging the uncertainty and consistency information between the reference and supervised predictions. Specifically, we split the predictions on unlabeled data into four regions with decreasing weights in the loss: Unanimous and Confident (UC), Unanimous and Suspicious (US), Discrepant and Confident (DC), and Discrepant and Suspicious (DS), where an adaptive threshold is proposed to distinguish confident predictions from suspicious ones. The heterogeneous loss is also applied to labeled images for robust learning, considering the potential annotation noise. Our method is plug-and-play and general to most existing SSL methods. Experimental results show that it improves segmentation performance significantly when integrated with existing SSL frameworks on different datasets. Code is available at https://github.com/HiLab-git/MetaSSL
{"title":"MetaSSL: A General Heterogeneous Loss for Semi-Supervised Medical Image Segmentation","authors":"Weiren Zhao;Lanfeng Zhong;Xin Liao;Wenjun Liao;Sichuan Zhang;Shaoting Zhang;Guotai Wang","doi":"10.1109/TMI.2025.3605617","DOIUrl":"10.1109/TMI.2025.3605617","url":null,"abstract":"Semi-Supervised Learning (SSL) is important for reducing the annotation cost for medical image segmentation models. State-of-the-art SSL methods such as Mean Teacher, FixMatch and Cross Pseudo Supervision (CPS) are mainly based on consistency regularization or pseudo-label supervision between a reference prediction and a supervised prediction. Despite the effectiveness, they have overlooked the potential noise in the labeled data, and mainly focus on strategies to generate the reference prediction, while ignoring the heterogeneous values of different unlabeled pixels. We argue that effectively mining the rich information contained by the two predictions in the loss function, instead of the specific strategy to obtain a reference prediction, is more essential for SSL, and propose a universal framework <bold>MetaSSL</b> based on a spatially heterogeneous loss that assigns different weights to pixels by simultaneously leveraging the uncertainty and consistency information between the reference and supervised predictions. Specifically, we split the predictions on unlabeled data into four regions with decreasing weights in the loss: Unanimous and Confident (UC), Unanimous and Suspicious (US), Discrepant and Confident (DC), and Discrepant and Suspicious (DS), where an adaptive threshold is proposed to distinguish confident predictions from suspicious ones. The heterogeneous loss is also applied to labeled images for robust learning considering the potential annotation noise. Our method is plug-and-play and general to most existing SSL methods. The experimental results showed that it improved the segmentation performance significantly when integrated with existing SSL frameworks on different datasets. Code is available at <uri>https://github.com/HiLab-git/MetaSSL</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 2","pages":"751-763"},"PeriodicalIF":0.0,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144987556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-02 | DOI: 10.1109/TMI.2025.3605162
Zhengshan Wang;Long Chen;Xuelin Xie;Yang Zhang;Yunpeng Cai;Weiping Ding
Recently, single-source domain generalization (SDG) has gained popularity in medical image segmentation. As a prominent technique, adversarial image augmentation can generate synthetic training data that are challenging for the segmentation model to recognize. To avoid the over-augmentation problem, existing adversarial-based works often employ augmenters with relatively simple structures for medical images, typically operating at the image level, which limits the diversity of the augmented images. In this paper, we propose a Teacher-Student Instance-level Adversarial Augmentation (TSIAA) model for generalized medical image segmentation. The objective of TSIAA is to derive domain-generalizable representations by exploring out-of-source data distributions. First, we construct an Instance-level Image Augmenter (IIAG) using several Instance-level Augmentation Modules (IAMs), which are based on a learnable constrained Bézier transformation function. Compared to image-level adversarial augmentation, instance-level adversarial augmentation breaks the uniformity of augmentation rules across different structures within an image, thereby providing greater diversity. Then, TSIAA conducts Teacher-Student (TS) learning through an adversarial approach, alternating novel image augmentation and generalized representation learning. The former explores out-of-source yet plausible data, while the latter continuously updates both the student and the teacher to ensure that the original and augmented features maintain consistent and generalized characteristics. By integrating both strategies, our proposed TSIAA model achieves significant improvements over state-of-the-art methods on four challenging SDG tasks. The code can be accessed at https://github.com/Wangzs0228/TSIAA
"Teacher–Student Instance-Level Adversarial Augmentation for Single Domain Generalized Medical Image Segmentation," IEEE Transactions on Medical Imaging, vol. 45, no. 2, pp. 764-776.
Pub Date: 2025-08-18 | DOI: 10.1109/TMI.2025.3599487
Shuxin Zhuang;Heye Zhang;Dong Liang;Hui Liu;Zhifan Gao
Motion estimation of the left ventricle myocardium on cardiac image sequences is crucial for assessing cardiac function. However, the intensity variation of cardiac image sequences brings the challenge of uncertain interference to myocardial motion estimation. Such imaging-related uncertain interference appears in different cardiac imaging modalities. We propose adaptive sequential Bayesian iterative learning to overcome this challenge. Specifically, our method applies adaptive structural inference to the state transition and observation to cope with complex myocardial motion under an uncertain setting. In the state transition, adaptive structural inference establishes a hierarchical structure recurrence to obtain the complex latent representation of cardiac image sequences. In the state observation, adaptive structural inference forms a chain structure mapping to correlate the latent representation of the cardiac image sequence with that of the motion. Extensive experiments on US, CMR, and TMR datasets involving 1270 patients (650 for CMR, 500 for US, and 120 for TMR) show the effectiveness of our method, as well as its superiority over eight state-of-the-art motion estimation methods.
"Adaptive Sequential Bayesian Iterative Learning for Myocardial Motion Estimation on Cardiac Image Sequences," IEEE Transactions on Medical Imaging, vol. 45, no. 1, pp. 406-420.
Pub Date: 2025-08-18 | DOI: 10.1109/TMI.2025.3599197
Yaozong Gao;Yiran Shu;Mingyang Yu;Yanbo Chen;Jingyu Liu;Shaonan Zhong;Weifang Zhang;Yiqiang Zhan;Xiang Sean Zhou;Xinlu Wang;Meixin Zhao;Dinggang Shen
Automatic anatomical localization is critical for radiology report generation. While many studies focus on lesion detection and segmentation, anatomical localization, i.e., accurately describing lesion positions in radiology reports, has received less attention. Conventional segmentation-based methods are limited to organ-level localization and often fail in severe disease cases due to low segmentation accuracy. To address these limitations, we reformulate anatomical localization as an image-to-text retrieval task. Specifically, we propose a CLIP-based framework that aligns lesion image patches with anatomically descriptive text embeddings in a shared multimodal space. By projecting lesion features into the semantic space and retrieving the most relevant anatomical descriptions in a coarse-to-fine manner, our method achieves fine-grained lesion localization with high accuracy across the entire body. Our main contributions are as follows: (1) hierarchical anatomical retrieval, which organizes 387 locations into a two-level hierarchy and retrieves from a first level of 124 coarse categories to narrow the search space and reduce localization complexity; (2) augmented location descriptions, which integrate domain-specific anatomical knowledge to enhance semantic representation and improve visual-text alignment; and (3) semi-hard negative sample mining, which improves training stability and discriminative learning by avoiding overly similar negative samples that may introduce label noise or semantic ambiguity. We validate our method on two whole-body PET/CT datasets, achieving 84.13% localization accuracy on the internal test set and 80.42% on the external test set, with a per-lesion inference time of 34 ms. The proposed framework also demonstrates superior robustness in complex clinical cases compared to segmentation-based approaches.
"Hierarchical Contrastive Learning for Precise Whole-Body Anatomical Localization in PET/CT Imaging," IEEE Transactions on Medical Imaging, vol. 45, no. 1, pp. 391-405.