Pub Date: 2025-10-10 | DOI: 10.1109/TMI.2025.3618754
Mengzhou Li;Chuang Niu;Ge Wang;Maya R. Amma;Krishna M. Chapagain;Stefan Gabrielson;Andrew Li;Kevin Jonker;Niels de Ruiter;Jennifer A. Clark;Phil Butler;Anthony Butler;Hengyong Yu
X-ray photon-counting computed tomography (PCCT) for extremity imaging allows multi-energy high-resolution (HR) imaging, but its radiation dose can be further reduced. Despite the great potential of deep learning techniques, their application to HR volumetric PCCT reconstruction has been challenged by the large memory burden, training data scarcity, and domain gap issues. In this paper, we propose a deep learning-based approach for PCCT image reconstruction at halved dose and doubled speed, validated in a New Zealand clinical trial. Specifically, we design a patch-based volumetric refinement network to alleviate the GPU memory limitation, train the network with synthetic data, and use model-based iterative refinement to bridge the gap between synthetic and clinical data. Our results in a reader study of 8 patients from the clinical trial demonstrate great potential to cut the radiation dose to half that of the clinical PCCT standard without compromising image quality or diagnostic value.
{"title":"Deep Few-View High-Resolution Photon-Counting CT at Halved Dose for Extremity Imaging","authors":"Mengzhou Li;Chuang Niu;Ge Wang;Maya R. Amma;Krishna M. Chapagain;Stefan Gabrielson;Andrew Li;Kevin Jonker;Niels de Ruiter;Jennifer A. Clark;Phil Butler;Anthony Butler;Hengyong Yu","doi":"10.1109/TMI.2025.3618754","DOIUrl":"10.1109/TMI.2025.3618754","url":null,"abstract":"X-ray photon-counting computed tomography (PCCT) for extremity allows multi-energy high-resolution (HR) imaging but its radiation dose can be further improved. Despite the great potential of deep learning techniques, their application in HR volumetric PCCT reconstruction has been challenged by the large memory burden, training data scarcity, and domain gap issues. In this paper, we propose a deep learning-based approach for PCCT image reconstruction at halved dose and doubled speed validated in a New Zealand clinical trial. Specifically, we design a patch-based volumetric refinement network to alleviate the GPU memory limitation, train network with synthetic data, and use model-based iterative refinement to bridge the gap between synthetic and clinical data. Our results in a reader study of 8 patients from the clinical trial demonstrate a great potential to cut the radiation dose to half that of the clinical PCCT standard without compromising image quality and diagnostic value.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 3","pages":"1193-1207"},"PeriodicalIF":0.0,"publicationDate":"2025-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145260747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-09 | DOI: 10.1109/TMI.2025.3619837
Yong Chen;Xiangde Luo;Renyi Chen;Yiyue Li;Han Zhang;He Lyu;Huan Song;Kang Li
Domain adaptation in medical image segmentation enables pre-trained models to generalize to new target domains. Given limited annotated data and privacy constraints, Source-Free Active Domain Adaptation (SFADA) methods provide promising solutions by selecting a few target samples for labeling without accessing source samples. However, in a fully source-free setting, existing works have not fully explored how to select these target samples in a class-balanced manner and how to conduct robust model adaptation using both labeled and unlabeled samples. In this study, we discover that boundary samples with source-like semantics but sharp predictive discrepancies are beneficial for SFADA. We define these samples as the most influential points and propose a slice-wise framework based on influential-points learning to explore them. Specifically, we detect source-like samples to retain source-specific knowledge. For each target sample, an adaptive K-nearest neighbor algorithm based on local density is introduced to construct neighborhoods of source-like samples for knowledge transfer. We then propose a class-balanced Kullback-Leibler divergence over these neighborhoods and compute it to obtain an influence-score ranking. A diverse subset of the highest-ranked target samples (considered influential points) is manually annotated. Furthermore, we design a progressive teacher model to facilitate SFADA for medical image segmentation. With the guidance of influential points, this model independently generates and utilizes pseudo-labels to mitigate error accumulation. To further suppress noise, curriculum learning is incorporated into the model to progressively leverage reliable supervision signals from pseudo-labels. Experiments on multiple benchmarks demonstrate that our method outperforms state-of-the-art methods even with only 2.5% of the labeling budget.
{"title":"Source-Free Active Domain Adaptation via Influential-Points-Guided Progressive Teacher for Medical Image Segmentation","authors":"Yong Chen;Xiangde Luo;Renyi Chen;Yiyue Li;Han Zhang;He Lyu;Huan Song;Kang Li","doi":"10.1109/TMI.2025.3619837","DOIUrl":"10.1109/TMI.2025.3619837","url":null,"abstract":"Domain adaptation in medical image segmentation enables pre-trained models to generalize to new target domains. Given limited annotated data and privacy constraints, Source-Free Active Domain Adaptation (SFADA) methods provide promising solutions by selecting a few target samples for labeling without accessing source samples. However, in a fully source-free setting, existing works have not fully explored how to select these target samples in a class-balanced manner and how to conduct robust model adaptation using both labeled and unlabeled samples. In this study, we discover that boundary samples with source-like semantics but sharp predictive discrepancies are beneficial for SFADA. We define these samples as the most influential points and propose a slice-wise framework using influential points learning to explore them. Specifically, we detect source-like samples to retain source-specific knowledge. For each target sample, an adaptive K-nearest neighbor algorithm based on local density is introduced to construct neighborhoods of source-like samples for knowledge transfer. We then propose a class-balanced Kullback-Leibler divergence for these neighborhoods, calculating it to obtain an influential score ranking. A diverse subset of the highest-ranked target samples (considered influential points) is manually annotated. Furthermore, we design a progressive teacher model to facilitate SFADA for medical image segmentation. With the guidance of influential points, this model independently generates and utilizes pseudo-labels to mitigate error accumulation. To further suppress noise, curriculum learning is incorporated into the model to progressively leverage reliable supervision signals from pseudo-labels. Experiments on multiple benchmarks demonstrate that our method outperforms state-of-the-art methods even with only 2.5% of the labeling budget.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 3","pages":"1223-1236"},"PeriodicalIF":0.0,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145254811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-09 | DOI: 10.1109/TMI.2025.3619809
Davood Karimi;Camilo Calixto;Haykel Snoussi;Bo Li;Maria Camila Cortes-Albornoz;Clemente Velasco-Annis;Caitlin Rollins;Lana Pierotich;Camilo Jaimes;Ali Gholipour;Simon K. Warfield
Diffusion-weighted MRI (dMRI) is increasingly used to study the normal and abnormal development of the fetal brain in utero. It offers invaluable insights into the neurodevelopmental processes of the fetal stage. However, reliable analysis of fetal dMRI data requires dedicated computational methods that are currently unavailable. The lack of automated methods for fast, accurate, and reproducible data analysis has seriously limited our ability to tap the potential of fetal brain dMRI for medical and scientific applications. In this work, we developed and validated a unified computational framework to: 1) segment the brain tissue into white matter, cortical/subcortical gray matter, and cerebrospinal fluid; 2) segment 31 distinct white matter tracts; and 3) parcellate the brain’s cortex, deep gray nuclei, and white matter structures into 96 anatomically meaningful regions. We utilized a set of manual, semi-automatic, and automatic approaches to annotate 97 fetal brains. Using these labels, we developed and validated a multi-task deep learning method to perform the three computations. Evaluations show that the new method can accurately carry out all three tasks, achieving a mean Dice similarity coefficient of 0.865 on tissue segmentation, 0.825 on white matter tract segmentation, and 0.819 on parcellation. Further validation on independent external data shows the generalizability of the proposed method. The new method can help advance the field of fetal neuroimaging as it can lead to substantial improvements in fetal brain tractography, tract-specific analysis, and structural connectivity assessment.
{"title":"Detailed Delineation of the Fetal Brain in Diffusion MRI via Multi-Task Learning","authors":"Davood Karimi;Camilo Calixto;Haykel Snoussi;Bo Li;Maria Camila Cortes-Albornoz;Clemente Velasco-Annis;Caitlin Rollins;Lana Pierotich;Camilo Jaimes;Ali Gholipour;Simon K. Warfield","doi":"10.1109/TMI.2025.3619809","DOIUrl":"10.1109/TMI.2025.3619809","url":null,"abstract":"Diffusion-weighted MRI (dMRI) is increasingly used to study the normal and abnormal development of fetal brain in-utero. It offers invaluable insights into the neurodevelopmental processes in the fetal stage. However, reliable analysis of fetal dMRI data requires dedicated computational methods that are currently unavailable. The lack of automated methods for fast, accurate, and reproducible data analysis has seriously limited our ability to tap the potential of fetal brain dMRI for medical and scientific applications. In this work, we developed and validated a unified computational framework to:1) segment the brain tissue into white matter, cortical/subcortical gray matter, and cerebrospinal fluid,:2) segment 31 distinct white matter tracts, and:3) parcellate the brain’s cortex, deep gray nuclei, and white matter structures into 96 anatomically meaningful regions. We utilized a set of manual, semi-automatic, and automatic approaches to annotate 97 fetal brains. Using these labels, we developed and validated a multi-task deep learning method to perform the three computations. Evaluations show that the new method can accurately carry out all three tasks, achieving a mean Dice similarity coefficient of 0.865 on tissue segmentation, 0.825 on white matter tract segmentation, and 0.819 on parcellation. Further validation on independent external data shows generalizability of the proposed method. The new method can help advance the field of fetal neuroimaging as it can lead to substantial improvements in fetal brain tractography, tract-specific analysis, and structural connectivity assessment.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 3","pages":"1208-1222"},"PeriodicalIF":0.0,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145255596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-07 | DOI: 10.1109/TMI.2025.3618711
Xiaoyu Wang;Jinyu Zheng;Chaolu Feng;Lian-Ming Wu
Major adverse cardiac events (MACE) pose a serious, life-threatening risk to patients with arrhythmogenic right ventricular cardiomyopathy (ARVC). Cardiac magnetic resonance (CMR) has been proven to reflect the risk of MACE, but two challenges remain: limited dataset size due to the rarity of ARVC, and overlapping image distributions between non-MACE and MACE patients. To address these challenges by fully leveraging the dynamic and spatial information in the limited CMR dataset, a deep learning-based risk prediction model named the Three-Tier Spatiotemporal Transformer (TTST) is proposed in this paper. It utilizes three transformer-based tiers to sequentially extract and fuse features from three domains: the 2D spatial domain of each slice, the temporal dimension of the slice sequence, and the inter-slice depth dimension. In TTST, a pericardial adipose tissue (PAT) embedding unit is proposed to incorporate the dynamic and positional information of PAT, a key biomarker for distinguishing MACE from non-MACE based on its thickening and reduced motion, as prior knowledge to reduce reliance on large-scale datasets. Additionally, a patch voting unit is introduced to select local features that highlight the most indicative regions of the heart, guided by the PAT embedding information. Experimental results demonstrate that TTST outperforms existing classification methods in MACE prediction (internal: AUC = 0.89, ACC = 84.02%; external: AUC = 0.87, ACC = 86.21%). Clinically, TTST achieves effective risk prediction performance either independently (C-index = 0.744) or in combination with the existing 5-year risk score model (increasing the C-index from 0.686 to 0.777). Code and dataset are accessible at https://github.com/DFLAG-NEU
{"title":"MACE Risk Prediction in ARVC Patients via CMR: A Three-Tier Spatiotemporal Transformer With Pericardial Adipose Tissue Embedding","authors":"Xiaoyu Wang;Jinyu Zheng;Chaolu Feng;Lian-Ming Wu","doi":"10.1109/TMI.2025.3618711","DOIUrl":"10.1109/TMI.2025.3618711","url":null,"abstract":"Major adverse cardiac events (MACE) pose a high life-threatening risk to patients with arrhythmogenic right ventricular cardiomyopathy (ARVC). Cardiac magnetic resonance (CMR) has been proven to reflect the risk of MACE, but two challenges remain: limited dataset size due to the rarity of ARVC and overlapping image distributions between non-MACE and MACE patients. To address these challenges by fully leveraging the dynamic and spatial information in the limited CMR dataset, a deep learning-based risk prediction model named Three-Tier Spatiotemporal Transformer (TTST) is proposed in this paper, which utilizes three transformer-based tiers to sequentially extract and fuse features from three domains: the 2D spatial domain of each slice, the temporal dimension of slice sequence and the inter-slice depth dimension. In TTST, a pericardial adipose tissue (PAT) embedding unit is proposed to incorporate the dynamic and positional information of PAT, a key biomarker for distinguishing MACE from non-MACE based on its thickening and reduced motion, as prior knowledge to reduce reliance on large-scale datasets. Additionally, a patch voting unit is introduced to pick out local features that highlight more indicative regions in the heart, guided by the PAT embedding information. Experimental results demonstrate that TTST outperforms existing classification methods in MACE prediction (internal: AUC = 0.89, ACC = 84.02%; external: AUC = 0.87, ACC = 86.21%). Clinically, TTST achieves effective risk prediction performance either independently (C-index = 0.744) or in combination with the existing 5-year risk score model (increasing C-index from 0.686 to 0.777). Code and dataset are accessible at <uri>https://github.com/DFLAG-NEU</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 3","pages":"1179-1192"},"PeriodicalIF":0.0,"publicationDate":"2025-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145241078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-07 | DOI: 10.1109/TMI.2025.3618683
Pengli Zhu;Yingji Fu;Nanguang Chen;Anqi Qiu
Diffusion-weighted imaging (DWI) enables non-invasive characterization of tissue microstructure, yet acquiring densely sampled q-space data remains time-consuming and impractical in many clinical settings. Existing deep learning methods are typically constrained by fixed q-space sampling, limiting their adaptability to variable sampling scenarios. In this paper, we propose a Q-space Guided Multi-Modal Translation Network (Q-MMTN) for synthesizing multi-shell, high-angular-resolution DWI (MS-HARDI) from flexible q-space sampling, leveraging commonly acquired structural data (e.g., T1- and T2-weighted MRI). Q-MMTN integrates a hybrid encoder and a multi-modal attention fusion mechanism to effectively extract both local and global complementary information from multiple modalities. This design enhances feature representation and, together with a flexible q-space-aware embedding, enables dynamic modulation of internal features without relying on fixed sampling schemes. Additionally, we introduce a set of task-specific constraints, including adversarial, reconstruction, and anatomical consistency losses, which jointly enforce anatomical fidelity and signal realism. These constraints guide Q-MMTN to accurately capture the intrinsic and nonlinear relationships between directional DWI signals and q-space information. Extensive experiments across four lifespan datasets of children, adolescents, young adults, and older adults demonstrate that Q-MMTN outperforms existing methods, including 1D-qDL, 2D-qDL, MESC-SD, and Q-GAN, in estimating parameter maps and fiber tracts with fine-grained anatomical details. Notably, its ability to accommodate flexible q-space sampling highlights its potential as a promising toolkit for clinical and research applications. Our code is available at https://github.com/Idea89560041/Q-MMTN
{"title":"Q-Space Guided Multi-Modal Translation Network for Diffusion-Weighted Image Synthesis","authors":"Pengli Zhu;Yingji Fu;Nanguang Chen;Anqi Qiu","doi":"10.1109/TMI.2025.3618683","DOIUrl":"10.1109/TMI.2025.3618683","url":null,"abstract":"Diffusion-weighted imaging (DWI) enables non-invasive characterization of tissue microstructure, yet acquiring densely sampled q-space data remains time-consuming and impractical in many clinical settings. Existing deep learning methods are typically constrained by fixed q-space sampling, limiting their adaptability to variable sampling scenarios. In this paper, we propose a Q-space Guided Multi-Modal Translation Network (Q-MMTN) for synthesizing multi-shell, high-angular resolution DWI (MS-HARDI) from <italic>flexible q-space sampling</i>, leveraging commonly acquired structural data (e.g., T1- and T2-weighted MRI). Q-MMTN integrates the <italic>hybrid encoder</i> and <italic>multi-modal attention fusion mechanism</i> to effectively extract both local and global complementary information from multiple modalities. This design enhances feature representation and, together with a <italic>flexible q-space-aware embedding</i>, enables dynamic modulation of internal features <italic>without relying on fixed sampling schemes</i>. Additionally, we introduce a set of <italic>task-specific constraints</i>, including <italic>adversarial</i>, <italic>reconstruction</i>, and <italic>anatomical consistency losses</i>, which jointly enforce anatomical fidelity and signal realism. These constraints guide Q-MMTN to accurately capture the intrinsic and nonlinear relationships between directional DWI signals and q-space information. Extensive experiments across four lifespan datasets of children, adolescents, young and older adults demonstrate that Q-MMTN outperforms existing methods, including 1D-qDL, 2D-qDL, MESC-SD, and Q-GAN in estimating parameter maps and fiber tracts with fine-grained anatomical details. <italic>Notably, its ability to accommodate flexible q-space sampling highlights its potential as a promising toolkit for clinical and research applications.</i> Our code is available at <uri>https://github.com/Idea89560041/Q-MMTN</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 3","pages":"1167-1178"},"PeriodicalIF":0.0,"publicationDate":"2025-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145240934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-25 | DOI: 10.1109/TMI.2025.3608467
Yanglong He;Rongjun Ge;Hui Tang;Yuxin Liu;Mengqing Su;Jean-Louis Coatrieux;Huazhong Shu;Yang Chen;Yuting He
In the field of medical image processing, vascular image segmentation plays a crucial role in clinical diagnosis, treatment planning, prognosis, and medical decision-making. Accurate and automated segmentation of vascular images can assist clinicians in understanding the vascular network structure, leading to more informed medical decisions. However, manual annotation of vascular images is time-consuming and challenging due to the fine and low-contrast vascular branches, especially in the medical imaging domain where annotation requires specialized knowledge and clinical expertise. Data-driven deep learning models struggle to achieve good performance when only a small number of annotated vascular images are available. To address this issue, this paper proposes a novel Conditional Virtual Imaging (CVI) framework for few-shot vascular image segmentation learning. The framework combines limited annotated data with extensive unlabeled data to generate high-quality images, effectively improving the accuracy and robustness of segmentation learning. Our approach primarily includes two innovations: First, aligned image-mask pair generation, which leverages the powerful image generation capabilities of large pre-trained models to produce high-quality vascular images with complex structures using only a few training images; Second, the Dual-Consistency Learning (DCL) strategy, which simultaneously trains the generator and segmentation model, allowing them to learn from each other and maximize the utilization of limited data. Experimental results demonstrate that our CVI framework can generate high-quality medical images and effectively enhance the performance of segmentation models in few-shot scenarios. Our code will be made publicly available online.
{"title":"Conditional Virtual Imaging for Few-Shot Vascular Image Segmentation","authors":"Yanglong He;Rongjun Ge;Hui Tang;Yuxin Liu;Mengqing Su;Jean-Louis Coatrieux;Huazhong Shu;Yang Chen;Yuting He","doi":"10.1109/TMI.2025.3608467","DOIUrl":"10.1109/TMI.2025.3608467","url":null,"abstract":"In the field of medical image processing, vascular image segmentation plays a crucial role in clinical diagnosis, treatment planning, prognosis, and medical decision-making. Accurate and automated segmentation of vascular images can assist clinicians in understanding the vascular network structure, leading to more informed medical decisions. However, manual annotation of vascular images is time-consuming and challenging due to the fine and low-contrast vascular branches, especially in the medical imaging domain where annotation requires specialized knowledge and clinical expertise. Data-driven deep learning models struggle to achieve good performance when only a small number of annotated vascular images are available. To address this issue, this paper proposes a novel Conditional Virtual Imaging (CVI) framework for few-shot vascular image segmentation learning. The framework combines limited annotated data with extensive unlabeled data to generate high-quality images, effectively improving the accuracy and robustness of segmentation learning. Our approach primarily includes two innovations: First, aligned image-mask pair generation, which leverages the powerful image generation capabilities of large pre-trained models to produce high-quality vascular images with complex structures using only a few training images; Second, the Dual-Consistency Learning (DCL) strategy, which simultaneously trains the generator and segmentation model, allowing them to learn from each other and maximize the utilization of limited data. Experimental results demonstrate that our CVI framework can generate high-quality medical images and effectively enhance the performance of segmentation models in few-shot scenarios. Our code will be made publicly available online.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 2","pages":"811-824"},"PeriodicalIF":0.0,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145140266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-12 | DOI: 10.1109/TMI.2025.3609319
Victoria Wu;Andrea Fung;Bahar Khodabakhshian;Baraa Abdelsamad;Hooman Vaseli;Neda Ahmadi;Jamie A. D. Goco;Michael Y. Tsang;Christina Luong;Purang Abolmaesumi;Teresa S. M. Tsang
Aortic stenosis (AS), a prevalent and serious heart valve disorder, requires early detection but remains difficult to diagnose in routine practice. Although echocardiography with Doppler imaging is the clinical standard, these assessments are typically limited to trained specialists. Point-of-care ultrasound (POCUS) offers an accessible alternative for AS screening but is restricted to basic 2D B-mode imaging, often lacking the analysis Doppler provides. Our project introduces MultiASNet, a multimodal machine learning framework designed to enhance AS screening with POCUS by combining 2D B-mode videos with structured data from echocardiography reports, including Doppler parameters. Using contrastive learning, MultiASNet aligns video features with report features in tabular form from the same patient to improve interpretive quality. To address misalignment where a single report corresponds to multiple video views, some irrelevant to AS diagnosis, we use cross-attention in a transformer-based video and tabular network to assign less importance to irrelevant report data. The model integrates structured data only during training, enabling independent use with B-mode videos during inference for broader accessibility. MultiASNet also incorporates sample selection to counteract label noise from observer variability, yielding improved accuracy on two datasets. We achieved balanced accuracy scores of 93.0% on a private dataset and 83.9% on the public TMED-2 dataset for AS detection. For severity classification, balanced accuracy scores were 80.4% and 59.4% on the private and public datasets, respectively. This model facilitates reliable AS screening in non-specialist settings, bridging the gap left by Doppler data while reducing noise-related errors. Our code is publicly available at github.com/DeepRCL/MultiASNet
{"title":"MultiASNet: Multimodal Label Noise Robust Framework for the Classification of Aortic Stenosis in Echocardiography","authors":"Victoria Wu;Andrea Fung;Bahar Khodabakhshian;Baraa Abdelsamad;Hooman Vaseli;Neda Ahmadi;Jamie A. D. Goco;Michael Y. Tsang;Christina Luong;Purang Abolmaesumi;Teresa S. M. Tsang","doi":"10.1109/TMI.2025.3609319","DOIUrl":"10.1109/TMI.2025.3609319","url":null,"abstract":"Aortic stenosis (AS), a prevalent and serious heart valve disorder, requires early detection but remains difficult to diagnose in routine practice. Although echocardiography with Doppler imaging is the clinical standard, these assessments are typically limited to trained specialists. Point-of-care ultrasound (POCUS) offers an accessible alternative for AS screening but is restricted to basic 2D B-mode imaging, often lacking the analysis Doppler provides. Our project introduces MultiASNet, a multimodal machine learning framework designed to enhance AS screening with POCUS by combining 2D B-mode videos with structured data from echocardiography reports, including Doppler parameters. Using contrastive learning, MultiASNet aligns video features with report features in tabular form from the same patient to improve interpretive quality. To address misalignment where a single report corresponds to multiple video views, some irrelevant to AS diagnosis, we use cross-attention in a transformer-based video and tabular network to assign less importance to irrelevant report data. The model integrates structured data only during training, enabling independent use with B-mode videos during inference for broader accessibility. MultiASNet also incorporates sample selection to counteract label noise from observer variability, yielding improved accuracy on two datasets. We achieved balanced accuracy scores of 93.0% on a private dataset and 83.9% on the public TMED-2 dataset for AS detection. For severity classification, balanced accuracy scores were 80.4% and 59.4% on the private and public datasets, respectively. This model facilitates reliable AS screening in non-specialist settings, bridging the gap left by Doppler data while reducing noise-related errors. Our code is publicly available at github.com/DeepRCL/MultiASNet","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 2","pages":"799-810"},"PeriodicalIF":0.0,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145043575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-11 | DOI: 10.1109/TMI.2025.3609245
Shahira Abousamra;Danielle Fassler;Rajarsi Gupta;Tahsin Kurc;Luisa F. Escobar-Hoyos;Dimitris Samaras;Kenneth R. Shroyer;Joel Saltz;Chao Chen
Brightfield Multiplex Immunohistochemistry (mIHC) provides simultaneous labeling of multiple protein biomarkers in the same tissue section. It enables the exploration of spatial relationships between the inflammatory microenvironment and tumor cells, and helps uncover how tumor cell morphology relates to cancer biomarker expression. Color deconvolution is required to analyze and quantify the different cell phenotype populations indicated by the biomarkers. However, this becomes a challenging task as the number of multiplexed stains increases. In this work, we present self-supervised and semi-supervised approaches to mIHC color deconvolution. Our proposed methods are based on deep convolutional autoencoders and learn using innovative reconstruction losses inspired by physics. We show how weak annotations and the abundant unlabeled data available can be integrated to train a model that reliably unmixes the multiplexed stains and generates stain segmentation maps. We demonstrate the effectiveness of our proposed methods through experiments on an mIHC dataset of 7-plexed IHC images.
{"title":"Label-Efficient Deep Color Deconvolution of Brightfield Multiplex IHC Images","authors":"Shahira Abousamra;Danielle Fassler;Rajarsi Gupta;Tahsin Kurc;Luisa F. Escobar-Hoyos;Dimitris Samaras;Kenneth R. Shroyer;Joel Saltz;Chao Chen","doi":"10.1109/TMI.2025.3609245","DOIUrl":"10.1109/TMI.2025.3609245","url":null,"abstract":"Brightfield Multiplex Immunohistochemistry (mIHC) provides simultaneous labeling of multiple protein biomarkers in the same tissue section. It enables the exploration of spatial relationships between the inflammatory microenvironment and tumor cells, and to uncover how tumor cell morphology relates to cancer biomarker expression. Color deconvolution is required to analyze and quantify the different cell phenotype populations present as indicated by the biomarkers. However, this becomes a challenging task as the number of multiplexed stains increase. In this work, we present self-supervised and semi-supervised approaches to mIHC color deconvolution. Our proposed methods are based on deep convolutional autoencoders and learn using innovative reconstruction losses inspired by physics. We show how we can integrate weak annotations and the abundant unlabeled data available to train a model to reliably unmix the multiplexed stains and generate stain segmentation maps. We demonstrate the effectiveness of our proposed methods through experiments on mIHC dataset of 7-plexed IHC images.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 2","pages":"853-864"},"PeriodicalIF":0.0,"publicationDate":"2025-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145035225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-10 | DOI: 10.1109/TMI.2025.3607752
W. Jeffrey Zabel;Héctor Contreras-Sánchez;Warren Foltz;Costel Flueraru;Edward Taylor;Alex Vitkin
Intravoxel Incoherent Motion (IVIM) MRI is a contrast-agent-free microvascular imaging method finding increasing use in biomedicine. However, there is uncertainty in the ability of IVIM-MRI to quantify tissue microvasculature given MRI's limited spatial resolution (mm scale). Nine NRG mice were subcutaneously inoculated with human pancreatic cancer BxPC-3 cells transfected with DsRed, and MR-compatible plastic window chambers were surgically installed in the dorsal skinfold. Mice were imaged with speckle variance optical coherence tomography (OCT) and colour Doppler OCT, providing high resolution 3D measurements of the vascular volume density (VVD) and the average Doppler phase shift ($\overline{\Delta\phi}$), respectively. IVIM imaging was performed on a 7T preclinical MRI scanner to generate maps of the perfusion fraction $f$, the extravascular diffusion coefficient $D_{\mathrm{slow}}$, and the intravascular diffusion coefficient $D_{\mathrm{fast}}$. The IVIM parameter maps were coregistered with the optical datasets to enable direct spatial correlation. A significant positive correlation was noted between OCT's VVD and MR's $f$ (Pearson correlation coefficient $r = 0.34$, $p < 0.0001$). Surprisingly, no significant correlation was found between $\overline{\Delta\phi}$ and $D_{\mathrm{fast}}$. This may be due to larger errors in the determined $D_{\mathrm{fast}}$ values compared to $f$, as confirmed by Monte Carlo simulations. Several other inter- and intra-modality correlations were also quantified. Direct same-animal correlation of clinically applicable IVIM imaging with preclinical OCT microvascular imaging supports the biomedical relevance of IVIM-MRI metrics, for example through $f$'s relationship to the VVD.
{"title":"Quantifying Tumor Microvasculature With Optical Coherence Angiography and Intravoxel Incoherent Motion Diffusion MRI","authors":"W. Jeffrey Zabel;Héctor Contreras-Sánchez;Warren Foltz;Costel Flueraru;Edward Taylor;Alex Vitkin","doi":"10.1109/TMI.2025.3607752","DOIUrl":"10.1109/TMI.2025.3607752","url":null,"abstract":"Intravoxel Incoherent Motion (IVIM) MRI is a contrast-agent-free microvascular imaging method finding increasing use in biomedicine. However, there is uncertainty in the ability of IVIM-MRI to quantify tissue microvasculature given MRI’s limited spatial resolution (mm scale). Nine NRG mice were subcutaneously inoculated with human pancreatic cancer BxPC-3 cells transfected with DsRed, and MR-compatible plastic window chambers were surgically installed in the dorsal skinfold. Mice were imaged with speckle variance optical coherence tomography (OCT) and colour Doppler OCT, providing high resolution 3D measurements of the vascular volume density (VVD) and average Doppler phase shift (<inline-formula> <tex-math>$overline {Delta phi }text {)}$ </tex-math></inline-formula> respectively. IVIM imaging was performed on a 7T preclinical MRI scanner, to generate maps of the perfusion fraction f, the extravascular diffusion coefficient <inline-formula> <tex-math>${D}_{textit {slow}}$ </tex-math></inline-formula>, and the intravascular diffusion coefficient <inline-formula> <tex-math>${D}_{textit {fast}}$ </tex-math></inline-formula>. The IVIM parameter maps were coregistered with the optical datasets to enable direct spatial correlation. A significant positive correlation was noted between OCT’s VVD and MR’s f (Pearson correlation coefficient <inline-formula> <tex-math>${r}={0}.{34},{p}lt {0}.{0001}text {)}$ </tex-math></inline-formula>. Surprisingly, no significant correlation was found between <inline-formula> <tex-math>$overline {Delta phi }$ </tex-math></inline-formula> and <inline-formula> <tex-math>${D}_{textit {fast}}$ </tex-math></inline-formula>. This may be due to larger errors in the determined <inline-formula> <tex-math>${D}_{textit {fast}}$ </tex-math></inline-formula> values compared to f, as confirmed by Monte Carlo simulations. Several other inter- and intra-modality correlations were also quantified. Direct same-animal correlation of clinically applicable IVIM imaging with preclinical OCT microvascular imaging support the biomedical relevance of IVIM-MRI metrics, for example through f’s relationship to the VVD.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 2","pages":"789-798"},"PeriodicalIF":0.0,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145031941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-09 | DOI: 10.1109/TMI.2025.3607700
Xiaoru Gao;Housheng Xie;Donghua Hang;Guoyan Zheng
Computed Tomography (CT) to Cone-Beam Computed Tomography (CBCT) image registration is crucial for image-guided radiotherapy and surgical procedures. However, achieving accurate CT-CBCT registration remains challenging due to various factors such as inconsistent intensities, low contrast resolution and imaging artifacts. In this study, we propose a Context-Aware Semantics-driven Hierarchical Network (referred to as CASHNet), which hierarchically integrates context-aware semantics-encoded features into a coarse-to-fine registration scheme, to explicitly enhance semantic structural perception during progressive alignment. Moreover, it leverages diffeomorphisms to integrate rigid and non-rigid registration within a single end-to-end trainable network, enabling anatomically plausible deformations and preserving topological consistency. CASHNet comprises a Siamese Mamba-based multi-scale feature encoder and a coarse-to-fine registration decoder, which integrates a Rigid Registration (RR) module with multiple Semantics-guided Velocity Estimation and Feature Alignment (SVEFA) modules operating at different resolutions. Each SVEFA module comprises three carefully designed components: i) a cross-resolution feature aggregation (CFA) component that synthesizes enhanced global contextual representations, ii) a semantics perception and encoding (SPE) component that captures and encodes local semantic information, and iii) an incremental velocity estimation and feature alignment (IVEFA) component that leverages contextual and semantic features to update velocity fields and to align features. These modules work synergistically to boost the overall registration performance. Extensive experiments on three typical yet challenging CT-CBCT datasets of both soft and hard tissues demonstrate the superiority of our proposed method over other state-of-the-art methods. The code will be publicly available at https://github.com/xiaorugao999/CASHNet
{"title":"CASHNet: Context-Aware Semantics-Driven Hierarchical Network for Hybrid Diffeomorphic CT-CBCT Image Registration","authors":"Xiaoru Gao;Housheng Xie;Donghua Hang;Guoyan Zheng","doi":"10.1109/TMI.2025.3607700","DOIUrl":"10.1109/TMI.2025.3607700","url":null,"abstract":"Computed Tomography (CT) to Cone-Beam Computed Tomography (CBCT) image registration is crucial for image-guided radiotherapy and surgical procedures. However, achieving accurate CT-CBCT registration remains challenging due to various factors such as inconsistent intensities, low contrast resolution and imaging artifacts. In this study, we propose a Context-Aware Semantics-driven Hierarchical Network (referred to as CASHNet), which hierarchically integrates context-aware semantics-encoded features into a coarse-to-fine registration scheme, to explicitly enhance semantic structural perception during progressive alignment. Moreover, it leverages diffeomorphisms to integrate rigid and non-rigid registration within a single end-to-end trainable network, enabling anatomically plausible deformations and preserving topological consistency. CASHNet comprises a Siamese Mamba-based multi-scale feature encoder and a coarse-to-fine registration decoder, which integrates a Rigid Registration (RR) module with multiple Semantics-guided Velocity Estimation and Feature Alignment (SVEFA) modules operating at different resolutions. Each SVEFA module comprises three carefully designed components: i) a cross-resolution feature aggregation (CFA) component that synthesizes enhanced global contextual representations, ii) a semantics perception and encoding (SPE) component that captures and encodes local semantic information, and iii) an incremental velocity estimation and feature alignment (IVEFA) component that leverages contextual and semantic features to update velocity fields and to align features. These modules work synergistically to boost the overall registration performance. Extensive experiments on three typical yet challenging CT-CBCT datasets of both soft and hard tissues demonstrate the superiority of our proposed method over other state-of-the-art methods. The code will be publicly available at <uri>https://github.com/xiaorugao999/CASHNet</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 2","pages":"825-842"},"PeriodicalIF":0.0,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145025299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}