Three-Dimensional MRI Reconstruction with 3D Gaussian Representations: Tackling the Undersampling Problem
Pub Date: 2025-12-09 | DOI: 10.1109/TMI.2025.3642134
Tengya Peng, Ruyi Zha, Zhen Li, Xiaofeng Liu, Qing Zou
Three-Dimensional Gaussian representation (3DGS) has shown substantial promise in computer vision but remains unexplored in magnetic resonance imaging (MRI). This study explores its potential for reconstructing isotropic-resolution 3D MRI from undersampled k-space data. We introduce a novel framework, termed 3D Gaussian MRI (3DGSMR), which employs 3D Gaussian distributions as an explicit representation for MR volumes. Experimental evaluations indicate that the method effectively reconstructs voxelized MR images, achieving quality on par with well-established 3D MRI reconstruction techniques from the literature. Notably, 3DGSMR operates in a self-supervised manner, obviating the need for extensive training datasets or prior model training. The approach introduces two significant innovations: the adaptation of 3DGS to MRI reconstruction, and a novel application of the existing 3DGS methodology to decompose MR signals, which are complex-valued.
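For intuition, here is a minimal, hypothetical sketch of the core 3DGSMR idea as described in the abstract: the MR volume is modeled as a sum of 3D Gaussians with complex amplitudes, voxelized on a grid, and fitted self-supervised against the acquired undersampled k-space samples. Axis-aligned covariances are used here for brevity (the paper's Gaussians are general), and all class and function names are ours, not the authors'.

```python
# Hypothetical sketch of the 3DGSMR idea (not the authors' code): an MR volume
# is a sum of anisotropic 3D Gaussians with complex amplitudes, evaluated on a
# voxel grid and fitted against the acquired (undersampled) k-space samples.
import torch

class Gaussian3DVolume(torch.nn.Module):
    def __init__(self, n_gauss, grid_shape):
        super().__init__()
        self.mu = torch.nn.Parameter(torch.rand(n_gauss, 3))          # centers in [0,1]^3
        self.log_sigma = torch.nn.Parameter(torch.zeros(n_gauss, 3))  # per-axis widths
        self.amp = torch.nn.Parameter(torch.randn(n_gauss, 2) * 0.01) # complex amplitude (re, im)
        axes = [torch.linspace(0, 1, s) for s in grid_shape]
        self.register_buffer("grid", torch.stack(torch.meshgrid(*axes, indexing="ij"), dim=-1))

    def forward(self):
        # Evaluate every Gaussian on the voxel grid and sum -> complex volume.
        d = self.grid[None] - self.mu[:, None, None, None, :]          # (N, X, Y, Z, 3)
        w = torch.exp(-0.5 * (d / self.log_sigma.exp()[:, None, None, None, :]).pow(2).sum(-1))
        vol = torch.einsum("nxyz,nc->xyzc", w, self.amp)               # (X, Y, Z, 2)
        return torch.view_as_complex(vol.contiguous())

def data_consistency_loss(model, kspace, mask):
    # Self-supervised objective: predicted k-space must match the sampled data.
    pred_k = torch.fft.fftn(model(), dim=(0, 1, 2))
    return (mask * (pred_k - kspace)).abs().pow(2).mean()
```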
{"title":"Three-Dimensional MRI Reconstruction with 3D Gaussian Representations: Tackling the Undersampling Problem.","authors":"Tengya Peng, Ruyi Zha, Zhen Li, Xiaofeng Liu, Qing Zou","doi":"10.1109/TMI.2025.3642134","DOIUrl":"https://doi.org/10.1109/TMI.2025.3642134","url":null,"abstract":"<p><p>Three-Dimensional Gaussian representation (3DGS) has shown substantial promise in the field of computer vision, but remains unexplored in the field of magnetic resonance imaging (MRI). This study explores its potential for the reconstruction of isotropic resolution 3D MRI from undersampled k-space data. We introduce a novel framework termed 3D Gaussian MRI (3DGSMR), which employs 3D Gaussian distributions as an explicit representation for MR volumes. Experimental evaluations indicate that this method can effectively reconstruct voxelized MR images, achieving a quality on par with that of well-established 3D MRI reconstruction techniques found in the literature. Notably, the 3DGSMR scheme operates under a self-supervised framework, obviating the need for extensive training datasets or prior model training. This approach introduces significant innovations to the domain, notably the adaptation of 3DGS to MRI reconstruction and the novel application of the existing 3DGS methodology to decompose MR signals, which are presented in a complex-valued format.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145717086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
OPTIKS: Optimized Gradient Properties Through Timing in K-Space
Pub Date: 2025-12-02 | DOI: 10.1109/TMI.2025.3639398
Matthew A McCready, Xiaozhi Cao, Kawin Setsompop, John M Pauly, Adam B Kerr
We developed OPTIKS, a customizable method for designing fast trajectory-constrained gradient waveforms with optimized time-domain properties. Given a specified multidimensional k-space trajectory, the method optimizes traversal speed (and therefore timing) as a function of position along the trajectory. OPTIKS facilitates optimization of objectives that depend on the time-domain gradient waveform and the arc-length-domain k-space speed. We apply OPTIKS to design waveforms that limit peripheral nerve stimulation (PNS), minimize mechanical resonance excitation, and reduce acoustic noise. A variety of trajectory examples are presented, including spirals, circular echo-planar imaging, and rosettes. Design performance is evaluated based on duration, standardized PNS models, field measurements, gradient coil back-EMF measurements, and calibrated acoustic measurements. We show reductions in back-EMF of up to 94% and in field oscillations of up to 91.1%, acoustic noise decreases of up to 9.22 dB, and, with efficient use of PNS models, speed increases of up to 11.4%. The design method implementation is available as an open-source Python package on GitHub (https://github.com/mamccready/optiks).
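The abstract does not spell out the optimizer, but the general family OPTIKS belongs to, choosing a traversal speed along a fixed k-space path under gradient-amplitude and slew-rate limits, can be illustrated with a simple forward/backward-pass sketch. This is not the OPTIKS package API; the function name and the discrete passes are our simplifications.

```python
# Illustrative sketch (not the OPTIKS API): given an arc-length-parameterized
# k-space path, choose the traversal speed v(s) = |dk/dt| that respects the
# gradient-amplitude and slew-rate limits via classic forward/backward passes.
import numpy as np

def speed_profile(k, ds, gmax, smax, gamma=42.58e6):
    """k: (N, d) k-space samples [1/m], uniformly spaced by ds in arc length;
    gmax [T/m], smax [T/m/s], gamma [Hz/T]. Returns v = |dk/dt| along the path."""
    tangent = np.gradient(k, ds, axis=0)                       # ~unit tangent dk/ds
    curvature = np.linalg.norm(np.gradient(tangent, ds, axis=0), axis=1) + 1e-12
    v = np.minimum(gamma * gmax,                               # amplitude: |g| <= gmax
                   np.sqrt(gamma * smax / curvature))          # centripetal slew limit
    for i in range(1, len(v)):                                 # forward: acceleration limit
        v[i] = min(v[i], np.sqrt(v[i - 1] ** 2 + 2 * gamma * smax * ds))
    for i in range(len(v) - 2, -1, -1):                        # backward: deceleration limit
        v[i] = min(v[i], np.sqrt(v[i + 1] ** 2 + 2 * gamma * smax * ds))
    return v
```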
{"title":"OPTIKS: Optimized Gradient Properties Through Timing in K-Space.","authors":"Matthew A McCready, Xiaozhi Cao, Kawin Setsompop, John M Pauly, Adam B Kerr","doi":"10.1109/TMI.2025.3639398","DOIUrl":"https://doi.org/10.1109/TMI.2025.3639398","url":null,"abstract":"<p><p>A customizable method (OPTIKS) for designing fast trajectory-constrained gradient waveforms with optimized time domain properties was developed. Given a specified multidimensional k-space trajectory, the method optimizes traversal speed (and therefore timing) with position along the trajectory. OPTIKS facilitates optimization of objectives dependent on the time domain gradient waveform and the arc-length domain k-space speed. OPTIKS is applied to design waveforms which limit peripheral nerve stimulation (PNS), minimize mechanical resonance excitation, and reduce acoustic noise. A variety of trajectory examples are presented including spirals, circular echo-planar-imaging, and rosettes. Design performance is evaluated based on duration, standardized PNS models, field measurements, gradient coil back-EMF measurements, and calibrated acoustic measurements. We show reductions in back-EMF of up to 94% and field oscillations up to 91.1%, acoustic noise decreases of up to 9.22 dB, and with efficient use of PNS models speed increases of up to 11.4%. The design method implementation is made available as an open source Python package through GitHub (https://github.com/mamccready/optiks).</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145663017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Chain of Diagnosis Framework for Accurate and Explainable Radiology Report Generation
Pub Date: 2025-12-01 | DOI: 10.1109/TMI.2025.3585765
Haibo Jin, Haoxuan Che, Sunan He, Hao Chen
Despite the progress of radiology report generation (RRG), existing works face two challenges: 1) performance in clinical efficacy is unsatisfactory, especially in describing lesion attributes; and 2) the generated text lacks explainability, making it difficult for radiologists to trust the results. To address these challenges, we focus on a trustworthy RRG model that not only generates accurate descriptions of abnormalities but also provides the basis for its predictions. To this end, we propose a framework named chain of diagnosis (CoD), which maintains a chain of diagnostic steps for clinically accurate and explainable RRG. It first generates question-answer (QA) pairs via diagnostic conversation to extract key findings, then prompts a large language model with the QA diagnoses for accurate generation. To enhance explainability, a diagnosis grounding module is designed to match QA diagnoses and generated sentences, where the diagnoses act as a reference. Moreover, a lesion grounding module is designed to locate abnormalities in the image, further improving radiologists' working efficiency. To facilitate label-efficient training, we propose an omni-supervised learning strategy with clinical consistency to leverage various types of annotations from different datasets. Our efforts lead to 1) an omni-labeled RRG dataset with QA pairs and lesion boxes; 2) an evaluation tool for assessing how accurately reports describe lesion location and severity; and 3) extensive experiments demonstrating the effectiveness of CoD, which consistently outperforms both specialist and generalist models on two RRG benchmarks and shows promising explainability by accurately grounding generated sentences to QA diagnoses and images.
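As a concrete illustration of the QA-then-generate step, the following hypothetical sketch assembles QA diagnoses into an LLM prompt. The QA schema and prompt wording are placeholders of ours, not the paper's.

```python
# Hedged sketch of the CoD generation step: QA diagnoses extracted from the
# image are injected into an LLM prompt as the basis for the report. The schema
# and prompt text are hypothetical placeholders, not the paper's exact ones.
from dataclasses import dataclass

@dataclass
class QADiagnosis:
    question: str   # e.g. "Is there pleural effusion?"
    answer: str     # e.g. "Yes, a small left-sided effusion."

def build_report_prompt(qa_pairs: list[QADiagnosis]) -> str:
    findings = "\n".join(f"Q: {qa.question}\nA: {qa.answer}" for qa in qa_pairs)
    return (
        "You are a radiology assistant. Using ONLY the findings below, "
        "write a concise radiology report.\nFindings:\n" + findings
    )

prompt = build_report_prompt([
    QADiagnosis("Is there cardiomegaly?", "Yes, mild."),
    QADiagnosis("Is there pleural effusion?", "No."),
])
```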
{"title":"A Chain of Diagnosis Framework for Accurate and Explainable Radiology Report Generation.","authors":"Haibo Jin, Haoxuan Che, Sunan He, Hao Chen","doi":"10.1109/TMI.2025.3585765","DOIUrl":"10.1109/TMI.2025.3585765","url":null,"abstract":"<p><p>Despite the progress of radiology report generation (RRG), existing works face two challenges: 1) The performances in clinical efficacy are unsatisfactory, especially for lesion attributes description; 2) the generated text lacks explainability, making it difficult for radiologists to trust the results. To address the challenges, we focus on a trustworthy RRG model, which not only generates accurate descriptions of abnormalities, but also provides basis of its predictions. To this end, we propose a framework named chain of diagnosis (CoD), which maintains a chain of diagnostic process for clinically accurate and explainable RRG. It first generates question-answer (QA) pairs via diagnostic conversation to extract key findings, then prompts a large language model with QA diagnoses for accurate generation. To enhance explainability, a diagnosis grounding module is designed to match QA diagnoses and generated sentences, where the diagnoses act as a reference. Moreover, a lesion grounding module is designed to locate abnormalities in the image, further improving the working efficiency of radiologists. To facilitate label-efficient training, we propose an omni-supervised learning strategy with clinical consistency to leverage various types of annotations from different datasets. Our efforts lead to 1) an omni-labeled RRG dataset with QA pairs and lesion boxes; 2) a evaluation tool for assessing the accuracy of reports in describing lesion location and severity; 3) extensive experiments to demonstrate the effectiveness of CoD, where it outperforms both specialist and generalist models consistently on two RRG benchmarks and shows promising explainability by accurately grounding generated sentences to QA diagnoses and images.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":"4986-4997"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144562423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mutualistic Multi-Network Noisy Label Learning (MMNNLL) Method and Its Application to Transdiagnostic Classification of Bipolar Disorder and Schizophrenia
Pub Date: 2025-12-01 | DOI: 10.1109/TMI.2025.3585880
Yuhui Du, Zheng Wang, Ju Niu, Yulong Wang, Godfrey D Pearlson, Vince D Calhoun
The subjective nature of diagnosing mental disorders complicates achieving accurate diagnoses. The complex relationships among disorders further exacerbate this issue, particularly in clinical practice, where conditions such as bipolar disorder (BP) and schizophrenia (SZ) can present similar clinical symptoms and cognitive impairments. To address these challenges, this paper proposes a mutualistic multi-network noisy label learning (MMNNLL) method, which aims to enhance diagnostic accuracy by leveraging neuroimaging data in the presence of potential clinical diagnosis bias or errors. MMNNLL effectively trains multiple deep neural networks (DNNs) on data with noisy labels by maximizing the consistency among DNNs in identifying and utilizing samples with clean and noisy labels. Experimental results on the public CIFAR-10 and PathMNIST datasets demonstrate the effectiveness of our method in classifying independent test data across various types and levels of label noise; MMNNLL also significantly outperforms state-of-the-art noisy label learning methods. When applied to brain functional connectivity data from BP and SZ patients, our method identifies two biotypes that show more pronounced group differences and higher classification accuracy than the original clinical categories, using both traditional machine learning and advanced deep learning techniques. In summary, our method effectively addresses possible inaccuracy in the nosology of mental disorders and achieves transdiagnostic classification through robust noisy label learning via multi-network collaboration and competition.
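The abstract leaves the exact consistency mechanism unspecified; the sketch below illustrates the general multi-network small-loss idea (in the co-teaching style) that such methods build on. The function name and keep-ratio heuristic are our assumptions, not MMNNLL's implementation.

```python
# Minimal sketch of multi-network noisy-label learning in the co-teaching
# style: each network trains on the samples the *other* network finds low-loss
# ("clean"). MMNNLL's mutualistic criterion is richer; this shows the mechanism.
import torch
import torch.nn.functional as F

def co_teaching_step(net_a, net_b, opt_a, opt_b, x, y, keep_ratio=0.8):
    loss_a = F.cross_entropy(net_a(x), y, reduction="none")
    loss_b = F.cross_entropy(net_b(x), y, reduction="none")
    k = max(1, int(keep_ratio * len(y)))
    idx_a = loss_a.argsort()[:k]   # samples net A considers clean
    idx_b = loss_b.argsort()[:k]   # samples net B considers clean
    # Cross-update: A learns from B's clean set and vice versa.
    opt_a.zero_grad(); F.cross_entropy(net_a(x[idx_b]), y[idx_b]).backward(); opt_a.step()
    opt_b.zero_grad(); F.cross_entropy(net_b(x[idx_a]), y[idx_a]).backward(); opt_b.step()
```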
{"title":"Mutualistic Multi-Network Noisy Label Learning (MMNNLL) Method and Its Application to Transdiagnostic Classification of Bipolar Disorder and Schizophrenia.","authors":"Yuhui Du, Zheng Wang, Ju Niu, Yulong Wang, Godfrey D Pearlson, Vince D Calhoun","doi":"10.1109/TMI.2025.3585880","DOIUrl":"10.1109/TMI.2025.3585880","url":null,"abstract":"<p><p>The subjective nature of diagnosing mental disorders complicates achieving accurate diagnoses. The complex relationship among disorders further exacerbates this issue, particularly in clinical practice where conditions like bipolar disorder (BP) and schizophrenia (SZ) can present similar clinical symptoms and cognitive impairments. To address these challenges, this paper proposes a mutualistic multi-network noisy label learning (MMNNLL) method, which aims to enhance diagnostic accuracy by leveraging neuroimaging data under the presence of potential clinical diagnosis bias or errors. MMNNLL effectively utilizes multiple deep neural networks (DNNs) for learning from data with noisy labels by maximizing the consistency among DNNs in identifying and utilizing samples with clean and noisy labels. Experimental results on public CIFAR-10 and PathMNIST datasets demonstrate the effectiveness of our method in classifying independent test data across various types and levels of label noise. Additionally, our MMNNLL method significantly outperforms state-of-the-art noisy label learning methods. When applied to brain functional connectivity data from BP and SZ patients, our method identifies two biotypes that show more pronounced group differences, and improved classification accuracy compared to the original clinical categories, using both traditional machine learning and advanced deep learning techniques. In summary, our method effectively addresses the possible inaccuracy in nosology of mental disorders and achieves transdiagnostic classification through robust noisy label learning via multi-network collaboration and competition.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":"5014-5026"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12812316/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144565572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joint Shape Reconstruction and Registration via a Shared Hybrid Diffeomorphic Flow
Pub Date: 2025-12-01 | DOI: 10.1109/TMI.2025.3585560
Hengxiang Shi, Ping Wang, Shouhui Zhang, Xiuyang Zhao, Bo Yang, Caiming Zhang
Deep implicit functions (DIFs) effectively represent shapes by using a neural network to map 3D spatial coordinates to scalar values that encode the shape's geometry, but establishing correspondences between shapes directly is difficult, limiting their use in medical image registration. Recently presented deformation-field-based methods achieve implicit template learning via template field learning with DIFs and deformation field learning, establishing shape correspondence through deformation fields. Although these approaches enable joint learning of shape representation and shape correspondence, the decoupled optimization of the template field and the deformation field, caused by the absence of deformation annotations, leads to a relatively accurate template field but an underoptimized deformation field. In this paper, we propose a novel implicit template learning framework via a shared hybrid diffeomorphic flow (SHDF), which enables shared optimization for deformation and template, contributing to better deformations and shape representation. Specifically, we formulate the signed distance function (SDF, a type of DIF) as a one-dimensional (1D) integral, unifying dimensions to match the form used in solving the ordinary differential equation (ODE) for deformation field learning. The SDF in 1D integral form is then integrated seamlessly into deformation field learning. Using a recurrent learning strategy, we frame shape representations and deformations as solutions to different initial value problems of the same ODE. We also introduce a global smoothness regularization to handle local optima arising from limited outside-of-shape data. Experiments on medical datasets show that SHDF outperforms state-of-the-art methods in shape representation and registration.
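A rough sketch of the shared-ODE formulation described above: one learned field is integrated once, with the trajectory giving the deformation and an extra scalar channel accumulating the SDF as a 1D integral. The Euler scheme and the field interface are our simplifications, not the paper's implementation.

```python
# Sketch of the "shared ODE" idea from the abstract: a single network supplies
# both a velocity (for the deformation phi) and a scalar integrand (for the
# SDF), so both quantities come from initial value problems of one flow.
import torch

def integrate_shared_flow(field, x, n_steps=16):
    """field(points, t) -> (velocity (B, 3), sdf_integrand (B, 1)); x: (B, 3)."""
    dt = 1.0 / n_steps
    sdf = torch.zeros(x.shape[0], 1, device=x.device)
    for i in range(n_steps):
        t = torch.full((x.shape[0], 1), i * dt, device=x.device)
        v, g = field(x, t)
        x = x + dt * v       # deformation: Euler step of the ODE for phi(x, t)
        sdf = sdf + dt * g   # shape: SDF accumulated as a 1D integral along the flow
    return x, sdf            # warped points and signed distance values
```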
{"title":"Joint Shape Reconstruction and Registration via a Shared Hybrid Diffeomorphic Flow.","authors":"Hengxiang Shi, Ping Wang, Shouhui Zhang, Xiuyang Zhao, Bo Yang, Caiming Zhang","doi":"10.1109/TMI.2025.3585560","DOIUrl":"10.1109/TMI.2025.3585560","url":null,"abstract":"<p><p>Deep implicit functions (DIFs) effectively represent shapes by using a neural network to map 3D spatial coordinates to scalar values that encode the shape's geometry, but it is difficult to establish correspondences between shapes directly, limiting their use in medical image registration. The recently presented deformation field-based methods achieve implicit templates learning via template field learning with DIFs and deformation field learning, establishing shape correspondence through deformation fields. Although these approaches enable joint learning of shape representation and shape correspondence, the decoupled optimization for template field and deformation field, caused by the absence of deformation annotations lead to a relatively accurate template field but an underoptimized deformation field. In this paper, we propose a novel implicit template learning framework via a shared hybrid diffeomorphic flow (SHDF), which enables shared optimization for deformation and template, contributing to better deformations and shape representation. Specifically, we formulate the signed distance function (SDF, a type of DIFs) as a one-dimensional (1D) integral, unifying dimensions to match the form used in solving ordinary differential equation (ODE) for deformation field learning. Then, SDF in 1D integral form is integrated seamlessly into the deformation field learning. Using a recurrent learning strategy, we frame shape representations and deformations as solving different initial value problems of the same ODE. We also introduce a global smoothness regularization to handle local optima due to limited outside-of-shape data. Experiments on medical datasets show that SHDF outperforms state-of-the-art methods in shape representation and registration.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":"4998-5013"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144562424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guest Editorial Special Issue on Advancements in Foundation Models for Medical Imaging
Pub Date: 2025-10-27 | DOI: 10.1109/TMI.2025.3613074
Tianming Liu, Dinggang Shen, Jong Chul Ye, Marleen de Bruijne, Wei Liu
Pretrained on massive datasets, Foundation Models (FMs) are revolutionizing medical imaging by offering scalable and generalizable solutions to longstanding challenges. This Special Issue on Advancements in Foundation Models for Medical Imaging presents FM-related works that explore the potential of FMs to address data scarcity, domain shifts, and multimodal integration across a wide range of medical imaging tasks, including segmentation, diagnosis, reconstruction, and prognosis. The included papers also examine critical concerns such as interpretability, efficiency, benchmarking, and ethics in the adoption of FMs for medical imaging. Collectively, the articles in this Special Issue mark a significant step toward establishing FMs as a cornerstone of next-generation medical imaging AI.
{"title":"Guest Editorial Special Issue on Advancements in Foundation Models for Medical Imaging","authors":"Tianming Liu;Dinggang Shen;Jong Chul Ye;Marleen de Bruijne;Wei Liu","doi":"10.1109/TMI.2025.3613074","DOIUrl":"https://doi.org/10.1109/TMI.2025.3613074","url":null,"abstract":"Pretrained on massive datasets, Foundation Models (FMs) are revolutionizing medical imaging by offering scalable and generalizable solutions to longstanding challenges. This Special Issue on Advancements in Foundation Models for Medical Imaging presents FM-related works that explore the potential of FMs to address data scarcity, domain shifts, and multimodal integration across a wide range of medical imaging tasks, including segmentation, diagnosis, reconstruction, and prognosis. The included papers also examine critical concerns such as interpretability, efficiency, benchmarking, and ethics in the adoption of FMs for medical imaging. Collectively, the articles in this Special Issue mark a significant step toward establishing FMs as a cornerstone of next-generation medical imaging AI.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 10","pages":"3894-3897"},"PeriodicalIF":0.0,"publicationDate":"2025-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11218696","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145371487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging Diffusion Model and Image Foundation Model for Improved Correspondence Matching in Coronary Angiography
Pub Date: 2025-10-20 | DOI: 10.1109/TMI.2025.3623507
Lin Zhao, Xin Yu, Yikang Liu, Xiao Chen, Eric Z Chen, Terrence Chen, Shanhui Sun
Accurate correspondence matching in coronary angiography images is crucial for reconstructing 3D coronary artery structures, which is essential for precise diagnosis and treatment planning of coronary artery disease (CAD). Traditional matching methods for natural images often fail to generalize to X-ray images due to inherent differences such as lack of texture, lower contrast, and overlapping structures, compounded by insufficient training data. To address these challenges, we propose a novel pipeline that generates realistic paired coronary angiography images using a diffusion model conditioned on 2D projections of 3D reconstructed meshes from Coronary Computed Tomography Angiography (CCTA), providing high-quality synthetic data for training. Additionally, we employ large-scale image foundation models to guide feature aggregation, enhancing correspondence matching accuracy by focusing on semantically relevant regions and keypoints. Our approach demonstrates superior matching performance on synthetic datasets and effectively generalizes to real-world datasets, offering a practical solution for this task. Furthermore, our work investigates the efficacy of different foundation models in correspondence matching, providing novel insights into leveraging advanced image foundation models for medical imaging applications.
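As a hedged illustration of the foundation-model-guided matching stage, the sketch below matches keypoint descriptors from two angiogram frames by cosine similarity with a mutual-nearest-neighbor check. The feature extractor is left abstract, and the paper's attention-based aggregation is more involved than this.

```python
# Hedged sketch of feature-based correspondence: descriptors from a pretrained
# backbone are matched across two frames by cosine similarity, keeping only
# mutually consistent pairs. Names and the matching rule are ours.
import torch
import torch.nn.functional as F

def match_keypoints(feat_a, feat_b):
    """feat_a: (Na, D), feat_b: (Nb, D) descriptors at candidate keypoints."""
    sim = F.normalize(feat_a, dim=1) @ F.normalize(feat_b, dim=1).T  # (Na, Nb)
    ab = sim.argmax(dim=1)          # best match in B for each A keypoint
    ba = sim.argmax(dim=0)          # best match in A for each B keypoint
    idx_a = torch.arange(len(ab))
    mutual = ba[ab] == idx_a        # mutual-nearest-neighbor check
    return torch.stack([idx_a[mutual], ab[mutual]], dim=1)  # (M, 2) index pairs
```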
{"title":"Leveraging Diffusion Model and Image Foundation Model for Improved Correspondence Matching in Coronary Angiography.","authors":"Lin Zhao, Xin Yu, Yikang Liu, Xiao Chen, Eric Z Chen, Terrence Chen, Shanhui Sun","doi":"10.1109/TMI.2025.3623507","DOIUrl":"https://doi.org/10.1109/TMI.2025.3623507","url":null,"abstract":"<p><p>Accurate correspondence matching in coronary angiography images is crucial for reconstructing 3D coronary artery structures, which is essential for precise diagnosis and treatment planning of coronary artery disease (CAD). Traditional matching methods for natural images often fail to generalize to X-ray images due to inherent differences such as lack of texture, lower contrast, and overlapping structures, compounded by insufficient training data. To address these challenges, we propose a novel pipeline that generates realistic paired coronary angiography images using a diffusion model conditioned on 2D projections of 3D reconstructed meshes from Coronary Computed Tomography Angiography (CCTA), providing high-quality synthetic data for training. Additionally, we employ large-scale image foundation models to guide feature aggregation, enhancing correspondence matching accuracy by focusing on semantically relevant regions and keypoints. Our approach demonstrates superior matching performance on synthetic datasets and effectively generalizes to real-world datasets, offering a practical solution for this task. Furthermore, our work investigates the efficacy of different foundation models in correspondence matching, providing novel insights into leveraging advanced image foundation models for medical imaging applications.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145338362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FairFedMed: Benchmarking Group Fairness in Federated Medical Imaging with FairLoRA
Pub Date: 2025-10-16 | DOI: 10.1109/TMI.2025.3622522
Minghan Li, Congcong Wen, Yu Tian, Min Shi, Yan Luo, Hao Huang, Yi Fang, Mengyu Wang
Fairness remains a critical concern in healthcare, where unequal access to services and treatment outcomes can adversely affect patient health. While Federated Learning (FL) presents a collaborative and privacy-preserving approach to model training, ensuring fairness is challenging due to heterogeneous data across institutions, and current research primarily addresses non-medical applications. To fill this gap, we establish the first experimental benchmark for fairness in medical FL, evaluating six representative FL methods across diverse demographic attributes and imaging modalities. We introduce FairFedMed, the first medical FL dataset specifically designed to study group fairness (i.e., consistent performance across demographic groups). It comprises two parts: FairFedMed-Oph, featuring 2D fundus and 3D OCT ophthalmology samples with six demographic attributes; and FairFedMed-Chest, which simulates real cross-institutional FL using subsets of CheXpert and MIMIC-CXR. Together, they support both simulated and real-world FL across diverse medical modalities and demographic groups. Existing FL models often underperform on medical images and overlook fairness across demographic groups. To address this, we propose FairLoRA, a fairness-aware FL framework based on SVD-based low-rank approximation. It customizes singular value matrices per demographic group while sharing singular vectors, ensuring both fairness and efficiency. Experimental results on the FairFedMed dataset demonstrate that FairLoRA not only achieves state-of-the-art performance in medical image classification but also significantly improves fairness across diverse populations. Our code and dataset are available on GitHub: https://github.com/Harvard-AI-and-Robotics-Lab/FairFedMed.
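The abstract's description of FairLoRA suggests a low-rank update W + U diag(s_g) V^T with shared singular vectors U, V and group-specific singular values s_g. The following is a minimal sketch under that reading; the shapes, initialization, and module interface are our assumptions, not the released code.

```python
# Sketch of the FairLoRA idea as described in the abstract: the singular
# vectors of the low-rank update are shared across demographic groups, while
# each group gets its own singular values.
import torch

class FairLoRALinear(torch.nn.Module):
    def __init__(self, d_in, d_out, rank, n_groups):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.empty(d_out, d_in))   # frozen base in practice
        torch.nn.init.xavier_uniform_(self.weight)
        self.U = torch.nn.Parameter(torch.randn(d_out, rank) * 0.01) # shared singular vectors
        self.V = torch.nn.Parameter(torch.randn(d_in, rank) * 0.01)  # shared singular vectors
        self.s = torch.nn.Parameter(torch.zeros(n_groups, rank))     # per-group singular values

    def forward(self, x, group_id):
        delta = self.U @ torch.diag(self.s[group_id]) @ self.V.T     # group-aware low-rank update
        return x @ (self.weight + delta).T
```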
{"title":"FairFedMed: Benchmarking Group Fairness in Federated Medical Imaging with FairLoRA.","authors":"Minghan Li, Congcong Wen, Yu Tian, Min Shi, Yan Luo, Hao Huang, Yi Fang, Mengyu Wang","doi":"10.1109/TMI.2025.3622522","DOIUrl":"https://doi.org/10.1109/TMI.2025.3622522","url":null,"abstract":"<p><p>Fairness remains a critical concern in healthcare, where unequal access to services and treatment outcomes can adversely affect patient health. While Federated Learning (FL) presents a collaborative and privacy-preserving approach to model training, ensuring fairness is challenging due to heterogeneous data across institutions, and current research primarily addresses non-medical applications. To fill this gap, we establish the first experimental benchmark for fairness in medical FL, evaluating six representative FL methods across diverse demographic attributes and imaging modalities. We introduce FairFedMed, the first medical FL dataset specifically designed to study group fairness (i.e., consistent performance across demographic groups). It comprises two parts: FairFedMed-Oph, featuring 2D fundus and 3D OCT ophthalmology samples with six demographic attributes; and FairFedMed-Chest, which simulates real cross-institutional FL using subsets of CheXpert and MIMIC-CXR. Together, they support both simulated and real-world FL across diverse medical modalities and demographic groups. Existing FL models often underperform on medical images and overlook fairness across demographic groups. To address this, we propose FairLoRA, a fairness-aware FL framework based on SVD-based low-rank approximation. It customizes singular value matrices per demographic group while sharing singular vectors, ensuring both fairness and efficiency. Experimental results on the FairFedMed dataset demonstrate that FairLoRA not only achieves state-of-the-art performance in medical image classification but also significantly improves fairness across diverse populations. Our code and dataset can be accessible via GitHub link: https://github.com/Harvard-AI-and-Robotics-Lab/FairFedMed.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145310409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Conditional Virtual Imaging for Few-Shot Vascular Image Segmentation
Pub Date: 2025-09-25 | DOI: 10.1109/TMI.2025.3608467
Yanglong He, Rongjun Ge, Hui Tang, Yuxin Liu, Mengqing Su, Jean-Louis Coatrieux, Huazhong Shu, Yang Chen, Yuting He
In the field of medical image processing, vascular image segmentation plays a crucial role in clinical diagnosis, treatment planning, prognosis, and medical decision-making. Accurate and automated segmentation of vascular images can assist clinicians in understanding the vascular network structure, leading to more informed medical decisions. However, manual annotation of vascular images is time-consuming and challenging due to the fine and low-contrast vascular branches, especially in the medical imaging domain, where annotation requires specialized knowledge and clinical expertise. Data-driven deep learning models struggle to achieve good performance when only a small number of annotated vascular images are available. To address this issue, this paper proposes a novel Conditional Virtual Imaging (CVI) framework for few-shot vascular image segmentation learning. The framework combines limited annotated data with extensive unlabeled data to generate high-quality images, effectively improving the accuracy and robustness of segmentation learning. Our approach includes two main innovations: first, aligned image-mask pair generation, which leverages the powerful image generation capabilities of large pre-trained models to produce high-quality vascular images with complex structures using only a few training images; second, the Dual-Consistency Learning (DCL) strategy, which simultaneously trains the generator and segmentation model, allowing them to learn from each other and maximize the utilization of limited data (see the sketch below). Experimental results demonstrate that our CVI framework can generate high-quality medical images and effectively enhance the performance of segmentation models in few-shot scenarios. Our code will be made publicly available online.
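A heavily simplified sketch of a dual-consistency loop in the spirit of DCL: the generator maps masks to images, the segmenter maps images to masks, and each provides a training signal for the other. The specific losses and pseudo-labeling rule are illustrative assumptions, not the paper's.

```python
# Illustrative dual-consistency losses (not the paper's exact DCL objective):
# the segmenter S must recover the mask behind a generated image, and the
# generator G must reproduce an unlabeled image from the segmenter's pseudo-mask.
import torch.nn.functional as F

def dcl_losses(G, S, mask, unlabeled_img):
    fake_img = G(mask)                                                # mask -> image
    loss_seg = F.binary_cross_entropy_with_logits(S(fake_img), mask)  # S consistent with G
    pseudo = (S(unlabeled_img).sigmoid() > 0.5).float()               # pseudo-mask (no grad)
    loss_gen = F.l1_loss(G(pseudo), unlabeled_img)                    # G consistent with S
    return loss_seg, loss_gen
```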
{"title":"Conditional Virtual Imaging for Few-Shot Vascular Image Segmentation","authors":"Yanglong He;Rongjun Ge;Hui Tang;Yuxin Liu;Mengqing Su;Jean-Louis Coatrieux;Huazhong Shu;Yang Chen;Yuting He","doi":"10.1109/TMI.2025.3608467","DOIUrl":"10.1109/TMI.2025.3608467","url":null,"abstract":"In the field of medical image processing, vascular image segmentation plays a crucial role in clinical diagnosis, treatment planning, prognosis, and medical decision-making. Accurate and automated segmentation of vascular images can assist clinicians in understanding the vascular network structure, leading to more informed medical decisions. However, manual annotation of vascular images is time-consuming and challenging due to the fine and low-contrast vascular branches, especially in the medical imaging domain where annotation requires specialized knowledge and clinical expertise. Data-driven deep learning models struggle to achieve good performance when only a small number of annotated vascular images are available. To address this issue, this paper proposes a novel Conditional Virtual Imaging (CVI) framework for few-shot vascular image segmentation learning. The framework combines limited annotated data with extensive unlabeled data to generate high-quality images, effectively improving the accuracy and robustness of segmentation learning. Our approach primarily includes two innovations: First, aligned image-mask pair generation, which leverages the powerful image generation capabilities of large pre-trained models to produce high-quality vascular images with complex structures using only a few training images; Second, the Dual-Consistency Learning (DCL) strategy, which simultaneously trains the generator and segmentation model, allowing them to learn from each other and maximize the utilization of limited data. Experimental results demonstrate that our CVI framework can generate high-quality medical images and effectively enhance the performance of segmentation models in few-shot scenarios. Our code will be made publicly available online.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 2","pages":"811-824"},"PeriodicalIF":0.0,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145140266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MultiASNet: Multimodal Label Noise Robust Framework for the Classification of Aortic Stenosis in Echocardiography
Pub Date: 2025-09-12 | DOI: 10.1109/TMI.2025.3609319
Victoria Wu, Andrea Fung, Bahar Khodabakhshian, Baraa Abdelsamad, Hooman Vaseli, Neda Ahmadi, Jamie A. D. Goco, Michael Y. Tsang, Christina Luong, Purang Abolmaesumi, Teresa S. M. Tsang
Aortic stenosis (AS), a prevalent and serious heart valve disorder, requires early detection but remains difficult to diagnose in routine practice. Although echocardiography with Doppler imaging is the clinical standard, these assessments are typically limited to trained specialists. Point-of-care ultrasound (POCUS) offers an accessible alternative for AS screening but is restricted to basic 2D B-mode imaging, often lacking the analysis Doppler provides. Our project introduces MultiASNet, a multimodal machine learning framework designed to enhance AS screening with POCUS by combining 2D B-mode videos with structured data from echocardiography reports, including Doppler parameters. Using contrastive learning, MultiASNet aligns video features with tabular report features from the same patient to improve interpretive quality. To address misalignment, where a single report corresponds to multiple video views, some of which are irrelevant to AS diagnosis, we use cross-attention in a transformer-based video and tabular network to assign less importance to irrelevant report data. The model integrates structured data only during training, enabling independent use with B-mode videos at inference for broader accessibility. MultiASNet also incorporates sample selection to counteract label noise from observer variability, yielding improved accuracy on two datasets. We achieved balanced accuracy scores of 93.0% on a private dataset and 83.9% on the public TMED-2 dataset for AS detection. For severity classification, balanced accuracy scores were 80.4% and 59.4% on the private and public datasets, respectively. This model facilitates reliable AS screening in non-specialist settings, bridging the gap left by Doppler data while reducing noise-related errors. Our code is publicly available at github.com/DeepRCL/MultiASNet.
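The contrastive alignment step can be sketched as a symmetric InfoNCE loss between per-patient video and tabular embeddings. The encoders are left abstract, and the temperature value is an assumption rather than the paper's setting.

```python
# Minimal sketch of video-report contrastive alignment: embeddings of a B-mode
# clip and its report's tabular features from the same patient are pulled
# together with a symmetric InfoNCE loss; only the video branch runs at inference.
import torch
import torch.nn.functional as F

def clip_style_loss(video_emb, tab_emb, temperature=0.07):
    """video_emb, tab_emb: (B, D) embeddings for matched patient pairs."""
    v = F.normalize(video_emb, dim=1)
    t = F.normalize(tab_emb, dim=1)
    logits = v @ t.T / temperature
    target = torch.arange(len(v), device=v.device)   # i-th video matches i-th report
    return 0.5 * (F.cross_entropy(logits, target) + F.cross_entropy(logits.T, target))
```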
{"title":"MultiASNet: Multimodal Label Noise Robust Framework for the Classification of Aortic Stenosis in Echocardiography","authors":"Victoria Wu;Andrea Fung;Bahar Khodabakhshian;Baraa Abdelsamad;Hooman Vaseli;Neda Ahmadi;Jamie A. D. Goco;Michael Y. Tsang;Christina Luong;Purang Abolmaesumi;Teresa S. M. Tsang","doi":"10.1109/TMI.2025.3609319","DOIUrl":"10.1109/TMI.2025.3609319","url":null,"abstract":"Aortic stenosis (AS), a prevalent and serious heart valve disorder, requires early detection but remains difficult to diagnose in routine practice. Although echocardiography with Doppler imaging is the clinical standard, these assessments are typically limited to trained specialists. Point-of-care ultrasound (POCUS) offers an accessible alternative for AS screening but is restricted to basic 2D B-mode imaging, often lacking the analysis Doppler provides. Our project introduces MultiASNet, a multimodal machine learning framework designed to enhance AS screening with POCUS by combining 2D B-mode videos with structured data from echocardiography reports, including Doppler parameters. Using contrastive learning, MultiASNet aligns video features with report features in tabular form from the same patient to improve interpretive quality. To address misalignment where a single report corresponds to multiple video views, some irrelevant to AS diagnosis, we use cross-attention in a transformer-based video and tabular network to assign less importance to irrelevant report data. The model integrates structured data only during training, enabling independent use with B-mode videos during inference for broader accessibility. MultiASNet also incorporates sample selection to counteract label noise from observer variability, yielding improved accuracy on two datasets. We achieved balanced accuracy scores of 93.0% on a private dataset and 83.9% on the public TMED-2 dataset for AS detection. For severity classification, balanced accuracy scores were 80.4% and 59.4% on the private and public datasets, respectively. This model facilitates reliable AS screening in non-specialist settings, bridging the gap left by Doppler data while reducing noise-related errors. Our code is publicly available at github.com/DeepRCL/MultiASNet","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 2","pages":"799-810"},"PeriodicalIF":0.0,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145043575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}