A Chain of Diagnosis Framework for Accurate and Explainable Radiology Report Generation.
Pub Date : 2025-12-01 DOI: 10.1109/TMI.2025.3585765
Haibo Jin, Haoxuan Che, Sunan He, Hao Chen
Despite the progress of radiology report generation (RRG), existing works face two challenges: 1) performance in clinical efficacy is unsatisfactory, especially for describing lesion attributes; 2) the generated text lacks explainability, making it difficult for radiologists to trust the results. To address these challenges, we focus on a trustworthy RRG model, which not only generates accurate descriptions of abnormalities but also provides the basis for its predictions. To this end, we propose a framework named chain of diagnosis (CoD), which maintains a chain of diagnostic steps for clinically accurate and explainable RRG. It first generates question-answer (QA) pairs via diagnostic conversation to extract key findings, then prompts a large language model with the QA diagnoses for accurate generation. To enhance explainability, a diagnosis grounding module is designed to match QA diagnoses with generated sentences, where the diagnoses act as a reference. Moreover, a lesion grounding module is designed to locate abnormalities in the image, further improving the working efficiency of radiologists. To facilitate label-efficient training, we propose an omni-supervised learning strategy with clinical consistency to leverage various types of annotations from different datasets. Our efforts lead to 1) an omni-labeled RRG dataset with QA pairs and lesion boxes; 2) an evaluation tool for assessing the accuracy of reports in describing lesion location and severity; 3) extensive experiments demonstrating the effectiveness of CoD: it consistently outperforms both specialist and generalist models on two RRG benchmarks and shows promising explainability by accurately grounding generated sentences to QA diagnoses and images.
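As a rough illustration of the second stage described above, the following Python sketch formats QA diagnoses into a report-generation prompt for a generic LLM callable. The prompt wording, the `llm` stand-in, and the example QA pairs are assumptions for illustration, not CoD's actual implementation.

```python
# A minimal sketch of QA-conditioned prompting, assuming a generic
# `llm(prompt: str) -> str` callable; everything here is illustrative.

def build_report_prompt(qa_pairs: list[tuple[str, str]]) -> str:
    """Format diagnostic QA pairs into a report-generation prompt."""
    lines = ["You are a radiology assistant. Based on the findings below,",
             "write the Findings section of a chest X-ray report.", ""]
    for i, (question, answer) in enumerate(qa_pairs, start=1):
        lines.append(f"Q{i}: {question}")
        lines.append(f"A{i}: {answer}")
    return "\n".join(lines)

# Hypothetical QA diagnoses extracted by the diagnostic-conversation stage.
qa = [
    ("Is there any opacity in the lungs?",
     "Yes, a patchy opacity in the left lower lobe."),
    ("Is the cardiac silhouette enlarged?",
     "No, the heart size is within normal limits."),
]
print(build_report_prompt(qa))
# report = llm(build_report_prompt(qa))  # `llm` stands in for any text-generation API
```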
{"title":"A Chain of Diagnosis Framework for Accurate and Explainable Radiology Report Generation.","authors":"Haibo Jin, Haoxuan Che, Sunan He, Hao Chen","doi":"10.1109/TMI.2025.3585765","DOIUrl":"10.1109/TMI.2025.3585765","url":null,"abstract":"<p><p>Despite the progress of radiology report generation (RRG), existing works face two challenges: 1) The performances in clinical efficacy are unsatisfactory, especially for lesion attributes description; 2) the generated text lacks explainability, making it difficult for radiologists to trust the results. To address the challenges, we focus on a trustworthy RRG model, which not only generates accurate descriptions of abnormalities, but also provides basis of its predictions. To this end, we propose a framework named chain of diagnosis (CoD), which maintains a chain of diagnostic process for clinically accurate and explainable RRG. It first generates question-answer (QA) pairs via diagnostic conversation to extract key findings, then prompts a large language model with QA diagnoses for accurate generation. To enhance explainability, a diagnosis grounding module is designed to match QA diagnoses and generated sentences, where the diagnoses act as a reference. Moreover, a lesion grounding module is designed to locate abnormalities in the image, further improving the working efficiency of radiologists. To facilitate label-efficient training, we propose an omni-supervised learning strategy with clinical consistency to leverage various types of annotations from different datasets. Our efforts lead to 1) an omni-labeled RRG dataset with QA pairs and lesion boxes; 2) a evaluation tool for assessing the accuracy of reports in describing lesion location and severity; 3) extensive experiments to demonstrate the effectiveness of CoD, where it outperforms both specialist and generalist models consistently on two RRG benchmarks and shows promising explainability by accurately grounding generated sentences to QA diagnoses and images.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":"4986-4997"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144562423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mutualistic Multi-Network Noisy Label Learning (MMNNLL) Method and Its Application to Transdiagnostic Classification of Bipolar Disorder and Schizophrenia.
Pub Date : 2025-12-01 DOI: 10.1109/TMI.2025.3585880
Yuhui Du, Zheng Wang, Ju Niu, Yulong Wang, Godfrey D Pearlson, Vince D Calhoun
The subjective nature of diagnosing mental disorders complicates achieving accurate diagnoses. The complex relationships among disorders further exacerbate this issue, particularly in clinical practice, where conditions like bipolar disorder (BP) and schizophrenia (SZ) can present similar clinical symptoms and cognitive impairments. To address these challenges, this paper proposes a mutualistic multi-network noisy label learning (MMNNLL) method, which aims to enhance diagnostic accuracy by leveraging neuroimaging data in the presence of potential clinical diagnostic bias or errors. MMNNLL effectively utilizes multiple deep neural networks (DNNs) to learn from data with noisy labels by maximizing the consistency among DNNs in identifying and utilizing samples with clean and noisy labels. Experimental results on the public CIFAR-10 and PathMNIST datasets demonstrate the effectiveness of our method in classifying independent test data across various types and levels of label noise. Additionally, our MMNNLL method significantly outperforms state-of-the-art noisy label learning methods. When applied to brain functional connectivity data from BP and SZ patients, our method identifies two biotypes that show more pronounced group differences and improved classification accuracy compared with the original clinical categories, using both traditional machine learning and advanced deep learning techniques. In summary, our method effectively addresses possible inaccuracy in the nosology of mental disorders and achieves transdiagnostic classification through robust noisy label learning via multi-network collaboration and competition.
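The abstract does not specify the exact consistency mechanism, so the sketch below shows one generic multi-network scheme in the same spirit: two networks each flag likely-clean samples by the small-loss criterion, and only consensus samples drive the update. The keep ratio, toy linear models, and selection rule are illustrative assumptions, not the paper's method.

```python
# A minimal co-training sketch in the spirit of multi-network noisy-label
# learning; the specific selection rule and models are assumptions.
import torch
import torch.nn as nn

def small_loss_mask(logits, labels, keep_ratio=0.5):
    """Mark the keep_ratio fraction of samples with the smallest loss as clean."""
    losses = nn.functional.cross_entropy(logits, labels, reduction="none")
    k = max(1, int(keep_ratio * len(losses)))
    mask = torch.zeros(len(losses), dtype=torch.bool)
    mask[torch.topk(-losses, k).indices] = True
    return mask

net_a, net_b = nn.Linear(16, 4), nn.Linear(16, 4)   # toy stand-ins for DNNs
opt = torch.optim.SGD(list(net_a.parameters()) + list(net_b.parameters()), lr=0.1)

x, y = torch.randn(32, 16), torch.randint(0, 4, (32,))  # batch with noisy labels
logits_a, logits_b = net_a(x), net_b(x)
agree = small_loss_mask(logits_a, y) & small_loss_mask(logits_b, y)  # consensus
if agree.any():  # train both networks only on mutually trusted samples
    loss = (nn.functional.cross_entropy(logits_a[agree], y[agree])
            + nn.functional.cross_entropy(logits_b[agree], y[agree]))
    opt.zero_grad(); loss.backward(); opt.step()
```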
{"title":"Mutualistic Multi-Network Noisy Label Learning (MMNNLL) Method and Its Application to Transdiagnostic Classification of Bipolar Disorder and Schizophrenia.","authors":"Yuhui Du, Zheng Wang, Ju Niu, Yulong Wang, Godfrey D Pearlson, Vince D Calhoun","doi":"10.1109/TMI.2025.3585880","DOIUrl":"10.1109/TMI.2025.3585880","url":null,"abstract":"<p><p>The subjective nature of diagnosing mental disorders complicates achieving accurate diagnoses. The complex relationship among disorders further exacerbates this issue, particularly in clinical practice where conditions like bipolar disorder (BP) and schizophrenia (SZ) can present similar clinical symptoms and cognitive impairments. To address these challenges, this paper proposes a mutualistic multi-network noisy label learning (MMNNLL) method, which aims to enhance diagnostic accuracy by leveraging neuroimaging data under the presence of potential clinical diagnosis bias or errors. MMNNLL effectively utilizes multiple deep neural networks (DNNs) for learning from data with noisy labels by maximizing the consistency among DNNs in identifying and utilizing samples with clean and noisy labels. Experimental results on public CIFAR-10 and PathMNIST datasets demonstrate the effectiveness of our method in classifying independent test data across various types and levels of label noise. Additionally, our MMNNLL method significantly outperforms state-of-the-art noisy label learning methods. When applied to brain functional connectivity data from BP and SZ patients, our method identifies two biotypes that show more pronounced group differences, and improved classification accuracy compared to the original clinical categories, using both traditional machine learning and advanced deep learning techniques. In summary, our method effectively addresses the possible inaccuracy in nosology of mental disorders and achieves transdiagnostic classification through robust noisy label learning via multi-network collaboration and competition.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":"5014-5026"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12812316/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144565572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joint Shape Reconstruction and Registration via a Shared Hybrid Diffeomorphic Flow.
Pub Date : 2025-12-01 DOI: 10.1109/TMI.2025.3585560
Hengxiang Shi, Ping Wang, Shouhui Zhang, Xiuyang Zhao, Bo Yang, Caiming Zhang
Deep implicit functions (DIFs) effectively represent shapes by using a neural network to map 3D spatial coordinates to scalar values that encode the shape's geometry, but it is difficult to establish correspondences between shapes directly, limiting their use in medical image registration. Recently proposed deformation-field-based methods learn implicit templates by combining template field learning with DIFs and deformation field learning, establishing shape correspondence through deformation fields. Although these approaches enable joint learning of shape representation and shape correspondence, the decoupled optimization of the template field and deformation field, caused by the absence of deformation annotations, leads to a relatively accurate template field but an underoptimized deformation field. In this paper, we propose a novel implicit template learning framework based on a shared hybrid diffeomorphic flow (SHDF), which enables shared optimization of the deformation and the template, contributing to better deformations and shape representation. Specifically, we formulate the signed distance function (SDF, a type of DIF) as a one-dimensional (1D) integral, unifying dimensions to match the form of the ordinary differential equation (ODE) solved in deformation field learning. The SDF in 1D integral form is then integrated seamlessly into deformation field learning. Using a recurrent learning strategy, we frame shape representation and deformation as solving different initial value problems of the same ODE. We also introduce a global smoothness regularization to handle local optima due to limited outside-of-shape data. Experiments on medical datasets show that SHDF outperforms state-of-the-art methods in shape representation and registration.
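To make the dimensional unification concrete, here is one schematic rendering of the two quantities as initial value problems over a shared auxiliary variable; the paper's exact parameterization may differ, and v, g, and s_0 are placeholder networks.

```latex
% Schematic only: v, g, and s_0 are placeholders, not the paper's exact terms.
% Eq. (1): diffeomorphic deformation as the flow of a velocity field v.
% Eq. (2): the SDF written as a 1D integral over the same "time" axis, so
% both quantities are solutions of initial value problems of ODE form.
\begin{align}
  \frac{\partial \phi(x,t)}{\partial t} &= v\bigl(\phi(x,t),\, t\bigr),
    \qquad \phi(x,0) = x, \\
  s(x) &= s_0(x) + \int_0^1 g(x,t)\, dt .
\end{align}
```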
{"title":"Joint Shape Reconstruction and Registration via a Shared Hybrid Diffeomorphic Flow.","authors":"Hengxiang Shi, Ping Wang, Shouhui Zhang, Xiuyang Zhao, Bo Yang, Caiming Zhang","doi":"10.1109/TMI.2025.3585560","DOIUrl":"10.1109/TMI.2025.3585560","url":null,"abstract":"<p><p>Deep implicit functions (DIFs) effectively represent shapes by using a neural network to map 3D spatial coordinates to scalar values that encode the shape's geometry, but it is difficult to establish correspondences between shapes directly, limiting their use in medical image registration. The recently presented deformation field-based methods achieve implicit templates learning via template field learning with DIFs and deformation field learning, establishing shape correspondence through deformation fields. Although these approaches enable joint learning of shape representation and shape correspondence, the decoupled optimization for template field and deformation field, caused by the absence of deformation annotations lead to a relatively accurate template field but an underoptimized deformation field. In this paper, we propose a novel implicit template learning framework via a shared hybrid diffeomorphic flow (SHDF), which enables shared optimization for deformation and template, contributing to better deformations and shape representation. Specifically, we formulate the signed distance function (SDF, a type of DIFs) as a one-dimensional (1D) integral, unifying dimensions to match the form used in solving ordinary differential equation (ODE) for deformation field learning. Then, SDF in 1D integral form is integrated seamlessly into the deformation field learning. Using a recurrent learning strategy, we frame shape representations and deformations as solving different initial value problems of the same ODE. We also introduce a global smoothness regularization to handle local optima due to limited outside-of-shape data. Experiments on medical datasets show that SHDF outperforms state-of-the-art methods in shape representation and registration.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":"4998-5013"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144562424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guest Editorial Special Issue on Advancements in Foundation Models for Medical Imaging
Pub Date : 2025-10-27 DOI: 10.1109/TMI.2025.3613074
Tianming Liu;Dinggang Shen;Jong Chul Ye;Marleen de Bruijne;Wei Liu
Pretrained on massive datasets, Foundation Models (FMs) are revolutionizing medical imaging by offering scalable and generalizable solutions to longstanding challenges. This Special Issue on Advancements in Foundation Models for Medical Imaging presents FM-related works that explore the potential of FMs to address data scarcity, domain shifts, and multimodal integration across a wide range of medical imaging tasks, including segmentation, diagnosis, reconstruction, and prognosis. The included papers also examine critical concerns such as interpretability, efficiency, benchmarking, and ethics in the adoption of FMs for medical imaging. Collectively, the articles in this Special Issue mark a significant step toward establishing FMs as a cornerstone of next-generation medical imaging AI.
{"title":"Guest Editorial Special Issue on Advancements in Foundation Models for Medical Imaging","authors":"Tianming Liu;Dinggang Shen;Jong Chul Ye;Marleen de Bruijne;Wei Liu","doi":"10.1109/TMI.2025.3613074","DOIUrl":"https://doi.org/10.1109/TMI.2025.3613074","url":null,"abstract":"Pretrained on massive datasets, Foundation Models (FMs) are revolutionizing medical imaging by offering scalable and generalizable solutions to longstanding challenges. This Special Issue on Advancements in Foundation Models for Medical Imaging presents FM-related works that explore the potential of FMs to address data scarcity, domain shifts, and multimodal integration across a wide range of medical imaging tasks, including segmentation, diagnosis, reconstruction, and prognosis. The included papers also examine critical concerns such as interpretability, efficiency, benchmarking, and ethics in the adoption of FMs for medical imaging. Collectively, the articles in this Special Issue mark a significant step toward establishing FMs as a cornerstone of next-generation medical imaging AI.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 10","pages":"3894-3897"},"PeriodicalIF":0.0,"publicationDate":"2025-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11218696","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145371487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging Diffusion Model and Image Foundation Model for Improved Correspondence Matching in Coronary Angiography.
Pub Date : 2025-10-20 DOI: 10.1109/TMI.2025.3623507
Lin Zhao, Xin Yu, Yikang Liu, Xiao Chen, Eric Z Chen, Terrence Chen, Shanhui Sun
Accurate correspondence matching in coronary angiography images is crucial for reconstructing 3D coronary artery structures, which is essential for precise diagnosis and treatment planning of coronary artery disease (CAD). Traditional matching methods for natural images often fail to generalize to X-ray images due to inherent differences such as lack of texture, lower contrast, and overlapping structures, compounded by insufficient training data. To address these challenges, we propose a novel pipeline that generates realistic paired coronary angiography images using a diffusion model conditioned on 2D projections of 3D reconstructed meshes from Coronary Computed Tomography Angiography (CCTA), providing high-quality synthetic data for training. Additionally, we employ large-scale image foundation models to guide feature aggregation, enhancing correspondence matching accuracy by focusing on semantically relevant regions and keypoints. Our approach demonstrates superior matching performance on synthetic datasets and effectively generalizes to real-world datasets, offering a practical solution for this task. Furthermore, our work investigates the efficacy of different foundation models in correspondence matching, providing novel insights into leveraging advanced image foundation models for medical imaging applications.
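As a minimal, self-contained illustration of descriptor-based correspondence matching (not the paper's full pipeline, which adds diffusion-synthesized training pairs and foundation-model-guided feature aggregation), the sketch below performs mutual nearest-neighbor matching on L2-normalized features; the random toy descriptors stand in for real foundation-model features of two views.

```python
# Mutual nearest-neighbor matching on dense descriptors; illustrative only.
import torch

def mutual_nn_matches(feat_a: torch.Tensor, feat_b: torch.Tensor):
    """feat_a: (N, C), feat_b: (M, C) L2-normalized descriptors -> index pairs."""
    sim = feat_a @ feat_b.t()                  # cosine similarity (N, M)
    nn_ab = sim.argmax(dim=1)                  # best match in B for each A
    nn_ba = sim.argmax(dim=0)                  # best match in A for each B
    idx_a = torch.arange(feat_a.shape[0])
    mutual = nn_ba[nn_ab] == idx_a             # keep cycle-consistent pairs only
    return idx_a[mutual], nn_ab[mutual]

# Toy descriptors standing in for foundation-model features of two angiograms.
fa = torch.nn.functional.normalize(torch.randn(100, 64), dim=1)
fb = torch.nn.functional.normalize(torch.randn(120, 64), dim=1)
ia, ib = mutual_nn_matches(fa, fb)
print(f"{len(ia)} mutual matches")
```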
{"title":"Leveraging Diffusion Model and Image Foundation Model for Improved Correspondence Matching in Coronary Angiography.","authors":"Lin Zhao, Xin Yu, Yikang Liu, Xiao Chen, Eric Z Chen, Terrence Chen, Shanhui Sun","doi":"10.1109/TMI.2025.3623507","DOIUrl":"https://doi.org/10.1109/TMI.2025.3623507","url":null,"abstract":"<p><p>Accurate correspondence matching in coronary angiography images is crucial for reconstructing 3D coronary artery structures, which is essential for precise diagnosis and treatment planning of coronary artery disease (CAD). Traditional matching methods for natural images often fail to generalize to X-ray images due to inherent differences such as lack of texture, lower contrast, and overlapping structures, compounded by insufficient training data. To address these challenges, we propose a novel pipeline that generates realistic paired coronary angiography images using a diffusion model conditioned on 2D projections of 3D reconstructed meshes from Coronary Computed Tomography Angiography (CCTA), providing high-quality synthetic data for training. Additionally, we employ large-scale image foundation models to guide feature aggregation, enhancing correspondence matching accuracy by focusing on semantically relevant regions and keypoints. Our approach demonstrates superior matching performance on synthetic datasets and effectively generalizes to real-world datasets, offering a practical solution for this task. Furthermore, our work investigates the efficacy of different foundation models in correspondence matching, providing novel insights into leveraging advanced image foundation models for medical imaging applications.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145338362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FairFedMed: Benchmarking Group Fairness in Federated Medical Imaging with FairLoRA.
Pub Date : 2025-10-16 DOI: 10.1109/TMI.2025.3622522
Minghan Li, Congcong Wen, Yu Tian, Min Shi, Yan Luo, Hao Huang, Yi Fang, Mengyu Wang
Fairness remains a critical concern in healthcare, where unequal access to services and treatment outcomes can adversely affect patient health. While Federated Learning (FL) presents a collaborative and privacy-preserving approach to model training, ensuring fairness is challenging due to heterogeneous data across institutions, and current research primarily addresses non-medical applications. To fill this gap, we establish the first experimental benchmark for fairness in medical FL, evaluating six representative FL methods across diverse demographic attributes and imaging modalities. We introduce FairFedMed, the first medical FL dataset specifically designed to study group fairness (i.e., consistent performance across demographic groups). It comprises two parts: FairFedMed-Oph, featuring 2D fundus and 3D OCT ophthalmology samples with six demographic attributes; and FairFedMed-Chest, which simulates real cross-institutional FL using subsets of CheXpert and MIMIC-CXR. Together, they support both simulated and real-world FL across diverse medical modalities and demographic groups. Existing FL models often underperform on medical images and overlook fairness across demographic groups. To address this, we propose FairLoRA, a fairness-aware FL framework based on SVD-based low-rank approximation. It customizes singular value matrices per demographic group while sharing singular vectors, ensuring both fairness and efficiency. Experimental results on the FairFedMed dataset demonstrate that FairLoRA not only achieves state-of-the-art performance in medical image classification but also significantly improves fairness across diverse populations. Our code and dataset are available at https://github.com/Harvard-AI-and-Robotics-Lab/FairFedMed.
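A minimal sketch of the low-rank idea as the abstract states it: singular vectors shared across demographic groups, singular values customized per group. The module below is an assumption-level reading (shapes, initialization, and the frozen base layer are illustrative), not the released FairLoRA code.

```python
# Group-conditioned low-rank update: W x + U diag(s_g) V^T x; illustrative.
import torch
import torch.nn as nn

class GroupLoRALinear(nn.Module):
    def __init__(self, d_in, d_out, rank, num_groups):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)  # frozen backbone weight
        self.base.weight.requires_grad_(False)
        self.U = nn.Parameter(torch.randn(d_out, rank) * 0.01)  # shared across groups
        self.V = nn.Parameter(torch.randn(d_in, rank) * 0.01)   # shared across groups
        self.s = nn.Parameter(torch.zeros(num_groups, rank))    # per-group singular values

    def forward(self, x, group: int):
        delta = self.U @ torch.diag(self.s[group]) @ self.V.t()  # U diag(s_g) V^T
        return self.base(x) + x @ delta.t()

layer = GroupLoRALinear(d_in=32, d_out=16, rank=4, num_groups=3)
out = layer(torch.randn(8, 32), group=1)   # a batch from demographic group 1
print(out.shape)                           # torch.Size([8, 16])
```

Sharing U and V keeps most adaptation parameters common to all groups, so the per-group capacity is only the rank-sized singular-value vector, which is one way to balance fairness against efficiency.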
{"title":"FairFedMed: Benchmarking Group Fairness in Federated Medical Imaging with FairLoRA.","authors":"Minghan Li, Congcong Wen, Yu Tian, Min Shi, Yan Luo, Hao Huang, Yi Fang, Mengyu Wang","doi":"10.1109/TMI.2025.3622522","DOIUrl":"10.1109/TMI.2025.3622522","url":null,"abstract":"<p><p>Fairness remains a critical concern in healthcare, where unequal access to services and treatment outcomes can adversely affect patient health. While Federated Learning (FL) presents a collaborative and privacy-preserving approach to model training, ensuring fairness is challenging due to heterogeneous data across institutions, and current research primarily addresses non-medical applications. To fill this gap, we establish the first experimental benchmark for fairness in medical FL, evaluating six representative FL methods across diverse demographic attributes and imaging modalities. We introduce FairFedMed, the first medical FL dataset specifically designed to study group fairness (i.e., consistent performance across demographic groups). It comprises two parts: FairFedMed-Oph, featuring 2D fundus and 3D OCT ophthalmology samples with six demographic attributes; and FairFedMed-Chest, which simulates real cross-institutional FL using subsets of CheXpert and MIMIC-CXR. Together, they support both simulated and real-world FL across diverse medical modalities and demographic groups. Existing FL models often underperform on medical images and overlook fairness across demographic groups. To address this, we propose FairLoRA, a fairness-aware FL framework based on SVD-based low-rank approximation. It customizes singular value matrices per demographic group while sharing singular vectors, ensuring both fairness and efficiency. Experimental results on the FairFedMed dataset demonstrate that FairLoRA not only achieves state-of-the-art performance in medical image classification but also significantly improves fairness across diverse populations. Our code and dataset can be accessible via GitHub link: https://github.com/Harvard-AI-and-Robotics-Lab/FairFedMed.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145310409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MoE-Morph: Lightweight Pyramid Model With Heterogeneous Mixture of Experts for Deformable Medical Image Registration
Pub Date : 2025-10-14 DOI: 10.1109/TMI.2025.3620406
Hao Lin;Yonghong Song;You Su;Yunfei Ma
Deformable image registration aims to achieve nonlinear alignment of image spaces by estimating dense displacement fields. It is widely used in clinical tasks such as surgical planning, assisted diagnosis, and surgical navigation. While efficient, deep learning registration methods often struggle with large, complex displacements. Pyramid-based approaches address this with a coarse-to-fine strategy, but their single-feature processing can lead to error accumulation. In this paper, we introduce a dense Mixture of Experts (MoE) pyramid registration model that uses routing schemes and multiple heterogeneous experts to increase the width and flexibility of feature processing within a single layer. The collaboration among heterogeneous experts enables the model to retain more precise details and maintain greater feature freedom when dealing with complex displacements. We use deformation fields alone as the information-transmission paradigm between levels, with deformation-field interactions between layers, which encourages the model to focus on the feature location matching process and perform registration in the correct direction. We do not use any complex mechanisms such as attention or ViT, keeping the model in its simplest form. The powerful deformation capability allows the model to perform volume registration directly and accurately without the need for affine pre-registration. Experimental results show that the model achieves outstanding performance across four public datasets, including brain registration, lung registration, and abdominal multi-modal registration. The code will be published at https://github.com/Darlinglinlinlin/MOE_Morph
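To illustrate the deformation-fields-only transmission between pyramid levels, the sketch below upsamples a coarse displacement field and composes it with a finer-level residual field. The MoE expert routing itself is omitted, and the 2D toy shapes are assumptions; the paper works with 3D volumes.

```python
# Coarse-to-fine composition of displacement fields; illustrative 2D toy.
import torch
import torch.nn.functional as F

def warp_flow(flow, by):
    """Resample `flow` at positions displaced by `by` (displacements in pixels)."""
    b, _, h, w = flow.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + by
    # normalize pixel coordinates to [-1, 1] as required by grid_sample
    gx = 2.0 * grid[:, 0] / (w - 1) - 1.0
    gy = 2.0 * grid[:, 1] / (h - 1) - 1.0
    return F.grid_sample(flow, torch.stack((gx, gy), dim=-1), align_corners=True)

def compose_fields(coarse_flow, fine_residual):
    """coarse_flow: (B, 2, H/2, W/2); fine_residual: (B, 2, H, W)."""
    up = 2.0 * F.interpolate(coarse_flow, scale_factor=2,
                             mode="bilinear", align_corners=True)
    # total(x) = up(x + residual(x)) + residual(x): composition of displacements
    return warp_flow(up, fine_residual) + fine_residual

coarse = torch.zeros(1, 2, 32, 32)
residual = torch.zeros(1, 2, 64, 64)
print(compose_fields(coarse, residual).shape)  # torch.Size([1, 2, 64, 64])
```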
{"title":"MoE-Morph: Lightweight Pyramid Model With Heterogeneous Mixture of Experts for Deformable Medical Image Registration","authors":"Hao Lin;Yonghong Song;You Su;Yunfei Ma","doi":"10.1109/TMI.2025.3620406","DOIUrl":"10.1109/TMI.2025.3620406","url":null,"abstract":"Deformable image registration aims to achieve nonlinear alignment of image spaces by estimating dense displacement fields. It is widely used in clinical tasks such as surgical planning, assisted diagnosis, and surgical navigation. While efficient, deep learning registration methods often struggle with large, complex displacements. Pyramid-based approaches address this with a coarse-to-fine strategy, but their single-feature processing can lead to error accumulation. In this paper, we introduce a dense Mixture of Experts (MoE) pyramid registration model, using routing schemes and multiple heterogeneous experts to increase the width and flexibility of feature processing within a single layer. The collaboration among heterogeneous experts enables the model to retain more precise details and maintain greater feature freedom when dealing with complex displacements. We use only deformation fields as the information transmission paradigm between different levels, with deformation field interactions between layers, which encourages the model to focus on the feature location matching process and perform registration in the correct direction. We do not utilize any complex mechanisms such as attention or ViT, keeping the model at its simplest form. The powerful deformable capability allows the model to perform volume registration directly and accurately without the need for affine registration. Experimental results show that the model achieves outstanding performance across four public datasets, including brain registration, lung registration, and abdominal multi-modal registration. The code will be published at <uri>https://github.com/Darlinglinlinlin/MOE_Morph</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 3","pages":"1251-1264"},"PeriodicalIF":0.0,"publicationDate":"2025-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145288378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Uncertainty-Guided Prototype Reliability Enhancement Network for Few-Shot Medical Image Segmentation
Pub Date : 2025-10-14 DOI: 10.1109/TMI.2025.3621452
Junfei Hu;Tao Zhou;Kaiwen Huang;Yi Zhou;Haofeng Zhang;Boqiang Fan;Huazhu Fu
Few-Shot Learning (FSL) has garnered increasing attention for data-scarce scenarios, particularly in medical segmentation tasks where only a few labeled data points are available. Existing few-shot segmentation methods typically learn prototypes from support images and employ nearest-neighbor searching to segment query images. Despite notable progress, effectively learning prototypes for each class remains challenging. In this paper, we propose an Uncertainty-guided Prototype Reliability Enhancement Network (UPRE-Net) for few-shot medical image segmentation. Specifically, we present a dual-support branch to maximize the extraction of information from support images through augmentation techniques. To enhance the reliability of prototypes, we propose an Uncertainty-guided Prototype Generation (UPG) module. Within the UPG module, we first extract both global and local prototypes for each class and then apply uncertainty measures to select the most informative prototypes. Additionally, to effectively combine the prediction results from the dual-support branch, we present a Reliable Dynamic Fusion (RDF) module, which dynamically integrates the two prediction results to generate a more reliable output. Furthermore, we present an Uncertainty-induced Weighted Loss (UWL) to ensure that the model pays more attention to regions with high uncertainty. Experiments on four benchmark medical image datasets demonstrate that our proposed model significantly outperforms state-of-the-art methods. The code will be released at https://github.com/taozh2017/UPRENet
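For context, the generic prototype pipeline that the abstract builds on can be sketched as masked average pooling over support features followed by cosine-similarity scoring of query features; UPRE-Net's uncertainty-guided prototype selection and dual-branch fusion are not reproduced here, and the feature shapes and temperature are assumptions.

```python
# Prototype-based few-shot segmentation: masked average pooling + cosine scoring.
import torch
import torch.nn.functional as F

def masked_avg_prototype(support_feat, support_mask):
    """support_feat: (C, H, W); support_mask: (H, W) in {0, 1} -> prototype (C,)."""
    masked = support_feat * support_mask.unsqueeze(0)
    return masked.sum(dim=(1, 2)) / support_mask.sum().clamp(min=1.0)

def cosine_segmentation(query_feat, prototype, temperature=20.0):
    """query_feat: (C, H, W) -> foreground logits (H, W)."""
    q = F.normalize(query_feat, dim=0)
    p = F.normalize(prototype, dim=0)
    return temperature * torch.einsum("chw,c->hw", q, p)

feat_s = torch.randn(64, 32, 32)              # support features from a backbone
mask_s = (torch.rand(32, 32) > 0.7).float()   # support annotation (toy)
proto = masked_avg_prototype(feat_s, mask_s)
logits = cosine_segmentation(torch.randn(64, 32, 32), proto)
pred = (logits > 0).float()                   # hypothetical decision threshold
```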
{"title":"Uncertainty-Guided Prototype Reliability Enhancement Network for Few-Shot Medical Image Segmentation","authors":"Junfei Hu;Tao Zhou;Kaiwen Huang;Yi Zhou;Haofeng Zhang;Boqiang Fan;Huazhu Fu","doi":"10.1109/TMI.2025.3621452","DOIUrl":"10.1109/TMI.2025.3621452","url":null,"abstract":"Few-Shot Learning (FSL) has garnered increasing attention for data-scarce scenarios, particularly in medical segmentation tasks where only a few labeled data points are available. Existing few-shot segmentation methods typically learn prototypes from support images and employ nearest-neighbor searching to segment query images. Despite notable progress, effectively learning prototypes for each class remains a challenging task to achieve promising results. In this paper, we propose an Uncertainty-guided Prototype Reliability Enhancement Network (UPRE-Net) for few-shot medical image segmentation. Specifically, we present a dual-support branch to maximize the extraction of information from support images through augmentation techniques. To enhance the reliability of prototypes, we propose an Uncertainty-guided Prototype Generation (UPG) module. Within the UPG module, we first extract both global and local prototypes for each class and then apply uncertainty measures to select the most informative prototypes. Additionally, to effectively combine the prediction results from the dual-support branch, we present a Reliable Dynamic Fusion (RDF) module. This module dynamically integrates the two prediction results to generate a more reliable output. Furthermore, we present an Uncertainty-induced Weighted Loss (UWL) to ensure that the model pays more attention to these regions with high uncertainty. Experiments on four benchmark medical image datasets demonstrate that our proposed model significantly outperforms state-of-the-art methods. The code will be released at <uri>https://github.com/taozh2017/UPRENet</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 3","pages":"1279-1290"},"PeriodicalIF":0.0,"publicationDate":"2025-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145288544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PET Head Motion Estimation Using Supervised Deep Learning With Attention
Pub Date : 2025-10-13 DOI: 10.1109/TMI.2025.3620714
Zhuotong Cai;Tianyi Zeng;Jiazhen Zhang;Eléonore V. Lieffrig;Kathryn Fontaine;Chenyu You;Enette Mae Revilla;James S. Duncan;Jingmin Xin;Yihuan Lu;John A. Onofrey
Head movement poses a significant challenge in brain positron emission tomography (PET) imaging, resulting in image artifacts and tracer uptake quantification inaccuracies. Effective head motion estimation and correction are crucial for precise quantitative image analysis and accurate diagnosis of neurological disorders. Hardware-based motion tracking (HMT) has limited applicability in real-world clinical practice. To overcome this limitation, we propose a deep-learning head motion correction approach with cross-attention (DL-HMC++) to predict rigid head motion from one-second 3D PET raw data. DL-HMC++ is trained in a supervised manner by leveraging existing dynamic PET scans with gold-standard motion measurements from external HMT. We evaluate DL-HMC++ on two PET scanners (HRRT and mCT) and four radiotracers (18F-FDG, 18F-FPEB, 11C-UCB-J, and 11C-LSN3172176) to demonstrate the effectiveness and generalization of the approach in large cohort PET studies. Quantitative and qualitative results demonstrate that DL-HMC++ consistently outperforms state-of-the-art data-driven motion estimation methods, producing motion-free images with clear delineation of brain structures and reduced motion artifacts that are indistinguishable from gold-standard HMT. Brain region-of-interest standard uptake value analysis shows average difference ratios between DL-HMC++ and gold-standard HMT of 1.2 ± 0.5% for HRRT and 0.5 ± 0.2% for mCT. DL-HMC++ demonstrates the potential for data-driven PET head motion correction to remove the burden of HMT, making motion correction accessible to clinical populations beyond research settings. The code is available at https://github.com/maxxxxxxcai/DL-HMC-TMI
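The ROI analysis quoted above can be read as a per-region relative SUV difference averaged over regions; since the abstract does not give the exact protocol, the metric below is an assumption-level sketch with hypothetical values.

```python
# Per-ROI relative SUV difference vs. gold-standard HMT; protocol is assumed.
import numpy as np

def suv_difference_ratio(suv_method: np.ndarray, suv_reference: np.ndarray):
    """Percent difference per ROI between a method and the reference correction."""
    ratio = 100.0 * np.abs(suv_method - suv_reference) / suv_reference
    return ratio.mean(), ratio.std()

# Hypothetical mean SUVs for five brain ROIs under the two corrections.
dl_hmc = np.array([1.52, 2.10, 0.98, 1.75, 1.31])
hmt    = np.array([1.50, 2.12, 0.97, 1.77, 1.30])
mean, std = suv_difference_ratio(dl_hmc, hmt)
print(f"difference ratio: {mean:.1f} ± {std:.1f} %")
```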
{"title":"PET Head Motion Estimation Using Supervised Deep Learning With Attention","authors":"Zhuotong Cai;Tianyi Zeng;Jiazhen Zhang;Eléonore V. Lieffrig;Kathryn Fontaine;Chenyu You;Enette Mae Revilla;James S. Duncan;Jingmin Xin;Yihuan Lu;John A. Onofrey","doi":"10.1109/TMI.2025.3620714","DOIUrl":"10.1109/TMI.2025.3620714","url":null,"abstract":"Head movement poses a significant challenge in brain positron emission tomography (PET) imaging, resulting in image artifacts and tracer uptake quantification inaccuracies. Effective head motion estimation and correction are crucial for precise quantitative image analysis and accurate diagnosis of neurological disorders. Hardware-based motion tracking (HMT) has limited applicability in real-world clinical practice. To overcome this limitation, we propose a deep-learning head motion correction approach with cross-attention (DL-HMC++) to predict rigid head motion from one-second 3D PET raw data. DL-HMC++ is trained in a supervised manner by leveraging existing dynamic PET scans with gold-standard motion measurements from external HMT. We evaluate DL-HMC++ on two PET scanners (HRRT and mCT) and four radiotracers (<sup>18</sup>F-FDG, <sup>18</sup>F-FPEB, <sup>11</sup>C-UCB-J, and <sup>11</sup>C-LSN3172176) to demonstrate the effectiveness and generalization of the approach in large cohort PET studies. Quantitative and qualitative results demonstrate that DL-HMC++ consistently outperforms state-of-the-art data-driven motion estimation methods, producing motion-free images with clear delineation of brain structures and reduced motion artifacts that are indistinguishable from gold-standard HMT. Brain region of interest standard uptake value analysis exhibits average difference ratios between DL-HMC++ and gold-standard HMT to be <inline-formula> <tex-math>$1.2pm 0.5$ </tex-math></inline-formula>% for HRRT and <inline-formula> <tex-math>$0.5pm 0.2$ </tex-math></inline-formula>% for mCT. DL-HMC++ demonstrates the potential for data-driven PET head motion correction to remove the burden of HMT, making motion correction accessible to clinical populations beyond research settings. The code is available at <uri>https://github.com/maxxxxxxcai/DL-HMC-TMI</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 3","pages":"1265-1278"},"PeriodicalIF":0.0,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145282738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unsupervised High-Order Implicit Neural Representation With Line Attention for Metal Artifact Reduction
Pub Date : 2025-10-10 DOI: 10.1109/TMI.2025.3620222
Hongyu Chen;Shaoguang Huang;Wei He;Guangyi Yang;Hongyan Zhang
The presence of metallic implants introduces bright and dark streaks in computed tomography (CT) images, degrading image quality and interfering with medical diagnosis. To reduce these artifacts, deep learning approaches have been applied to restore metal-corrupted images, which usually requires a large number of simulated degraded-clean pairs for training. To achieve metal artifact reduction (MAR) without reference images, implicit neural representation (INR) has emerged and shown its capability for image restoration in an unsupervised manner. However, existing INR methods for MAR usually treat spatial coordinates independently and ignore their correlation, resulting in detail loss and residual artifacts. In this paper, we propose an INR-based unsupervised MAR framework and design a High-order Line Attention Network to capture local contextual and geometric representations from X-rays, which maps spatial coordinates to the discrete linear attenuation coefficients of imaged objects for artifact-free CT image reconstruction. The second-order feature interaction effectively alleviates the spectral bias problem and fits the low- and high-frequency details of real signals well. The proposed line-attention module, with linear complexity, establishes global relationships among the spatial point tokens of sampled rays. To provide more local contextual information, a multiple local adjacent ray sampling strategy composes several context-rich sub-fan beams into each training batch. With these components, the unsupervised MAR framework approximates the implicit continuous function to estimate measurements and generate artifact-free CT images. Simulated and real experiments indicate that the proposed approach achieves superior MAR performance compared with other state-of-the-art methods.
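A minimal sketch of the core INR formulation described above: a coordinate MLP maps 2D points to nonnegative attenuation coefficients, and a projection is estimated by summing attenuation along sampled ray points so the network can be fitted to measurements without clean references. The architecture, ray sampling, and loss are illustrative assumptions; the high-order line-attention module is omitted.

```python
# Coordinate network fitted to X-ray measurements via a discretized line integral.
import torch
import torch.nn as nn

class AttenuationINR(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),   # attenuation is nonnegative
        )

    def forward(self, xy):                         # xy: (num_points, 2)
        return self.net(xy).squeeze(-1)

model = AttenuationINR()
# One ray through the image domain, discretized into sample points.
t = torch.linspace(-1.0, 1.0, 256).unsqueeze(1)
ray_points = torch.cat([t, 0.3 * torch.ones_like(t)], dim=1)  # a horizontal line
step = 2.0 / 255                                   # spacing between samples
projection = (model(ray_points) * step).sum()      # line-integral estimate
measured = torch.tensor(1.7)                       # hypothetical sinogram value
loss = (projection - measured) ** 2                # fit INR to raw measurements
loss.backward()
```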
{"title":"Unsupervised High-Order Implicit Neural Representation With Line Attention for Metal Artifact Reduction","authors":"Hongyu Chen;Shaoguang Huang;Wei He;Guangyi Yang;Hongyan Zhang","doi":"10.1109/TMI.2025.3620222","DOIUrl":"10.1109/TMI.2025.3620222","url":null,"abstract":"The presence of metallic implants introduces bright and dark streaks that appear in computed tomography (CT) images, degrading image quality and interfering with medical diagnosis. To reduce these artifacts, deep learning approaches have been applied for metal-corrupted restoration, which usually requires a large amount of simulated degraded-clean pairs for training. To achieve metal artifact reduction (MAR) without reference images, implicit neural representation (INR) has emerged and shown capabilities for image restoration in an unsupervised manner. However, existing INR methods for MAR usually treat the spatial coordinates independently and ignore their correlation, resulting in detail loss and artifacts remaining. In this paper, we propose an INR-based unsupervised MAR framework and design a High-order Line Attention Network to capture local contextual and geometric representations from X-rays, which maps the spatial coordinates into discrete linear attenuation coefficients of imaged objects for artifact-free CT image reconstruction. The second-order feature interaction can effectively improve the spectral bias problems and fit low and high-frequency details of real signals well. The proposed line-attention module with linear complexity can establish global relationships among spatial point tokens from sampled rays. To provide more local contextual information, a multiple local adjacent ray sampling strategy is adopted to compose several sub-fan beams with more context as a training batch. With the help of these components, the unsupervised MAR framework can approximate the implicit continuous function to estimate measurements and generate artifact-free CT images. Simulated and real experiments indicated that the proposed approach achieved superior MAR performance compared with other state-of-the-art methods.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 3","pages":"1237-1250"},"PeriodicalIF":0.0,"publicationDate":"2025-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145260753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}